DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 10/26/2022.
Applicant arguments/remarks made in amendment filed 10/26/2022.
Claims 1-2, 6-7, 11-12, and 16-17 are amended.
Claims 1-20 are pending.
Response to Arguments
Applicant presents several arguments.  Each is addressed.
Examiner notes for clarity that pages 9-10 have been recited twice in the Remarks section.
Applicant argues that “Kim fails to teach or suggest ‘wherein the plurality of first processing elements is configured to perform an operation between the input first non-zero element and the input second non-zero element, based on depth information of the first non-zero element and depth information of the second non-zero element.’ (Emphasis added to indicate what the Examiner ignored in the claim.)”. (Remarks, page 12, paragraph 3, line 3.) However, the previous office action addressed depth information (see previous office action, page 5, paragraph 4.c.). Nevertheless, the emphasized portion of the above limitation has been deleted in the amended claim. Therefore the argument is moot.
Applicant argues that “Further, it is respectfully submitted the Examiner did not consider the recitation of ‘based on depth information of the first non-zero element and depth information of the second non-zero element.’” (Remarks, page 13, paragraph 1, line 1.)  As mentioned above, the previous office action addressed depth information.  Nevertheless, the limitation has been deleted.  Therefore, the argument is moot.
Applicant argues that “Kim does not disclose the feature or depth information…as recited in amended Claim 1.” (Remarks, page 13, paragraph 5, line 1.)  The argument is moot in view of new grounds of rejection necessitated by amendment.  See detailed rejection.
Applicant argues that “Without conceding the patentability per se of the dependent claims, these claims are believed to be patentable over the cited prior art for at least the reasons given above with regard to their respective independent claims.” (Remarks, page 14, paragraph 2, line 1.)  However, the independent claims remain rejected; therefore, the dependent claims remain rejected as well.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, herein Chen), Kim et al (A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network, herein Kim), Graham (Sparse 3D convolutional neural networks, herein Graham), and Chen, X. (Escoin: Efficient Convolutional Neural Network Inference on GPUs, herein Chen-2).
Regarding claim 1,
	Chen teaches an electronic apparatus for performing deep learning (Chen, page 127, column 2, paragraph 4, line 1 “In this paper, we have implemented and fabricated a CNN accelerator, called Eyeriss, that can support high throughput CNN inference and optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM.” In other words, the CNN accelerator is the electronic apparatus, and supporting high throughput CNN inference is performing deep learning.), the electronic apparatus comprising:
	a storage configured to store target data and kernel data (Chen, page 128, column 1, paragraph 1, line 2 “The main features of Eyeriss are as follows:
A spatial architecture using an array of 168 processing elements (PEs) that creates a four-level memory hierarchy.  Data movement can exploit the low-cost levels, such as the PE scratch pads (spads) and the inter-PE communication, to minimize data accesses to the high-cost levels, including the large on-chip global buffer (GLB) and the off-chip DRAM.

A CNN dataflow called Row Stationary (RS) that reconfigures the spatial architecture to map the computation of a given CNN shape and optimize for the best energy efficiency.

A network-on-chip (NoC) architecture that uses both multicast and point-to-point single-cycle data delivery to support the RS dataflow.

Run-length compression (RLC) and PE data gating that exploit the statistics of zero data in CNNs to further improve energy efficiency.” 

In other words, the scratch pads, global buffer, and DRAM are storage configured to store target data and kernel data.); and
	a processor including a plurality of processing elements that are arranged in a matrix shape (Chen, Fig. 2, and page 127, column 2, paragraph 4, line 1 “In this paper, we have implemented and fabricated a CNN accelerator, called Eyeriss, that can support high throughput CNN inference and optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM.”

[Image: media_image1.png]

In other words, the CNN accelerator is the processor, and from Fig. 2, the 12x14 PE array is the plurality of processing elements that are arranged in a matrix shape.), wherein the processor is configured to:
	input, to each of the plurality of processing elements, a first non-zero element from among a plurality of first elements included in the target data (Chen, page 132, column 2, paragraph 4, line 1, and Figs. 4 and 8 “RLC is used in Eyeriss to exploit the zeros in fmaps and save DRAM bandwidth. Fig. 8 shows an example of RLC encoding. Consecutive zeros with a maximum run length of 31 are represented using a 5-b number as the Run.  The next value is inserted directly as a 16-b Level, and the count for run starts again.” and page 132, column 2, paragraph 5, line 2 “The accelerator reads the encoded ifmaps from DRAM, decompresses it with the RLC decoder, and writes it into the GLB.” and page 133, column 2, paragraph 3, line 1 “The global input network (GIN) is optimized for a single-cycle multicast from the GLB to a group of PEs that receive the same filter weight, ifmap value, or psum.”

[Image: media_image2.png]

[Image: media_image3.png]

In other words, ifmap being input to the rows of PEs (see Fig. 4b) is input to each of the plurality of processing elements, RLC encoding (see Fig. 8) is non-zero elements, and ifmap is target data. Examiner notes that the claimed invention does not explain how the first non-zero element is found from the target data in either the claims or the specification.  Nor does the claimed invention explain how the original places of non-zero elements are saved for replacement in either the target data or kernel data to be used in future convolutions in either the claims or the specification. (See instant application, specification, page 8, line 1 “The processor 120 may input a first non-zero element among the plurality of first elements included in the target data to each of the plurality of processing elements. For example, the processor 120 may identify a first non-zero element, i.e., an element that is not zero, from the target data stored in the storage 100, and input the identified first non-zero element into the plurality of processing elements.  That is, the processor 120 may extract only the first non-zero element from the target data stored in the storage 110 in real time.”) Typically this is accomplished using some form of compression/decompression scheme, but none is presented.  In the absence of a description or pseudocode for how this is accomplished in the claimed invention, any method that accomplishes the same is acceptable in the prior art.), and
[sequentially input, to plurality of first processing elements included in a first row from among the plurality of processing elements, a second non-zero element from among the plurality of elements included in the kernel data,]
[wherein the plurality of first processing elements is configured to perform an operation between a first non-zero element having same depth information as the input second non-zero element from among the at least one first non-zero element and the input second non-zero element in each cycle,]
[wherein the depth information relates to the information encoded in a third dimension of a three-dimensional data, and]
[wherein the second non-zero element include only non-zero value.]
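For illustration only, the run-length compression quoted from Chen above (a 5-b Run of up to 31 consecutive zeros followed by a 16-b Level) can be sketched as follows. This is a minimal sketch and not Chen's implementation: the function names `rlc_encode` and `rlc_decode` are hypothetical, bit widths and word packing are not modeled, and the handling of trailing zeros is an assumption.

```python
def rlc_encode(values, max_run=31):
    """Encode a sequence as (run, level) pairs.

    run   = number of zeros skipped before the level (at most max_run)
    level = the next value, stored directly
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1  # keep counting consecutive zeros
        else:
            # Either a non-zero value, or a zero that overflows the run
            # counter and must therefore be stored as an explicit level.
            pairs.append((run, v))
            run = 0
    if run:
        # Assumption: trailing zeros are flushed with a zero level.
        pairs.append((run - 1, 0))
    return pairs


def rlc_decode(pairs):
    """Expand (run, level) pairs back into the original sequence."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    return out


ifmap_row = [0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22]
encoded = rlc_encode(ifmap_row)
# Levels that are non-zero are the elements that carry information.
nonzero_levels = [level for _, level in encoded if level != 0]
```

In this sketch the position of each non-zero value is recoverable from the accumulated runs, which is the property a compression/decompression scheme needs in order to restore the original layout.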
Thus far, Chen does not explicitly teach sequentially input, to plurality of first processing elements included in a first row from among the plurality of processing elements, a second non-zero element from among the plurality of elements included in the kernel data.
	Kim teaches sequentially input, to plurality of first processing elements included in a first row from among the plurality of processing elements, a second non-zero element from among the plurality of elements included in the kernel data (Kim, Fig. 2, and page 1463, column 2, paragraph 3, line 10 “The overall execution flow of the proposed architecture can be summarized as follows:
Given a convolutional layer, we first divide the spatial dimension [note 4] of activations into work groups (WGs).  Each WG is assigned to a set of PEs to produce output feature maps from input activations in the WG (e.g., PEs grouped by a dotted box in Fig. 2 process a WG).

Activations and weights are broadcast from on-chip SRAM to PEs. All PEs in a WG receive the same activation (e.g., A0 for PE 0/1 of WG 0 in Fig. 3a), while PEs in different WGs but with the same index receive the same weights (e.g., K0 for PE 0 of WG 0/1 in Fig. 3a). The exact mechanism will be explained later in this section.

Based on the activations and weights assigned to each PE, PEs calculate the partial sums of convolution and accumulate them into the partial sum storage in Act SRAM (arrow annotated with ‘Psum’ in Fig. 2).

After a PE completes the convolution for an output feature map, the convolution result (‘Conv result’ in Fig. 2) is transferred to the ReLU module to produce the output of the convolutional layer, ‘Output FM’ (feature map) in Fig. 2, and a non-zero bit vector.  Then the final output is stored into Act SRAM.

Step 2 to 4 are repeated until all input activations are consumed.

The output feature maps are used as the input activations for the next layer in the CNN.”

(Note 4: Input activation has three dimensions, width (W), height (H) and channel (C) as shown in Fig. 3. WGs divide the input activation in the spatial dimension, W x H.)

And, page 1464, column 2, paragraph 2, line 1, and Fig. 2 “In order to exploit zero weights, each kernel tile has its own non-zero bit vector (i.e., a bit vector that stores the information about whether each weight is non-zero or not).  The non-zero bit vector of kernel weights is pre-computed at design time and is stored in Weight SRAM together with the associated kernel weights.  It is also broadcast to WGs when the kernel weights are broadcast.” and, page 1465, column 1, paragraph 2, line 1 “Fig. 4 shows the microarchitecture of a zero-aware PE. Each PE is composed of a fetch controller, data path, and three local buffers (Act buffer, Weight buffer and Psum buffer). The fetch controller receives activations/weights and their associated non-zero bit vectors from Act/Weight SRAM. As illustrated in Fig. 4, it performs a logical AND operation of the two non-zero bit vectors to compute the indices of Act and Weight buffers where both activation and weight are non-zero.  Then it determines which entries to read from the Act and Weight buffers at each cycle (curraddr and nextaddr in Fig. 4).  This allows us to skip the multiplications whose results will be zero (i.e., due to either zero activation or zero weight).”

[Image: media_image4.png]

[Image: media_image5.png]

In other words, fetch controller receives activations/weights and their associated non-zero bit vectors from Act/Weight SRAM is sequentially input to each of a plurality of first processing elements, and nextaddr of the non-zero bit vector of kernel weights is a second non-zero element from among the plurality of elements included in the kernel data.),
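For illustration only, the zero-skipping step quoted from Kim above (a logical AND of the activation and weight non-zero bit vectors, followed by reading only the entries where both are non-zero) can be sketched as follows. This is a minimal sketch and not Kim's hardware: the function names are hypothetical, and the buffers are modeled as plain Python lists.

```python
def nonzero_bitvector(values):
    """One bit per entry: 1 if the entry is non-zero, else 0."""
    return [1 if v != 0 else 0 for v in values]


def zero_aware_mac(act_buffer, weight_buffer):
    """Multiply-accumulate only where activation AND weight are non-zero.

    Returns the partial sum and the buffer indices that were actually read,
    so the number of skipped multiplications is visible to the caller.
    """
    act_bits = nonzero_bitvector(act_buffer)
    weight_bits = nonzero_bitvector(weight_buffer)
    # Logical AND of the two bit vectors marks the useful positions.
    both = [a & w for a, w in zip(act_bits, weight_bits)]
    read_indices = [i for i, bit in enumerate(both) if bit]

    psum = 0
    for i in read_indices:  # one useful multiplication per step
        psum += act_buffer[i] * weight_buffer[i]
    return psum, read_indices


acts = [0, 3, 0, 5, 2]
weights = [4, 0, 0, 2, 1]
psum, used = zero_aware_mac(acts, weights)
```

With these example values only two of the five positions are read, and the partial sum equals the result of the full dot product.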
	Kim teaches wherein the plurality of first processing elements is configured to perform an operation between a first non-zero element having same depth information as the input second non-zero element from among the at least one first non-zero element and the input second non-zero element in each cycle (Kim, page 1463, column 2, paragraph 3, bullet 1 “Given a convolutional layer, we first divide the spatial dimension [note 4] of activations into work groups (WGs).  Each WG is assigned to a set of PEs to produce output feature maps from input activations in the WG (e.g., PEs grouped by a dotted box in Fig. 2 process a WG).” And, note 4 “Input activation has three dimensions, width (W), height (H) and channel (C) as shown in Fig. 3. WGs divide the input activation in the spatial dimension, W x H.”   And, bullet 3 “Based on the activations and weights assigned to each PE, PEs calculate the partial sums of convolution and accumulate them into the partial sum storage in Act SRAM (arrow annotated with ‘Psum’ in Fig. 2).”

[Image: media_image6.png]

In other words, set of PEs is plurality of processing elements configured to perform an operation, input activation is the first non-zero element, channel dimension “C” is depth dimension, WGs divide the input activation in the spatial dimension (W x H) is having the same depth (in other words, each slice W x H is a channel i.e. “depth” dimension), weights is the second non-zero element, activations assigned to each PE is at least one first non-zero element, PEs calculate the partial sums of convolution is operation between the input first non-zero element and the input second non-zero element, and each set of operations between the first element and second element is a cycle.)
	Kim and Chen are both directed to improving inference time of convolutional neural networks by eliminating zero multiplications during convolutions. Chen teaches an electronic apparatus for convolutions that speeds up execution by avoiding multiplications whose results will be zero, but does not explicitly show using depth information in the kernel data. Kim teaches a zero weight/activation-aware architecture for convolutional neural networks that skips zero multiplication for three-dimensional kernel data.  In view of the teaching of Chen, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kim into Chen.  This would result in being able to skip multiplications of convolutional operations whose results will be zero in three-dimensional kernel data.
	One of ordinary skill in the art would be motivated to do this to speed up throughput in CNN inference by exploiting zero values. (Kim, page 1, column 1, paragraph 1, line 1 “It is imperative to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas from server, mobile to IoT devices.  Based on the fact that CNNs can be characterized by a significant amount of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs exploiting zero weights and activation.”) 
	Thus far, the combination of Chen and Kim does not explicitly teach wherein the depth information relates to the information encoded in a third dimension of a three-dimensional data.
	Graham teaches wherein the depth information relates to the information encoded in a third dimension of a three-dimensional data (Graham, Figure 2, and page 1, paragraph 1, line 1 “We have implemented a convolutional neural network designed for processing sparse three-dimensional input data.” And, page 9, paragraph 5, line 1 “We have shown that sparse 3D CNNs can be implemented efficiently, and produce interesting results for a variety of types of 3D data.”

[Image: media_image7.png]

In other words, three-dimensional input data is information encoded in a third dimension of a three-dimensional data.), and 
	Both Graham and the combination of Chen and Kim are directed to sparse matrix multiplication, among other things.  The combination of Chen and Kim teaches an electronic apparatus for convolutions that speeds up execution by avoiding multiplications whose results will be zero using depth information in the kernel data, but does not explicitly show the depth information relates to information encoded in a third dimension of three-dimensional data.  Graham shows convolutional neural networks where the data is three-dimensional. In view of the teaching of the combination of Chen and Kim, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Graham into the combination of Chen and Kim.  This would result in an electronic apparatus for convolutions that speeds up execution by avoiding multiplications whose results will be zero using depth information in the kernel data, where the depth information relates to information encoded in a third dimension of three-dimensional data.
	One of ordinary skill in the art would be motivated to do this because there are significant real-world problems that have three-dimensional structure and where sparse data is present. (Graham, page 2, paragraph 2, line 1 “The example of the string is just a thought experiment.  However, there are many real-world problems in domains such as robotics and biochemistry, where understanding 3D structure is important and where sparsity is applicable.”)
	Thus far, the combination of Chen, Kim and Graham does not explicitly teach wherein the second non-zero element include only non-zero value.
Chen-2 teaches wherein the second non-zero element include only non-zero value (Chen-2, page 3, column 2, paragraph 2, line 1 “After pruning, the remaining weights are stored in a sparse matrix format.  Compressed sparse row (CSR) format, as shown in Fig. 4, is often used to store the sparse weight matrix in a compressed form.  The CSR data structure consists of three arrays.  The data array value stores only the non-zero elements row by row. To find out the original location of each non-zero element, two auxiliary data structures are added.  The column-indices array colidx contains nnz integers (nnz is the total number of non-zero elements), and entry colidx[i] indicates the column id of the ith element in value.  The row-pointers array rowptr contains M+1 (M is the number of rows of the matrix) integers, and entry rowptr[i] is the starting index in colidx of the ith row.  This implies that rowptr[i+1] – rowptr[i] is the number of non-zero elements in the ith row.”

[Image: media_image8.png]

In other words, the data array value stores only the non-zero elements row by row is the second non-zero element includes only non-zero value.)
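For illustration only, the three CSR arrays described in the Chen-2 passage quoted above (value, colidx, and rowptr) can be sketched as follows. This is a minimal sketch: the array names follow the quotation, while the function names `to_csr` and `csr_matvec` and the example matrix are hypothetical.

```python
def to_csr(dense):
    """Build the three CSR arrays from a dense matrix (list of rows)."""
    value, colidx, rowptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:
                value.append(w)    # only non-zero elements are stored
                colidx.append(j)   # column id of that element
        rowptr.append(len(value))  # start of the next row in value/colidx
    return value, colidx, rowptr


def csr_matvec(value, colidx, rowptr, x):
    """Multiply a CSR matrix by a dense vector, touching non-zeros only."""
    y = []
    for i in range(len(rowptr) - 1):
        acc = 0
        for p in range(rowptr[i], rowptr[i + 1]):
            acc += value[p] * x[colidx[p]]
        y.append(acc)
    return y


weights = [
    [0, 2, 0],
    [0, 0, 0],
    [1, 0, 3],
]
value, colidx, rowptr = to_csr(weights)
result = csr_matvec(value, colidx, rowptr, [1, 2, 3])
```

Here rowptr[i+1] - rowptr[i] gives the number of non-zero elements in row i, so the empty middle row contributes no multiplications at all.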
	Both Chen-2 and the combination of Chen, Kim, and Graham are directed to fast multiplication of sparse matrices.  In view of the teaching of Chen, Kim, and Graham, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen-2 into the combination of Chen, Kim, and Graham. This would result in storing non-zero elements in compressed sparse row format.
One of ordinary skill would be motivated to do this to improve efficiency in sparse matrix computation. (Chen-2, page 1, column 1, paragraph 2, line 1 “Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth.  Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs.”)
Regarding claim 2,
	The combination of Chen, Kim, Graham, and Chen-2 teaches the electronic apparatus of claim 1, wherein
	each of the plurality of processing elements comprises a plurality of register files (Chen, page 134, column 2, paragraph 3, line 1 “Fig. 12 shows the architecture of a PE.” And, column 2, paragraph 3, line 14 “The filter spad is implemented in a 224-b X 16-b SRAM due to its large size; the ifmap and psum spads of size 12-b X 16-b and 24-b X 16-b, respectively, are implemented using registers.” 

[Image: media_image9.png]

In other words, the ifmap spad and psum spad in the PE are a plurality of register files. See Fig. 12.), and
	wherein the processor is further configured to: identify a corresponding processing element from among the plurality of processing elements based on row information and column information of the at least one first non-zero element (Chen, page 133, column 2, paragraph 3, line 19, and Fig. 11 “Each data read from the GLB is augmented with a (row, col) tag by the top-level controller, and the GIN guarantees that the data are delivered to all and only the X-buses and then PEs with the ID that matches the tag within a single cycle.”

[Image: media_image10.png]

In other words, the PE is the processing element, the data is the at least one non-zero element, and the GIN guaranteeing that the data are delivered to all and only the X-buses and then PEs with the ID that matches the tag is identifying a corresponding processing element from the plurality of processing elements based on row information and column information.  See Fig. 11.), and
	input the at least one first non-zero element to a corresponding register file from among the plurality of register files included in the identified processing elements, based on the depth information of the first non-zero element (Chen, page 128, table 1, and Algorithm 1, 

[Image: media_image11.png]

[Image: media_image12.png]

“where, O, I, W, and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively.”
Algorithm 1 defines “the computation of a layer” where the target data and kernel data are accessed by iterating through the matrices.  In other words, channel (C) is depth information, the target input matrix (ifmap) is I. And, from Fig. 12, ifmap spad is the input register file. This is inputting at least one first non-zero element to a corresponding register file based on depth information (channel) of the first non-zero element. Examiner notes the instant application chooses to identify a two-dimensional array as a three-dimensional array (see instant application, specification, page 7, line 3 “The processor 120 may identify data stored in a plurality of two-dimensional cells as three-dimensional target data and kernel data.”) For this to be accomplished, some form of transformation or indexing is required. Without a detailed description of how this is implemented, such as pseudocode, any method in the prior art that accomplishes the same objective is acceptable.).
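For illustration only, the tag-based delivery quoted from Chen in the mapping of claim 2 (each data item carries a (row, col) tag and reaches only the X-buses and PEs whose ID matches the tag) can be sketched as follows. This is a minimal sketch and not Chen's circuit: the function name `multicast` and the ID assignments in the example are hypothetical.

```python
def multicast(tag, row_ids, col_ids):
    """Return the (bus, pe) positions that accept data carrying `tag`.

    row_ids[r]    = ID configured on X-bus r
    col_ids[r][c] = ID configured on PE c attached to X-bus r
    A PE receives the data only if its bus matches the row tag and the PE
    itself matches the column tag.
    """
    row_tag, col_tag = tag
    receivers = []
    for r, row_id in enumerate(row_ids):
        if row_id != row_tag:
            continue  # this X-bus ignores the data
        for c, col_id in enumerate(col_ids[r]):
            if col_id == col_tag:
                receivers.append((r, c))
    return receivers


# Hypothetical ID configuration for a 3 x 3 corner of the PE array.
row_ids = [0, 1, 0]
col_ids = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
]
targets = multicast((0, 1), row_ids, col_ids)
```

Because several buses and PEs may share an ID, one tagged read can reach a group of PEs at once, while PEs with non-matching IDs receive nothing.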

Regarding claim 3,
	The combination of Chen, Kim, Graham, and Chen-2 teaches the electronic apparatus of claim 2, wherein
the processor is further configured to sequentially input the second non-zero element to each of the plurality of first processing elements based on row information, column information, and the depth information of the second non-zero element. (Chen, page 128, column 1, paragraph 3, line 1 “II. CNN Basics, The CNN algorithm is constructed by stacking multiple computation layers for feature extraction and classification [30].  Modern CNNs achieve their superior accuracy by building a very deep hierarchy of layers [2]-[5], [7], which transform the input image data into highly abstract representations called feature maps (fmaps). The primary computation in the CNN layers is performing the high-dimensional convolutions.  A layer applies filters on the input fmaps (ifmaps) to extract embedded characteristics and generate the output fmaps (ofmaps) by accumulating the partial sums (psums).  The dimensions of both filters and fmaps are 4-D. Each filter or fmap is a 3-D structure consisting of multiple 2-D planes, i.e., channels [note 1], and a batch of 3-D ifmaps is processed by a group of 3-D filters in a layer. In addition, there is a 1-D bias that is added to the filtering results.  Given the shape of parameters in Table I, the computation of a layer is defined as

[Image: media_image13.png]

where O, I, W, and B are the matrices of the ofmaps, ifmaps, filters, and biases, respectively.  U is a given stride size.  Fig. 1 shows a visualization of this computation (ignoring biases).  After the convolutions, activation functions, such as the rectified linear unit (ReLU) [31], are applied to introduce nonlinearity.”

[Image: media_image14.png]

In other words, ifmap is input, and the input matrix algorithm (1) is sequential input based on row information, column information, and depth information.)
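For illustration only, the layer computation described in the quoted passage (ofmaps O produced from ifmaps I, filters W, biases B, and stride U by accumulating partial sums over channels and filter positions) can be sketched as nested loops. This is a minimal sketch under stated assumptions: the equation itself appears only as an image in this action, so the sketch assumes the standard convolution form with square ifmaps and square filters, and the function name `conv_layer` is hypothetical.

```python
def conv_layer(I, W, B, U=1):
    """Direct convolution over 4-D inputs and filters.

    I[z][k][h][w]: batch z, channel (depth) k, row h, column w
    W[u][k][i][j]: filter u, channel (depth) k, filter row i, filter column j
    B[u]:          bias of filter u
    U:             stride
    """
    N, C, H = len(I), len(I[0]), len(I[0][0])
    M, R = len(W), len(W[0][0])
    E = (H - R + U) // U  # ofmap height and width (square assumed)

    O = [[[[B[u] for _ in range(E)] for _ in range(E)]
          for u in range(M)] for _ in range(N)]
    for z in range(N):
        for u in range(M):
            for x in range(E):
                for y in range(E):
                    for k in range(C):          # depth (channel) index
                        for i in range(R):      # filter row index
                            for j in range(R):  # filter column index
                                O[z][u][x][y] += (
                                    I[z][k][U * x + i][U * y + j] * W[u][k][i][j]
                                )
    return O


ones_ifmap = [[[[1] * 3 for _ in range(3)] for _ in range(2)]]   # 1 x 2 x 3 x 3
ones_filter = [[[[1] * 2 for _ in range(2)] for _ in range(2)]]  # 1 x 2 x 2 x 2
ofmap = conv_layer(ones_ifmap, ones_filter, B=[1])
```

In this sketch every filter weight is addressed by its row index i, column index j, and depth index k, and an activation is only ever multiplied by a weight with the same depth index k.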
Regarding claim 4,
	The combination of Chen, Kim, Graham, and Chen-2 teaches the electronic apparatus of claim 3, wherein
the processor is further configured to: sequentially input a second non-zero element included in one row and one column, from among the second non-zero element, to each of the plurality of first processing elements based on depth, and (Chen, page 128, column 1, paragraph 3, line 1. See mapping of claim 3 above.)
	when all of the second non-zero elements included in the one row and the one column are input to each of the plurality of first processing elements, input the second non-zero element included in a row and a column that is different from the one row and the one column to each of the plurality of first processing elements. (Chen, page 128, column 1, paragraph 3, line 1. See mapping of claim 3 above.)
Claims 11-14 are method claims corresponding to apparatus claims 1-4, respectively.  Claim 11 recites “a method of controlling an electronic apparatus to perform deep learning” whereas claim 1 recites “an electronic apparatus for performing deep learning.” Otherwise they are the same. Similarly, claims 12-14 are directed to a method whereas claims 2-4 are directed to an apparatus. Otherwise they are the same. The same prior art applies to both method and apparatus. Therefore, claims 11-14 are rejected for the same reasons as claims 1-4, respectively.
Claims 5-10 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Kim, Graham, Chen-2, and Yavits et al (Sparse Matrix Multiplication on CAM Based Accelerator, herein Yavits).
Regarding claim 5,
	The combination of Chen, Kim, Graham, and Chen-2 teaches the electronic apparatus of claim 4, wherein the processor is further configured to:
	Thus far, the combination of Chen, Kim, Graham, and Chen-2 does not explicitly teach when there is no second non-zero element in the one row and the one column, input zero to each of the plurality of first processing elements, and when the zero is input to each of the plurality of first processing elements, input the second non-zero element included in a different row and column, based on a number of the second non-zero elements included in the different row and column, to each of the plurality of first processing elements.
	Yavits teaches when there is no second non-zero element in the one row and the one column, input zero to each of the plurality of first processing elements, and when the zero is input to each of the plurality of first processing elements, input the second non-zero element included in a different row and column, based on a number of the second non-zero elements included in the different row and column, to each of the plurality of first processing elements. (Yavits, page 2, Fig. 2, Step 3 “Read Bik,…,Bik; a. If no match is found in acceleration module m, set Bim = 0”

[Image: media_image15.png]

In other words, iterate through the kernel matrix, if no non-zero element is found, set the value to zero, and go to the next row.) 
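For illustration only, the step quoted from Yavits above (read the matching element, and if no match is found set the operand to 0) can be sketched as follows. This is a minimal sketch and not Yavits's hardware: a Python dictionary stands in for the index match as an assumption, and the function name `sparse_row_times_vector` is hypothetical.

```python
def sparse_row_times_vector(a_row, b_nonzeros):
    """Dot product of one sparse row of A with a sparse vector B.

    a_row:      list of (column_index, value) pairs, non-zeros of A only
    b_nonzeros: dict mapping index -> value, non-zeros of B only
                (assumption: stands in for the index lookup)
    """
    acc = 0
    for k, a in a_row:
        # No match for index k means B has a zero there: use 0 as operand.
        b = b_nonzeros.get(k, 0)
        acc += a * b
    return acc


a_row = [(0, 2), (3, 4), (5, 1)]
b_nonzeros = {3: 5, 5: 7}
dot = sparse_row_times_vector(a_row, b_nonzeros)
```

Only the non-zero elements of the row are iterated, and an index with no match contributes a zero operand instead of stalling the loop.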
	Both Yavits and the combination of Chen, Kim, Graham, and Chen-2 are directed to processing sparse matrices by avoiding the cost of zero multiplication, among other things. The combination of Chen, Kim, Graham, and Chen-2 teaches an electronic apparatus for convolutions that speeds up execution by avoiding multiplications whose results will be zero using depth information in the kernel data, where the data is three-dimensional, but does not teach, when there is no second non-zero element in the one row and the one column, inputting zero to each of the plurality of first processing elements.   Yavits teaches a sparse matrix multiplication accelerator that, when there is no second non-zero element in the one row and the one column, inputs zero to each of the plurality of first processing elements. In view of the teaching of the combination of Chen, Kim, Graham, and Chen-2, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Yavits into the combination of Chen, Kim, Graham, and Chen-2. This would result in an electronic apparatus for convolutions that speeds up execution by avoiding multiplications whose results will be zero using depth information in the kernel data, where the data is three-dimensional, and that, when there is no second non-zero element in the one row and the one column, inputs zero to each of the plurality of first processing elements.
	One of ordinary skill in the art would be motivated to do so in order to improve the efficiency of sparse matrix multiplication. (Yavits, page 1, column 1, paragraph 1, line 1 “Sparse matrix multiplication is a frequent bottleneck in large scale linear algebra applications, especially in data mining and machine learning [16].  The efficiency of sparse matrix multiplication becomes even more relevant with the emergence of big data, giving rise to very large vector and matrix sizes.”)
Regarding claim 6,
	The combination of Chen, Kim, Graham, Chen-2, and Yavits teaches the electronic apparatus of claim 3, wherein the processor is further configured to,
	when a depth that has no first non-zero element in all the rows and columns from among the at least one first non-zero elements stored in each of the plurality of processing elements is identified, omit input of the second non-zero element corresponding to the depth from among the second element (see Yavits, page 2, Fig. 2, see mapping of claim 5. And, from Step 3, Bik,…,Bik is at least one of the first non-zero elements.), and
	sequentially input the second non-zero element not corresponding to the depth to each of the plurality of first processing elements. (Yavits, page 2, Fig. 2. In other words, the algorithm, from Fig. 2, iterates through the non-zero elements of the input matrix.  This is sequentially inputting the second non-zero element.)
Regarding claim 7,
	The combination of Chen, Kim, Graham, Chen-2, and Yavits teaches the electronic apparatus of claim 3, wherein
	the processor further includes a plurality of preliminary processing elements (Yavits, Fig. 1, Fig. 2, and page 2, paragraph 1, line 1 “The CAM consists of an INDEX register that holds the comparison data pattern (the column index) and an array of CAM rows (Fig. 1). On a match of the index and one of the rows, the corresponding match line is set. The match lines are fed into the juxtaposed RAM array as word lines. Thus, a match in the CAM selects one word in the RAM. The CAM-RAM pair supports two operations: (1) initialization and (2) search-and-read. During initialization, the nonzero elements of the sparse (right) vector (designated B) are written (from memory) into the RAM, and their corresponding indices are written into the CAM, so that the nonzero element and its index are stored in corresponding rows. During search-and-read, the column index of a nonzero element of the (left) sparse matrix (designated A) is placed in the INDEX register and compare is performed. The matching row selects the corresponding word in the RAM and the selected nonzero B element is read from the RAM into the multiplier.” In other words, the CAM-RAM pairs are used as processing elements and also fill the role of preliminary processing elements.), and
	wherein the processor is further configured to: when a depth of which the non-zero element is within a predetermined number in all the rows and columns corresponding to the depth, is identified, from among the at least one first non-zero elements stored in each of the plurality of processing elements, omit input of the second non-zero element corresponding to the depth and sequentially input the second non-zero elements not corresponding to the depth to each of the plurality of first processing elements, and input the at least one first non-zero element corresponding to the depth and the second non-zero element corresponding to the depth to a plurality of preliminary processing elements to perform operation (Yavits, Fig. 1 and Fig. 2. Examiner notes that, according to the specification of the instant application, at least one intent of the claimed invention is to avoid unnecessary multiplications by zero in a convolution operation (Instant application, specification, page 3, line 10 “Accordingly, an aspect of the present disclosure is to provide an electronic apparatus that omits an unnecessary operation in a convolution operation process to improve an operation speed and a control method thereof.  Another aspect of the present disclosure is to provide an electronic apparatus that may improve speed of a convolution operation by omitting an operation of part of target data and part of kernel data according to zero included in the target data and a control method thereof.”). In addition, the instant application chooses to identify a two-dimensional array as a three-dimensional array (Specification, page 7, line 3 “The processor 120 may identify data stored in a plurality of two-dimensional cells as three-dimensional target data and kernel data.”). However, the present claim and the preceding claims fail to describe any software or hardware mechanism that actually specifies how to deal with zero elements. Therefore, in the absence of a description of how this is accomplished, any method in the prior art that accomplishes the same result is acceptable. The Yavits architecture in Fig. 1 and algorithm in Fig. 2 show a method that accomplishes the goal of avoiding multiplication by zero.).
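For illustration only, the CAM-RAM initialization and search-and-read operations quoted from Yavits in the mapping of claim 7 can be sketched as follows. This is a minimal software model under stated assumptions, not the Yavits hardware: the class name `CamRam`, the helper `sparse_row_times_vector`, and the choice to model a CAM miss as a skipped multiplication (rather than as a zero fed to the multiplier) are hypothetical.

```python
class CamRam:
    """Toy model of one CAM-RAM pair holding the non-zero elements of B."""

    def __init__(self):
        self.cam = []  # indices of the non-zero elements of B
        self.ram = []  # the non-zero values, stored in the corresponding rows

    def initialize(self, sparse_b):
        # Initialization: write only the non-zero elements of B into the RAM
        # and their indices into the CAM, row for row.
        self.cam, self.ram = [], []
        for idx, val in enumerate(sparse_b):
            if val != 0:
                self.cam.append(idx)
                self.ram.append(val)

    def search_and_read(self, index):
        # Search-and-read: compare the index against every CAM row; a match
        # selects the corresponding RAM word.
        for row, stored_index in enumerate(self.cam):
            if stored_index == index:
                return self.ram[row]
        return None  # no match: B is zero at this index (modeling assumption)


def sparse_row_times_vector(row_nonzeros, cam_ram):
    """Multiply one sparse row of A, given as (column, value) pairs, by B."""
    total, multiplications = 0, 0
    for col, a_val in row_nonzeros:
        b_val = cam_ram.search_and_read(col)
        if b_val is not None:  # multiply only when both operands are non-zero
            total += a_val * b_val
            multiplications += 1
    return total, multiplications


# Hypothetical data: B = [0, 4, 0, 5]; the row of A is non-zero at columns 0, 1, 3.
cam_ram = CamRam()
cam_ram.initialize([0, 4, 0, 5])
result, mults = sparse_row_times_vector([(0, 3), (1, 2), (3, 1)], cam_ram)
```

Here the column-0 element of A finds no match in the CAM, so only two multiplications are performed (2 × 4 and 1 × 5), which is the sense in which the lookup avoids multiplication by zero.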
Regarding claim 8,
	The combination of Chen, Kim, Graham, Chen-2, and Yavits teaches the electronic apparatus of claim 3, wherein the processor is further configured to:
	when the operation between non-zero elements in the plurality of first processing elements is completed, control the plurality of processing elements to shift the second non-zero elements that are input to the plurality of first processing elements to each of a plurality of second processing elements included in a second row, and when the operation between non-zero elements is completed in the plurality of second processing elements, control the plurality of processing elements to shift the second non-zero elements that are shifted to the plurality of second processing elements to each of a plurality of third processing elements included in a third row from among the plurality of processing elements (Yavits, Fig. 1 and Fig. 2, See mapping of claim 7 above.).
Regarding claim 9,
	The combination of Chen, Kim, Graham, Chen-2, and Yavits teaches the electronic apparatus of claim 8, wherein the processor is further configured to,
	when the second non-zero element that is input to each of the plurality of processing elements belongs to a same row and a same column as a second non-zero element that is used immediately before, accumulate an operation result of the input second non-zero element with a previous operation result and store the accumulated operation results in one of the plurality of register files (Yavits, Fig. 1 and Fig. 2, See mapping of claim 7 above.).
Regarding claim 10, 
	The combination of Chen, Kim, Graham, Chen-2, and Yavits teaches the electronic apparatus of claim 8, wherein the processor is further configured to,
	when the second non-zero element that is input to each of the plurality of processing elements does not belong to a same row and a same column as a second non-zero element used for an operation immediately before, shift an operation result stored in one of the plurality of register files of each of the plurality of processing elements to an adjacent processing element, and accumulate an operation result by the input second non-zero element to the shifted operation result and store the accumulated operation results in one of the plurality of register files (Yavits, Fig. 1 and Fig. 2, See mapping of claim 7 above.).
Claims 15-20 are method claims corresponding to apparatus claims 5-10. Otherwise, they are the same. Therefore, claims 15-20 are rejected for the same reasons as claims 5-10, respectively.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124