DETAILED ACTION

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/24/2022 has been entered.
 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for the benefit of prior-filed German Patent Application No. DE 102018203709.4, filed on March 12, 2018, is acknowledged.

Drawings
The drawings were received on 03/11/2019. These drawings are acceptable. The replacement drawings received on 04/17/2019 are acceptable for entry.

Specification
The substitute specification filed 04/17/2019 has been entered.

Response to Arguments
Applicant’s remarks and amendments filed 03/24/2021 have been fully considered.
Applicant’s arguments regarding the rejection of claims under 35 U.S.C. 103 are directed to subject matter in the amended claims not previously examined by the examiner. Therefore, Applicant’s arguments are rendered moot. The examiner refers to the rejection under 35 U.S.C. 103 in the current Office action for more details.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 and 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Brothers et al. (US 20170011288, hereinafter ‘Bro’).

Regarding independent claim 1 Park teaches a method for operating a calculation system including a neural network, the neural network including multiple neuron layers, the calculation system including a processing unit for sequential calculation of the neural network and an external memory external thereto that buffers intermediate results of the calculations in the processing unit, the method comprising:  (operations for training neural networks and determining calculations, as claimed sequential computations on multiple sequential layers as depicted in Fig. 1.; And the computing system for computing computations in a distributed computing environment with external memory elements and processing units as depicted in Fig. 5 and in 0004-0005: As will be described in greater detail below, the instant disclosure details various systems and methods for reducing bandwidth consumption for memory accesses per­formed by an AI accelerator by compressing data written to memory and decompressing data read from memory after the data is received at the AI accelerator. For example, a computing system may include a memory device that stores compressed parameters for a layer of a neural network and a special-purpose hardware processing unit programmed to, for the layer of the neural network: (1) receive the com­pressed parameters from the memory device, (2) decom­press the compressed parameters, and (3) apply the decom­pressed parameters in an arithmetic operation of the layer of the neural network…Additionally or alternatively, the memory device may include a dynamic memory device that is remote relative to the special-purpose hardware processing unit;
Examiner notes that neural networks are known to include a plurality of neuron layers; Park Fig. 2 depicts an example of a neural network including a plurality of neuron layers, in 0034: FIG. 2 is a block diagram of an exemplary feed-forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220. While FIG. 2 provides an example with five activation layers, neural network 200 may include any other suitable number of activation layers (e.g., one activation layer, dozens of activation layers, thousands of activation layers, etc.).)
 incrementally calculating, using data sections of an input feature map, data sections of an output feature map which each represent a group of intermediate results, using a first one of the neuron layers of the neural network, the input feature map being used as input to the first one of the neuron layers; (claimed incremental sections as the depicted feature maps representing claimed intermediate results of the respective sequential convolution layers, as depicted in Fig. 3, in 0036-0037: … FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316…;
 Examiner further notes the calculations in convolutional neural networks (CNN)include using sections of claimed feature maps and output feature map as depicted in Fig. 3 for each layer incrementally, in 0036-0037: While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) [incrementally calculating, using data sections of an input feature map, data sections of an output feature map which each represent a group of intermediate result] to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306 [using a first one of the neuron layers of the neural network, the input feature map being used as input to the first one of the neuron layers]  to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320.)
lossy compression of at least one of the data sections of the output feature map to obtain compressed intermediate results; transmitting the compressed intermediate results to the external memory, wherein the external memory is external to the processing unit; (compressing data to write to external memory, including the results of the neural network layers, to accelerate computation and eliminate memory bottlenecks, in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) [claimed wherein the external memory is external to the processing unit] an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques; and in Fig. 3)
Park further teaches the accelerator processing the claimed data sections of the output feature map to obtain compressed intermediate results as depicted in Fig. 3 and Fig. 1, in 0036-0037: While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306  to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320; And in 0025: Turning to the figures, the following will provide, with reference to FIG. 1, detailed descriptions of an exem­plary network environment in which an accelerator with compression and decompression features may be utilized… The discussion of FIG. 5 presents an exemplary accelerator according to aspects of the present disclosure. The discussion of FIGS. 6A-6B covers processes for compression and decompression with accelerators. The following also provides, with reference to FIG. 7, an example of a computing system with a central processing unit (CPU) capable of implementing some of the steps or processes discussed herein.)
retrieving the compressed intermediate results from the external memory for calculations using a second one of the neuron layers of the neural network, the second one of the neuron layers being a different neuron layer than the first one of the neuron layers;  (claimed sequential process for retrieving compressed intermediate results via reading operations and claimed decompressing of the retrieved results and claimed further computation as step 610 depicted in Fig. 6A:

    [Image media_image1.png: Park, Fig. 6A]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…; 
Park further teaches the processing of data on unique layers of the neural network; Fig. 2 depicts an example of a neural network including a plurality of neuron layers, in 0034: FIG. 2 is a block diagram of an exemplary feed-forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220. While FIG. 2 provides an example with five activation layers, neural network 200 may include any other suitable number of activation layers (e.g., one activation layer, dozens of activation layers, thousands of activation layers, etc.).)
decompressing the retrieved compressed intermediate results to provide decompressed intermediate results; and performing the calculations on the decompressed intermediate results using the second one of the neuron layers of the neural network, the decompressed intermediate results being used as input to the second one of the neuron layers of the neural network. (claimed sequential process for retrieving compressed intermediate results via reading operations, the claimed decompressing of the retrieved results, and the claimed further computation, as step 610 depicted in Fig. 6A:

    [Image media_image1.png: Park, Fig. 6A]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decom­pression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation ( e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.) [claimed the decompressed intermediate results being used as input to the second one of the neuron layers of the neural network])

incrementally calculating, using data sections of the decompressed intermediate results, data sections of a second output feature map which each represent a group of second intermediate results, using the second one of the neuron layers of the neural network, the decompressed intermediate results being used as input to the second one of the neuron layers of the neural network; convolutional neural networks (CNN)include using sections of claimed feature maps and output feature map as depicted in Fig. 3 for each layer incrementally, in 0036-0037: While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) [incrementally calculating, using data sections of an input feature map, data sections of an output feature map which each represent a group of intermediate result] to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306  to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320; Examiner notes that the feature maps are group to compute intermediate results in sequential neural network layers as depicted in Fig. 3:

    [Image media_image2.png: Park, Fig. 3]

)

lossy compression of at least one of the data sections of the second output feature map, to obtain compressed second intermediate results; (claimed second output feature maps in the sequential layers of the neural network as depicted in Fig. 3, where the processing is done using a compression algorithm as the claimed lossy compression of data sections, in 0006-0008: … Additionally or alternatively, the compression subsystem may be configured to compress the model data by (1) distinguishing between sparse and non-sparse data in the parameters and (2) applying a compression algorithm to the parameters based on the distinguished sparse and the non-sparse data in the parameters. Furthermore, in these and other embodiments, the compression subsystem may be configured to compress the model data by implementing a lossy compression algorithm…)
Park expressly teaches processing data using a convolutional neural network of sequential neuron layers, with compression and decompression of model data including feature maps, where the feature maps are used to process data sequentially, as noted above, using compression algorithms (in 0006: According to various examples, the computing system may include a compression subsystem that is communicatively coupled to the memory device and configured to compress the model data and store the compressed data in the memory device… Furthermore, in these and other embodiments, the compression subsystem may be configured to compress the model data by implementing a lossy compression algorithm)
Park also teaches data flow as a sequential process through layers to produce output, in 0035-0050: In the example shown in FIG. 2, data flows from input layer 202 through activation layers 212-220 to output layer 204 (i.e., from left to right). As shown, each value from the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum to activation layer 214. This process may be repeated at each activation layer in sequence to create outputs at output layer 204… As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2… In some embodiments, data may be compressed before being written to memory, and the compressed data may be read from memory and decompressed on an accelerator before being used in neural network computations. Compression may be applied to an entire set of parameters for a neural network layer or may be selectively applied, for example, to a subset of parameters of a neural network or layer. Certain data, such as filter weights, may be compressed and cached locally for all or a portion of the processing involved in a particular neural network layer (or set of neural network layers)…)
Park does not expressly disclose the sequential processing of the intermediate results through the layers of a neural network, as recited in the limitations:
transmitting the compressed second intermediate results to the external memory; retrieving the compressed second intermediate results from the external memory for additional calculations using a third one of the neuron layers of the neural network, the third one of the neuron layers being a different neuron layer than the first one of the neuron layers and the second one of the neuron layers;
decompressing the retrieved compressed second intermediate results to provide decompressed second intermediate results; performing the additional calculations on the decompressed second intermediate results using the third one of the neuron layers of the neural network, the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network.
Bro expressly teaches the sequential processing of the intermediate results through the layers of a neural network, as recited in the limitations:
transmitting the compressed second intermediate results to the external memory; retrieving the compressed second intermediate results from the external memory for additional calculations using a third one of the neuron layers of the neural network, the third one of the neuron layers being a different neuron layer than the first one of the neuron layers and the second one of the neuron layers; (in 0037: As pictured, input data paths 104 include a data staging unit 120 and a weight decompressor 122. Data staging units 120 are configured to receive data, e.g., input data such as feature maps, values, and/or intermediate results, from memory units 110. Data staging units 120 are also configured to provide data obtained from memory units 110 to AAC arrays 106. Weight decompressor 122 is configured to decompress weights [claimed decompressing the retrieved compressed second intermediate results to provide decompressed second intermediate results; performing the additional calculations on the decompressed second intermediate results using the third one of the neuron layers of the neural network, the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network] read from memory units 110 that are stored in compressed form [claimed transmitting the compressed second intermediate results to the external memory; retrieving the compressed second intermediate results from the external memory for additional calculations using a third one of the neuron layers of the neural network, the third one of the neuron layers being a different neuron layer than the first one of the neuron layers and the second one of the neuron layers]...; Where the third layer is part of the plurality of layers for sequencing the received compressed data for decompression and processing in 0004: A neural network may be used to extract "features" from complex input data. The neural network may include a plurality of layers. 
Each layer may receive input data and generate output data by processing the input data to the layer. The output data may be a feature map of the input data that the neural network generates by convolving an input image or a feature map with convolution kernels. Initial layers of a neural network, e.g., convolution layers, may be operative to extract low level features such as edges and/or gradients from an input such as an image. The initial layers of a neural network are also called feature extraction layers. Subsequent layers of the neural network, referred to as feature classification layers, may extract or detect progres­sively more complex features such as eyes, a nose, or the like. Feature classification layers are also referred to as "fully-connected layers."; And in 0101-0102: To process all feature maps from a layer N of a neural network to generate all feature maps in layer N+ I of the neural network [claimed the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network], MMU 700 may employ a combination of scatter and gather matrix multiply operations. The choice of which mode to use and when is specified by a network optimization utility to minimize overall computation and storage footprint of the intermediate data to be maintained in generating the next layer. MMU 700 includes a control unit 704. Control unit 704 may include one or more registers 706 that store data representative of the current (x, y) locations of one or more input feature maps being processed (e.g., a first data set) and a weight table 708 defining one or more matrices (weights) to be applied (e.g., a second data set)... 
Control unit 704 is capable of storing and/or executing a sequence of instructions referred to as an "execution sequence."; Examiner notes where the feature maps comprise the weight values provided along the data processing sequence path, in 0059-0060: Control unit 102 may execute a convolution macro instruction. In general, the convolution macro instruction causes AAC arrays 106 to perform convolution and accu­mulate the results. In one example embodiment, control unit 102, in executing a convolution macro instruction, causes input data paths 104 to read a portion of the input data and the weights from memory units 110 and provide the input data and weights to AAC arrays 106 . As discussed, weight decompressor 122 within input data path 104 is capable of decompressing weights so that AAC arrays 106 receive decompressed weights. AAC arrays 106, responsive to control signals from control unit 102 in executing the same convolution macro instruction may convolve an x-y region ( e.g., an NxN region where N is an integer greater than 1) of an input feature map ( e.g., the input data) with a plurality of weights to generate an output feature map or a portion of an output feature map…)

decompressing the retrieved compressed second intermediate results to provide decompressed second intermediate results; performing the additional calculations on the decompressed second intermediate results using the third one of the neuron layers of the neural network, the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network. (in 0037: As pictured, input data paths 104 include a data staging unit 120 and a weight decompressor 122. Data staging units 120 are configured to receive data, e.g., input data such as feature maps, values, and/or intermediate results, from memory units 110. Data staging units 120 are also configured to provide data obtained from memory units 110 to AAC arrays 106. Weight decompressor 122 is configured to decompress weights [claimed decompressing the retrieved compressed second intermediate results to provide decompressed second intermediate results; performing the additional calculations on the decompressed second intermediate results using the third one of the neuron layers of the neural network, the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network] read from memory units 110 that are stored in compressed form [claimed transmitting the compressed second intermediate results to the external memory; retrieving the compressed second intermediate results from the external memory for additional calculations using a third one of the neuron layers of the neural network, the third one of the neuron layers being a different neuron layer than the first one of the neuron layers and the second one of the neuron layers]...; Where the third layer is part of the plurality of layers for sequencing the received compressed data for decompression and processing in 0004: A neural network may be used to extract "features" from complex input data. The neural network may include a plurality of layers. 
Each layer may receive input data and generate output data by processing the input data to the layer. The output data may be a feature map of the input data that the neural network generates by convolving an input image or a feature map with convolution kernels. Initial layers of a neural network, e.g., convolution layers, may be operative to extract low level features such as edges and/or gradients from an input such as an image. The initial layers of a neural network are also called feature extraction layers. Subsequent layers of the neural network, referred to as feature classification layers, may extract or detect progres­sively more complex features such as eyes, a nose, or the like. Feature classification layers are also referred to as "fully-connected layers."; And in 0101-0102: To process all feature maps from a layer N of a neural network to generate all feature maps in layer N+ I of the neural network [claimed the decompressed second intermediate results being used as input to the third one of the neuron layers of the neural network], MMU 700 may employ a combination of scatter and gather matrix multiply operations. The choice of which mode to use and when is specified by a network optimization utility to minimize overall computation and storage footprint of the intermediate data to be maintained in generating the next layer. MMU 700 includes a control unit 704. Control unit 704 may include one or more registers 706 that store data representative of the current (x, y) locations of one or more input feature maps being processed (e.g., a first data set) and a weight table 708 defining one or more matrices (weights) to be applied (e.g., a second data set)... 
Control unit 704 is capable of storing and/or executing a sequence of instructions referred to as an "execution sequence."; Examiner notes where the feature maps comprise the weight values provided along the data processing sequence path, in 0059-0060: Control unit 102 may execute a convolution macro instruction. In general, the convolution macro instruction causes AAC arrays 106 to perform convolution and accu­mulate the results. In one example embodiment, control unit 102, in executing a convolution macro instruction, causes input data paths 104 to read a portion of the input data and the weights from memory units 110 and provide the input data and weights to AAC arrays 106 . As discussed, weight decompressor 122 within input data path 104 is capable of decompressing weights so that AAC arrays 106 receive decompressed weights. AAC arrays 106, responsive to control signals from control unit 102 in executing the same convolution macro instruction may convolve an x-y region ( e.g., an NxN region where N is an integer greater than 1) of an input feature map ( e.g., the input data) with a plurality of weights to generate an output feature map or a portion of an output feature map…)
The Park and Bro references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose of developing information processing techniques for buffering intermediate calculation results of individual neuron layers used to process input data with neural network layers.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing intermediate results for information processing of data using data movement processing techniques as disclosed by Bro with the information processing system for processing neural network computations as disclosed by Park.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of processing convolutions of feature maps in sequential layers of a neural network using external storage (Bro, 0055-0059 & 0111-0113). Doing so would help optimize neural network computations and improve power efficiency (Bro, 0113).

	
Regarding claim 10, the rejection of claim 1 is incorporated. Park in combination with Bro further teaches the method as recited in claim 1, wherein the neural network is a convolutional neural network including the multiple neuron layers, each of the neuron layers being assigned neuron parameters, one element of the output feature map being obtained by applying the neuron parameters assigned to the output feature map to a data section of the input feature map. (claimed neural network as depicted in Fig. 2 and Fig. 3, in 0034-0036: FIG. 2 is a block diagram of an exemplary feed­forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers-activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220… While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304 [claimed each of the neuron layers being assigned neuron parameters, one element of the output feature map being obtained by applying the neuron parameters assigned to the output feature map to a data section of the input feature map as processed from a previous layer to a proceed layer as depicted in Fig. 3]…; And in 0037-0041: FIG. 
3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps [Examiner note: feature maps of CNN are transformed as an output to a proceeding layer of an input feature map processed by the previous layer]… As explained above in the discussion of FIG. 3, in a convolutional neural network each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer... As noted, a hardware accelerator may be specially configured to perform computations for layers of a neural network, and the performance of certain layers of the neural network may be limited by memory bandwidth ( e.g., limited in the amount of data available on a memory channel) between the hardware accelerator and a memory device… Compression may be applied to an entire set of parameters for a neural network layer or may be selec­tively applied, for example, to a subset of parameters of a neural network or layer [claimed each of the neuron layers being assigned neuron parameters of the neuron layer subset ]. Certain data, such as filter weights, may be compressed and cached locally for all or a portion of the processing involved in a particular neural network layer (or set of neural network layers).)
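For illustration only, and not as part of the cited disclosure, the claim 10 limitation that one element of the output feature map is obtained by applying the neuron parameters to a data section of the input feature map can be sketched as follows. The function names, the 3x3 input, and the 2x2 kernel are hypothetical and are not taken from Park or Bro.

```python
# Illustrative sketch only: names and values are hypothetical, not from Park or Bro.

def output_element(input_map, kernel, row, col):
    """Apply the kernel (neuron parameters) to the data section of input_map
    whose top-left corner is (row, col) and return one output element.
    As is usual in CNNs, this is a sliding dot product (cross-correlation)."""
    total = 0.0
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            total += weight * input_map[row + i][col + j]
    return total


def convolve_valid(input_map, kernel):
    """Build the whole output feature map, one element per data section."""
    out_rows = len(input_map) - len(kernel) + 1
    out_cols = len(input_map[0]) - len(kernel[0]) + 1
    return [
        [output_element(input_map, kernel, r, c) for c in range(out_cols)]
        for r in range(out_rows)
    ]


input_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
output_map = convolve_valid(input_map, kernel)
print(output_map)
```

Each call to output_element reads only the data section under the kernel, which is why the output feature map can be produced section by section.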

Regarding claim 12, the rejection of claim 1 is incorporated. Park in combination with Bro further teaches the method as recited in claim 1, wherein, after the retrieval of the compressed intermediate results from the external memory, (compressing data to write to external memory including the claimed intermediate results of the neural network layers for accelerating computation and eliminate memory bottleneck, in 0040-0042: In various embodiments, memory devices for stor­ing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on ( e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) com­pression techniques that may use inexact approximations and/or partial data discarding techniques…)
the retrieved compressed intermediate results are decoded using a decoding method and are subsequently back- transformed to obtain the decompressed intermediate results. (claimed sequential process of retrieved intermediate to obtain results on proceeding layers via reading operations and claimed decompressing, as the claimed backed transformed process, as depicted in Fig. 6A, In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decom­pression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation ( e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.))

Regarding independent claim 13 Park teaches a calculation system, comprising: a convolutional neural network including multiple neuron layers; (as depicted in Fig. 2 and Fig. 3, in 0038: As explained above in the discussion of FIG. 3, in a convolutional neural network each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer...; And in 0034-0036: FIG. 2 is a block diagram of an exemplary feed­forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers-activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220… While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks…
Examiner notes Neural Network are known to include a plurality of neuron layers as depicted in Park Fig. 2 depicts an example of a neural network including plurality of neuron layers, in 0034: FIG. 2 is a block diagram of an exemplary feed­forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers-activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220. While FIG. 2 provides an example with five activation layers, neural network 200 may include any other suitable number of activation layers (e.g., one activa­tion layer, dozens of activation layers, thousands of activa­tion layers, etc.).)
a processing unit including a processor configured to perform sequential calculations of the neural network; and a memory external to the processing unit configured to buffer intermediate results of the calculations in the processing unit;  (operations for training neural networks and determining calculations, as claimed sequential computations on multiple sequential layers as depicted in Fig. 1.; And the computing system for computing computations in a distributed computing environment with external memory elements and processing units as depicted in Fig. 5 and in 0004-0005: As will be described in greater detail below, the instant disclosure details various systems and methods for reducing bandwidth consumption for memory accesses per­formed by an AI accelerator by compressing data written to memory and decompressing data read from memory after the data is received at the AI accelerator. For example, a computing system may include a memory device that stores compressed parameters for a layer of a neural network and a special-purpose hardware processing unit programmed to, for the layer of the neural network: (1) receive the com­pressed parameters from the memory device, (2) decom­press the compressed parameters, and (3) apply the decom­pressed parameters in an arithmetic operation of the layer of the neural network…Additionally or alternatively, the memory device may include a dynamic memory device that is remote relative to the special-purpose hardware processing unit.)
wherein the processing unit is configured to: incrementally calculate, using data sections of an input feature map, data sections of an output feature map which each represent a group of intermediate results, using a first one of the neuron layers of the neural network, the input feature map being used as input to the first one of the neuron layers; (claimed incremental sections as the depicted feature maps representing claimed intermediate results of the respective sequential convolution layers, as depicted in Fig. 3, in 0036-0037: … FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316…;
 Examiner further notes the calculations in convolutional neural networks (CNN)include using sections of claimed feature maps and output feature map as depicted in Fig. 3 for each layer incrementally, in 0036-0037: While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and concep­tualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) [incrementally calculate, using data sections of an input feature map, data sections of an output feature map which each represent a group of intermediate results] to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306 [using a first one of the neuron layers of the neural network, the input feature map being used as input to the first one of the neuron layers] to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320.)
conduct a lossy compression of at least one of the data sections of the output feature map to obtain compressed intermediate results; and transmit the compressed intermediate results to the external memory. (compressing data to write to external memory including the results of the neural network layers for accelerating computation and eliminate memory bottleneck, in 0040-0042: In various embodiments, memory devices for stor­ing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on ( e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) com­pression techniques that may use inexact approximations and/or partial data discarding techniques.; And in Fig. 3)
Examiner notes that the recited claimed units are executed by the distributed processor via executed computer instructions, in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.
wherein the calculation system is configured to retrieve the compressed intermediate results from the external memory (claimed sequential process for retrieving compressed intermediate results via reading operations and claimed decompressing of the retrieved results and claimed further computation as step 610 depicted in Fig. 6A:

[Image: Fig. 6A of Park]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…; 
Park further teaches the processing of data on unique layers of the Neural Network as depicted in Fig. 2 depicts an example of a neural network including plurality of neuron layers, in 0034: FIG. 2 is a block diagram of an exemplary feed­forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers-activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220. While FIG. 2 provides an example with five activation layers, neural network 200 may include any other suitable number of activation layers (e.g., one activa­tion layer, dozens of activation layers, thousands of activa­tion layers, etc.).)

and to decompress the retrieved compressed intermediate results to provide decompressed intermediate results; and wherein the processing unit is configured to: (claimed sequential process for retrieving compressed intermediate results via reading operations and claimed decompressing of the retrieved results and claimed further computation as step 610 depicted in Fig. 6A:

[Image: Fig. 6A of Park]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decom­pression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation ( e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.) [claimed the decompressed intermediate results being used as input to the second one of the neuron layers of the neural network])
While Park expressly teaches the processing of data using a convolutional neural network of sequential data processing neuron layers using compressing and decompressing of model data including feature map; where the feature maps are used to processing data sequentially noted above using compression algorithms, in 0006: According to various examples, the computing system may include a compression subsystem that is com­municatively coupled to the memory device and configured to compress the model data and store the compressed data in the memory device… Further­more, in these and other embodiments, the compression subsystem may be configured to compress the model data by implementing a lossy compression algorithm)
Park also teaches data flow as a sequential process through layers to produced output, in 0035-50: In the example shown in FIG. 2, data flows from input layer 202 through activation layers 212-220 to output layer 204 (i.e., from left to right). As shown, each value from the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum to activation layer 214. This process may be repeated at each activation layer in sequence to create outputs at output layer 204… . As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2… . In some embodiments, data may be compressed before being written to memory, and the com­pressed data may be read from memory and decompressed on an accelerator before being used in neural network computations. Compression may be applied to an entire set of parameters for a neural network layer or may be selec­tively applied, for example, to a subset of parameters of a neural network or layer. Certain data, such as filter weights, may be compressed and cached locally for all or a portion of the processing involved in a particular neural network layer (or set of neural network layers)…)
 The remaining limitations are similar to those in claim 1 and are rejected under the same rationale. 
Claimed units as processing instructions, in Park 0082-0087: … Computing system 710 may also employ any number of software, firmware, and/or hardware con­figurations. For example, one or more of the example embodiments disclosed herein may be encoded as a com­puter program (also referred to as computer software, soft­ware applications, computer-readable instructions, or com­puter control logic) on a computer-readable medium. The term "computer-readable medium," as used herein, gener­ally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions… In addition, in certain embodiments one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein… The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exem­plary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the instant disclosure. The embodiments dis­closed herein should be considered in all respects illustrative and not restrictive.
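For illustration only, the sequence recited in claim 13 (calculate data sections with a first layer, lossily compress them, buffer them in external memory, retrieve and decompress them, and use them as input to a second layer) can be sketched as follows. Every name is hypothetical, a dictionary stands in for the external memory, and uniform quantization is an assumed stand-in for whatever lossy scheme is actually used; none of this is taken from Park or Bro.

```python
# Illustrative sketch only: all names are hypothetical and uniform quantization
# is an assumed stand-in for any lossy compression scheme.

STEP = 0.5              # quantization step; larger means more loss
external_memory = {}    # stands in for a memory external to the processing unit


def lossy_compress(section, step):
    """Quantize each value to an integer code (irreversible rounding)."""
    return [round(value / step) for value in section]


def decompress(codes, step):
    """Map integer codes back to approximate values."""
    return [code * step for code in codes]


def first_layer(section):
    """Toy first layer producing intermediate results for one data section."""
    return [2.0 * value + 0.1 for value in section]


def second_layer(section):
    """Toy second layer consuming the decompressed intermediate results."""
    return [max(0.0, value - 1.0) for value in section]


# Incrementally calculate, compress, and buffer each data section.
input_sections = [[0.0, 0.3], [0.7, 1.2]]
for index, section in enumerate(input_sections):
    intermediate = first_layer(section)
    external_memory[index] = lossy_compress(intermediate, STEP)

# Retrieve, decompress, and feed each section to the second layer.
outputs = []
for index in sorted(external_memory):
    restored = decompress(external_memory[index], STEP)
    outputs.append(second_layer(restored))

print(external_memory, outputs)
```

The restored values differ slightly from the values the first layer produced, which is the loss that the compression step trades for a smaller memory footprint.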

Regarding claim 14, the rejection of claim 13 is incorporated. Park in combination with Bro further teaches the calculation system as recited in claim 13, further comprising: a decompression unit including hardware configured to retrieve the compressed intermediate results from the external memory, to decompress the retrieve compressed intermediate results, and to provide the decompressed intermediate results to the second one of the neuron layers as the input to the second one of the neuron layers. (claimed sequential process for retrieving compressed intermediate results via reading operations and claimed decompressing of the retrieved results and claimed further computation as step 610 depicted in Fig. 6A:

[Image: Fig. 6A of Park]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary com­puter-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, includ­ing the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decom­pression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation ( e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.)
Examiner further notes that the calculations in convolutional neural networks (CNN) include using sections of the claimed feature maps and output feature map, including parameters as intermediate results provided to the second one of the neuron layers as the input to the second one of the neuron layers, as depicted in Fig. 3 for each layer incrementally, in 0036-0037.)

	
Regarding independent claim 15, Park in combination with Bro teaches a method of using a calculation system, the method comprising:
The claim limitations are similar to the limitations of claim 13 and are rejected under the same rationale.
processing, using the calculation system, image data of camera images in (i) a driver assistance system for carrying out a driver assistance function, or (ii) a system for autonomously operating the motor vehicle, to carry out an object identification method, a segmentation method or a classification method for the image data. (claimed image data for classification of the image data, in 0050: In a neural network, low-level layers, e.g., convo­lution layers, may extract low-level features an edge or gradient of a face image) from input data or an input feature map and high-level layers, e.g., fully-connected layers, may extract or detect high-level features, i.e., classes (e.g., eyes and a nose of the face image) from the input feature map.; And, in  0032: In some embodiments, server 106 may access data (e.g., data provided by computing devices 102(1)-(N)) for analysis. For example, server 106 may perform various types of machine learning tasks on data. For instance, server 106 may use machine learning algorithms to perform speech recognition ( e.g., to automatically caption videos), to enable computer vision (e.g., to identify objects in images, to classify images, to identify action in video, to turn pan­oramic photos into interactive 360 images, etc.), in recom­mender systems ( e.g., information filtering systems that predict user preferences), for facial recognition and human pose estimation, in document analysis, and/or to perform a variety of other tasks.)
Examiner notes that the recited claimed units are executed by the distributed processor via executed computer instructions, in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.

Regarding independent claim 16, Park in combination with Bro teaches a non-transitory machine-readable storage medium on which is stored a computer program for operating a calculation system (in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.)
The claim limitations are similar to the limitations of claim 1 and are rejected under the same rationale.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’), in view of Brothers et al. (US 20170011288, hereinafter ‘Bro’), and in further view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’).

Regarding claim 3, the rejection of claim 1 is incorporated. Park in combination with Bro further teaches the method as recited in claim 1, wherein, for the lossy compression, the data sections of the output feature map are each transformed into a …domain,  and the transformed data sections are filtered element-wise to obtain modified data sections. (claimed transformed sections a convolutional transformation based on filters as claimed filtered element-wised to modify data sections to produce the out feature maps of the convolution layers and transformed by the subsampling layer as depicted in Fig 3, in 0037-0037: ... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling ( e.g., pool­ing), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsam­pling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306 to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320. ; And process the parameter data of the feature maps using lossy algorithms, in 0046: …For example, data may be compressed using a complex lossless compres­sion algorithm when being stored in DDR 462 since reads to DDR 462 may be dependent on memory bandwidth. Data that is to be stored on-chip on an accelerator may be compressed and decompressed with less aggressive, more lossy, and/or simpler compression schemes ( e.g., these parameters may be used frequently and may therefore need to be compressed and/or decompressed rapidly) and stored in SRAM 464.)
While Park discloses the computation of neural networks using data transformation and quantization as part of the lossy compression process, as discussed above, Park and Bro do not expressly teach the claimed limitation that the data sections are each transformed into a frequency domain.
Bar teaches the limitation: the data sections are each transformed into a frequency domain (as depicted in Fig. 6A  in 0285-0287:  … include an apparatus comprising logic, at least partially including hardware logic, to implement a lossy compression algorithm which utilizes a data transform and quantization process to compress data in a convolutional neural network (CNN) layer… wherein the apparatus compresses one or more weights in a convolutional neural network (CNN) layer in a frequency domain…). 
The Park, Bro, and Bar references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose of developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for transforming data sections into a frequency domain as disclosed by Bar with the information processing system for processing neural network computations as collectively disclosed by Park and Bro.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bro, and Bar in order to implement deep neural networks in a parallel processing environment “to increase processing efficiency. The efficiency provided by parallel machine learning algorithm implementations allows the use of high capacity networks and enables those networks to be trained on larger datasets” (Bar, 0002).
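For illustration only, the claim 3 sequence of transforming a data section into a frequency domain and then filtering the transformed section element-wise can be sketched as follows. The choice of an unnormalized DCT-II, the one-dimensional data section, and the filter weights are assumptions made for the sketch; Bar is cited above only for compression in a frequency domain and is not asserted to use this particular transform.

```python
# Illustrative sketch only: the DCT-II transform and the filter weights are
# assumptions for this example, not details taken from Park, Bro, or Bar.
import math


def dct(section):
    """Unnormalized DCT-II: maps a data section to frequency coefficients."""
    n = len(section)
    return [
        sum(section[k] * math.cos(math.pi * (k + 0.5) * u / n) for k in range(n))
        for u in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III), i.e. the back-transformation."""
    n = len(coeffs)
    return [
        (coeffs[0] / 2
         + sum(coeffs[u] * math.cos(math.pi * (k + 0.5) * u / n)
               for u in range(1, n))) * 2 / n
        for k in range(n)
    ]


def filter_elementwise(coeffs, weights):
    """Multiply each frequency coefficient by its own filter weight."""
    return [c * w for c, w in zip(coeffs, weights)]


section = [1.0, 2.0, 3.0, 4.0]
coeffs = dct(section)
filtered = filter_elementwise(coeffs, [1.0, 1.0, 0.5, 0.0])  # attenuate high frequencies
restored = idct(filtered)
print(coeffs, filtered, restored)
```

Without the filtering step, idct(dct(section)) reproduces the data section up to floating-point rounding, so any loss in this sketch comes from the element-wise filter alone.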

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Brothers et al. (US 20170011288, hereinafter ‘Bro’), in further view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’), and in further view of Jun-seok Park (US Pat. Pub. No. 2018/0253635, hereinafter ‘Park2’).

Regarding claim 4, the rejection of claim 3 is incorporated. Park in combination with Bro and Bar further teaches the method as recited in claim 3, wherein the element-wise filtering includes a multiplication of a portion of elements of the transformed data sections with 0 to obtain modified data sections including a number of "0" elements which is … than a number of "0" elements in the data sections of the output feature map. (the claimed filtering, which produces the claimed zeros by means of an activation function to obtain the modified data sections as claimed, in 0035-0036: … As shown, each value from the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum to activation layer 214… For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304. In some embodiments, convolution layer 312 may also include a rectification sublayer (e.g., a layer implemented via a rectified linear unit, also known as a RELU layer) with an activation function.; and the claimed output feature map from a previous layer, as depicted in Fig. 3)
While Park teaches the transformation of features including sections of 0 based on activation functions, as discussed above, Park and Bro do not expressly teach the use of a threshold value included in the activation function for transforming 0 sections as claimed.
Bar teaches the use of a threshold value included in the activation function for transforming 0 sections as claimed. (in 0170: … Several types of non-linear activation functions may be used. One particular type is the rectified linear unit (ReLU), which uses an activation function defined as f(x)=max(0,x), such that the activation is thresholded at zero.)
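For illustration only, the thresholding behavior of the ReLU activation quoted from Bar, f(x)=max(0,x), can be sketched as follows. The sketch is not taken from any cited reference; the function names `relu` and `count_zeros` and the sample values are hypothetical and chosen only to show that thresholding at zero can increase the number of "0" elements in a data section.

```python
def relu(values):
    """Apply f(x) = max(0, x) element-wise (activation thresholded at zero)."""
    return [max(0.0, v) for v in values]


def count_zeros(values):
    """Count the elements that are exactly zero."""
    return sum(1 for v in values if v == 0)


# Hypothetical data section containing negative, zero, and positive values.
activations = [-1.5, 0.0, 2.0, -0.25, 3.0]
rectified = relu(activations)

zeros_before = count_zeros(activations)
zeros_after = count_zeros(rectified)
print(rectified, zeros_before, zeros_after)
```

In this sketch every negative element is mapped to zero, so the rectified section contains at least as many zero elements as the original section.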
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park, Bro, and Bar for the same reasons disclosed above.
While Park, Bro, and Bar in combination teach the use of data transformations for processing feature maps, the references do not expressly disclose the limitation: … the transformed data sections with 0 to obtain modified data sections including a number of "0" elements which is greater than a number of "0" elements in the data sections of the output feature map.
Park2 does expressly teach the limitation: the transformed data sections with 0 to obtain modified data sections including a number of "0" elements which is greater than a number of "0" elements in the data sections of the output feature map. (using zero-padding to obtain the claimed modified data sections including a greater number of 0 elements, as depicted in Fig. 11A:

    [Image: Park2, Fig. 11A]

In 0105-0108: … the neural network device 130 may generate an input feature list, which includes an index and data with respect to each of input features having a non-zero value, from an input feature map in matrix… FIG. 11A is a diagram of an example in which zero-padding is applied to an input feature map IFM in a neural network…. Zero-padding in a neural network is adding zeros to the input feature map IFM in all outward directions, i.e., row and column directions. When zero-padding is applied to the input feature map IFM, an input feature map with zero-padding, i.e., a zero-padded input feature map IFM_Z may be generated. When one zero is added to every outward direction of the input feature map IFM, as shown in FIG. 11A, a location, i.e., an index, of each input feature may be increased by 1…; Examiner notes that, in neural networks, input feature maps are the output feature maps of a previous layer, and the operations are performed on the output feature maps from the previous layer, in 0044-0046: Each of the first through third layers 11, 12, and 13 may receive input data or a feature map generated in a previous layer as an input feature map and may generate an output feature map or a recognition signal REC, by performing an operation on the input feature map. At this time, the feature map is data which represents various features of input data. The first layer 11 may perform a convolution of the first feature map FM1 and a weight map WM to generate the second feature map FM2. The weight map WM may filter the first feature map FM1 and may be referred to as a filter or a kernel. The depth, i.e., the number of channels of the weight map WM, may be the same as the depth, i.e., the number of channels of the first feature map FM1. The convolution may be performed on the same channels in both the weight map WM and the first feature map FM1. The weight map WM shifts by traversing the first feature map FM1 as a sliding window. The amount of shift may be referred to as a "stride length" or a "stride". During a shift, each weight included in the weight map WM may be multiplied by and added to all feature values in an area where the weight map WM overlaps the first feature map FM1. One channel of the second feature map FM2 may be generated by performing a convolution of the first feature map FM1 and the weight map WM…)
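For illustration only, the zero-padding passage quoted from Park2 above (zeros added in all outward directions, with each feature index increased by 1) can be sketched as follows. The sketch is not code from Park2; the function names `zero_pad` and `count_zeros` and the 2x2 sample feature map are hypothetical.

```python
def zero_pad(ifm, pad=1):
    """Return a copy of the 2-D feature map with `pad` zeros added on every side."""
    width = len(ifm[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]           # top border rows
    for row in ifm:
        padded.append([0] * pad + list(row) + [0] * pad)  # left/right borders
    padded.extend([0] * width for _ in range(pad))        # bottom border rows
    return padded


def count_zeros(fm):
    """Count the zero-valued elements of a 2-D feature map."""
    return sum(1 for row in fm for v in row if v == 0)


# Hypothetical input feature map IFM and its zero-padded version IFM_Z.
ifm = [[1, 0],
       [2, 3]]
ifm_z = zero_pad(ifm, pad=1)

print(ifm_z)
print(count_zeros(ifm), count_zeros(ifm_z))
```

With one zero added in every outward direction, the feature at row 0, column 0 of the sample map sits at row 1, column 1 of the padded map, and the padded map holds more zero elements than the unpadded map.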
	
The Park, Bro, Bar, and Park2 references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for transforming data sections using zero-padding as disclosed by Park2 with the information processing system for processing neural network computations as collectively disclosed by Park, Bro, and Bar.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bro, Bar, and Park2 in order to add “zeros to the input feature map IFM in all outward directions” as “[w]hen zero-padding is applied to the input feature map IFM, an input feature map with zero-padding, i.e., a zero-padded input feature map IFM_Z may be generated” (Park2, 0105-0108); doing so will help facilitate efficient processing of input features having different spatial locations of the feature map over different processing circuits (Park2, 0185).

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Brothers et al. (US 20170011288, hereinafter ‘Bro’), in further view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’) and in further view of Yan et al. (US Pat. Pub. No. 2018/0218518, hereinafter ‘Yan’).

	Regarding claim 5, the rejection of claim 3 is incorporated. Park in combination with Bro and Bar further teaches the method as recited in claim 3, wherein at least one of the modified data sections is encoded together, with the aid of a predefined encoding method to obtain the compressed intermediate results, with the aid of run length encoding or entropy encoding. (the claimed encoding of the claimed modified data sections with a predefined encoding method is taught as a lossless compression algorithm, with the claimed entropy encoding taught as Huffman coding, in 0042: … Other examples of compression schemes may include lossless compression algorithms such as lookup tables with Huffman coding (e.g., an algorithm that assigns variable-length codes to inputs, where lengths of the assigned codes are based on frequencies of matrix elements) or other reversible compression techniques that allow original data to be completely reconstructed from the compressed data…)
	While Park, Bro, and Bar in combination disclose the use of an entropy-based encoding process, Park, Bro, and Bar do not expressly disclose the use of a run-length encoding process.
	Yan discloses the use of a run-length encoding process. (in 0045: FIG. 3B illustrates a conceptual diagram of input data 305 and another compact data format 320, in accordance with one embodiment. A run-length encoding scheme is used to generate the compacted multi-bit data encoded in the compact data format 320. The compact data format 320 includes each non-zero value and the number of zeros between each non-zero element.)
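For illustration only, a run-length encoding of the kind quoted from Yan (each non-zero value stored together with the number of zeros preceding it) can be sketched as follows. The sketch is not Yan's compact data format 320; the function names `rle_encode` and `rle_decode`, the separate trailing-zero count, and the sample data are hypothetical.

```python
def rle_encode(data):
    """Encode a sequence as (zeros_before, value) pairs plus a trailing-zero count."""
    pairs = []
    zeros = 0
    for v in data:
        if v == 0:
            zeros += 1                 # extend the current run of zeros
        else:
            pairs.append((zeros, v))   # close the run at a non-zero value
            zeros = 0
    return pairs, zeros                # zeros left over after the last non-zero


def rle_decode(pairs, trailing_zeros):
    """Rebuild the original sequence from the encoded pairs."""
    out = []
    for zeros, v in pairs:
        out.extend([0] * zeros)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out


# Hypothetical sparse data section.
data = [0, 0, 5, 0, 0, 0, 7, 1, 0]
pairs, trailing = rle_encode(data)
print(pairs, trailing)
```

Because decoding reproduces the input exactly, the sketch is lossless; the encoded form becomes shorter as the proportion of zero elements in the data section grows.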
The Park, Bro, Bar, and Yan references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for processing multi-bit data for sparse convolutional neural networks as disclosed by Yan with the information processing system for processing neural network computations as collectively disclosed by Park, Bro, and Bar.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bro, Bar, and Yan in order to process bit data by using a processing element when the “multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data”; doing so will enable a “method, computer program product, and system for sparse convolutional neural networks that improves efficiency” (Yan, Abstract).

Regarding claim 6, the rejection of claim 5 is incorporated. Park in combination with Bro, Bar, and Yan further teaches the method as recited in claim 5, wherein a compression matrix is applied to the transformed data sections for the element-wise filtering, the compression matrix being separately predefined for each calculation layer of the sequential calculation of the neural network. (sparse-matrix compression as the claimed compression matrix applied to the claimed transformed data sections, in 0042: FIG. 4 shows a data flow 400 that depicts how data may be decompressed and compressed according to aspects of the present disclosure. Either or both of DDR 402 and SRAM 404 may store compressed parameters for a neural network. These parameters (e.g., model data) may have been compressed using any of a variety of different types of compression schemes. For example, a sparse-matrix compression scheme may be used to compress data. Examples of sparse-matrix compression schemes may include a compressed-sparse-row scheme (e.g., an algorithm that creates a format that may represent a matrix with one-dimensional arrays that contain…; and applying the process sequentially over the network layers as the claimed calculation layers of the neural network, in 0044: Network-layer logical units 435 may apply the decompressed parameters in a variety of types of arithmetic operations. For example, the parameters may be applied in filtering or convolution operations, which may be matrix operations, or other operations such as RELU operations or pooling operations. In some embodiments, these parameters may be updated during execution of the layer (e.g., during backpropagation)…; and 0048: Network-layer logical units 435 may include one or more logical units or other calculation hardware, such as matrix multipliers or general matrix-matrix multiplication (GEMM) units, tensor units, or other logical and/or arithmetic units used for performing calculations for a layer (e.g., as part of training and/or inference operations)…)
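For illustration only, the compressed-sparse-row scheme named in the passage quoted from Park (a matrix represented by one-dimensional arrays) can be sketched as follows. The sketch uses the conventional three-array CSR layout, which is an assumption since the quoted passage is elided; the function name `to_csr` and the sample matrix are hypothetical.

```python
def to_csr(matrix):
    """Convert a dense 2-D matrix into three 1-D arrays (conventional CSR layout)."""
    values = []        # non-zero entries, row by row
    col_indices = []   # column index of each stored entry
    row_ptr = [0]      # row_ptr[i]:row_ptr[i+1] slices the entries of row i
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


# Hypothetical sparse parameter matrix.
matrix = [[0, 0, 3],
          [4, 0, 0],
          [0, 0, 0],
          [0, 5, 6]]
values, col_indices, row_ptr = to_csr(matrix)
print(values, col_indices, row_ptr)
```

Only the non-zero entries and their positions are stored, so the representation shrinks as the matrix becomes sparser.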

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’), in view of Brothers et al. (US 20170011288, hereinafter ‘Bro’), in further view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’), in further view of Yan et al. (US Pat. Pub. No. 2018/0218518, hereinafter ‘Yan’), and in further view of Thiagarajan et al. (US Pub No. 2018/0084253, hereinafter ‘Thia’).

Regarding claim 8, the rejection of claim 6 is incorporated. Park in combination with Bro, Bar, and Yan further teaches the method as recited in claim 6, wherein the compression matrix is predefined in such a way that higher-frequency components of the transformed data sections are filtered. (the claimed compression matrix, with filtered frequency components, predefined via a lossless compression algorithm, in 0042: FIG. 4 shows a data flow 400 that depicts how data may be decompressed and compressed according to aspects of the present disclosure. Either or both of DDR 402 and SRAM 404 may store compressed parameters for a neural network… For example, a sparse-matrix compression scheme may be used to compress data. Examples of sparse-matrix compression schemes may include a compressed-sparse-row scheme... Other examples of compression schemes may include lossless compression algorithms such as lookup tables with Huffman coding (e.g., an algorithm that assigns variable-length codes to inputs, where lengths of the assigned codes are based on frequencies of matrix elements) …)
While Park discloses the compressing and quantization of matrix components for performing operations in a sequential neural network, as discussed above, Park, Bro, Bar, and Yan do not expressly disclose the processing of frequencies of matrix elements such that high-frequency components are filtered.
Thia does expressly disclose the processing of frequencies of matrix elements such that high-frequency components are filtered. (in 0047: Generally, the values inserted into the base quantization matrix are expected to be larger than the values that are replaced in the base quantization matrix. This is expected to increase the likelihood that information in the transform matrix at corresponding positions will be discarded, thereby increasing the likelihood of higher-frequency components being discarded during compression and increasing the amount of compression…; and filtering high-frequency components by setting them to zero, in 0054: In some cases, the highest-frequency values or higher-frequency values of the quantized transform matrix may be set to zero with the techniques described in U.S. Provisional Patent App. 62/513,681, titled MODIFYING COEFFICIENTS OF A TRANSFORM MATRIX…)
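For illustration only, discarding higher-frequency components of a transformed data section can be sketched as follows. The sketch is a simplification and is not Thia's quantization-matrix technique: it assumes a one-dimensional, unnormalized DCT-II as the frequency transform and a hypothetical 0/1 mask applied element-wise, so that multiplication with 0 removes the higher-frequency coefficients. The names `dct_ii`, `apply_mask`, `block`, and `mask` are assumptions.

```python
import math


def dct_ii(block):
    """Unnormalized 1-D DCT-II: coefficient k grows in frequency with k."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


def apply_mask(coeffs, mask):
    """Element-wise multiplication of coefficients with a filter mask."""
    return [c * m for c, m in zip(coeffs, mask)]


# Hypothetical data section and a mask that keeps only the four lowest frequencies.
block = [4.0, 4.0, 5.0, 5.0, 4.0, 4.0, 5.0, 5.0]
mask = [1, 1, 1, 1, 0, 0, 0, 0]

coeffs = dct_ii(block)
filtered = apply_mask(coeffs, mask)
print(filtered)
```

In this sketch the lower-frequency coefficients pass through unchanged and the upper half of the spectrum becomes zero, which increases the number of zero elements available to a subsequent run-length or entropy encoding step.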
The Park, Bro, Bar, Yan, and Thia references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for compressing data used in matrix computations by machine learning algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for processing video data using quantization and entropy coding to produce compressed video data as disclosed by Thia with the information processing system for processing neural network computations as collectively disclosed by Park, Bro, Bar, and Yan.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bro, Bar, Yan, and Thia in order to process video data using quantization and entropy coding to produce compressed video data; doing so will “increase the likelihood that information in the transform matrix at corresponding positions will be discarded, thereby increasing the likelihood of higher-frequency components being discarded during compression and increasing the amount of compression” (Thia, 0047), and “enhance compression resulting from subsequent entropy coding operations” (Thia, 0054).

Regarding claim 9, the rejection of claim 8 is incorporated. Park in combination with Bro, Bar, Yan, and Thia further teaches the method as recited in claim 8, wherein the compression matrix is predefined in that, during a training of the neural network, matrix elements of the filter matrices for each calculation layer are trained together with neuron parameters of neurons of the neural network with the aid of a back-propagation method. (the claimed training process using back-propagation, in 0044: Network-layer logical units 435 may apply the decompressed parameters in a variety of types of arithmetic operations. For example, the parameters may be applied in filtering or convolution operations, which may be matrix operations, or other operations such as RELU operations or pooling operations. In some embodiments, these parameters may be updated during execution of the layer (e.g., during backpropagation)…)


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and is listed below:
Verhelst et al. (NPL: “Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices”): teaches the use of data embedding in Neural Network processing and the use of data topology and data flow techniques for computing operations for neural networks to enhance performance efficiency. Citation: Verhelst M, Moons B. Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices. IEEE Solid-State Circuits Magazine. 2017 Nov 15;9(4):55-65.
Chalfin et al. (US Pub No. 20180239992): teaches the use of weight parameters in artificial neural network layers using an image compression scheme.
Jiang et al. (NPL: “Medical image analysis with artificial neural networks”): teaches the analysis of images using neural networks and lossy algorithms.
Robina (NPL: “Absolutely lossless compression of medical images”): teaches the analysis of images using neural networks and lossy algorithms.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516.  The examiner can normally be reached on Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached on (303) 297-4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN O ALABI/Examiner, Art Unit 2129