DETAILED ACTION
This action is in response to communications filed on 03/11/2019, in which claims 1-16 are still pending.
This action is non-final.

	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for priority to German Patent Application No. DE 102018203709.4, filed on March 12, 2018, is acknowledged.

Drawings
The drawings were received on 03/11/2019. These drawings are acceptable. The replacement drawings received on 04/17/2019 are acceptable for entry.

Specification
The substitute specification filed 04/17/2019 has been entered.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 03/11/2019 and 06/10/2019 have been considered by the examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are listed below, where the generic placeholder is in bold and the functional language is italicized.
Claim 13 limitations:
a processing unit configured to perform sequential calculations of the neural network
and a memory external to the processing unit configured to buffer intermediate results of the calculations in the processing unit; 
wherein the processing unit is configured to: incrementally calculate data sections, which each represent a group of intermediate results, with the aid of a neural network;
Claim 14 limitations:
a decompression unit which is configured to retrieve the compressed intermediate results from the external memory for a calculation with the aid of the neural network, to decompress the retrieved compressed intermediate results, and to carry out a further calculation as a function of the decompressed intermediate results.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-9 and 13-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 6, the term "better" in the claim limitation “wherein the compression matrix is matched to the encoding method of the transformed data sections in such a way that a better compression is achieved than with a direct application of the encoding method to the non-transformed data sections” is a relative term which renders the claim indefinite. Specifically, it is unclear by what measure (e.g., compression ratio or reconstruction error) one compression qualifies as better than another. In addition, the term "better" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claims 7-9, which depend from claim 6, do not cure the deficiency noted above and are therefore also rejected.

Regarding claim 7, the claim recites the limitation "the non-transformed data sections".  There is insufficient antecedent basis for this limitation in the claim.

Regarding claims 13-14, the claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (see the analysis in the Claim Interpretation section above). However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, the applicant’s
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-2 and 10-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’).

Regarding independent claim 1, Park teaches a method for operating a calculation system including a neural network, the calculation system including a processing unit for sequential calculation of the neural network and an external memory external thereto that buffers intermediate results of the calculations in the processing unit, the method comprising: (operations for training neural networks and determining calculations, corresponding to the claimed sequential calculations over multiple sequential layers, as depicted in Fig. 1; and a computing system that performs computations in a distributed computing environment with external memory elements and processing units, as depicted in Fig. 5 and described in 0004-0005: As will be described in greater detail below, the instant disclosure details various systems and methods for reducing bandwidth consumption for memory accesses performed by an AI accelerator by compressing data written to memory and decompressing data read from memory after the data is received at the AI accelerator. For example, a computing system may include a memory device that stores compressed parameters for a layer of a neural network and a special-purpose hardware processing unit programmed to, for the layer of the neural network: (1) receive the compressed parameters from the memory device, (2) decompress the compressed parameters, and (3) apply the decompressed parameters in an arithmetic operation of the layer of the neural network…Additionally or alternatively, the memory device may include a dynamic memory device that is remote relative to the special-purpose hardware processing unit.)
incrementally calculating data sections of an input feature map, which each represent a group of intermediate results, with the aid of a neural network; (the claimed data sections correspond to the feature maps, which represent intermediate results of the respective convolution layers, as depicted in Fig. 3 and described in 0036-0037: … FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling (e.g., pooling), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsampling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316…)
lossy compression of at least one of the data sections to obtain compressed intermediate results; and transmitting the compressed intermediate results to the external memory. (compressing data written to external memory, including the results of the neural network layers, to accelerate computation and eliminate the memory bottleneck, as described in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques.)

Regarding claim 2, the rejection of claim 1 is incorporated. Park further teaches the method as recited in claim 1, further comprising retrieving the compressed intermediate results from the external memory for a calculation with the aid of the neural network; decompressing the retrieved intermediate results; and performing a further calculation as a function of the decompressed intermediate results. (the claimed sequential process of retrieving compressed intermediate results via read operations, decompressing the retrieved results, and performing a further computation, as shown at step 610 in Fig. 6A:

[Image: Park, Fig. 6A]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary computer-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decompression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation (e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.))

Regarding claim 10, the rejection of claim 1 is incorporated. Park further teaches the method as recited in claim 1, wherein the neural network is a convolutional neural network including multiple neuron layers, which are each assigned neuron parameters, one element of an output feature map being obtained by applying the neuron parameters assigned to the output feature map to a data section of at least one input feature map. (the claimed neural network, as depicted in Fig. 2 and Fig. 3 and described in 0034-0036: FIG. 2 is a block diagram of an exemplary feed-forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220… While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and conceptualize neural networks. For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein… In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304…; and in 0038: As explained above in the discussion of FIG. 3, in a convolutional neural network each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer...)

Regarding claim 11, the rejection of claim 1 is incorporated. Park further teaches the method as recited in claim 1, wherein the compressed intermediate results are transmitted to the external memory, and the compressed intermediate results being retrieved from the external memory for a further calculation with the aid of the neural network and being decompressed to obtain decompressed data sections. (compressing data written to external memory, including the claimed intermediate results of the neural network layers, to accelerate computation and eliminate the memory bottleneck, as described in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques…; and decompressing, as depicted in Fig. 6A and described in 0050: FIG. 6A is a flow diagram of an exemplary computer-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 5, and 7...)

(compressing data written to external memory, including the claimed intermediate results of the neural network layers, to accelerate computation and eliminate the memory bottleneck, as described in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques…)
the retrieved compressed intermediate results are decoded using a decoding method and are subsequently back-transformed to obtain the decompressed intermediate results. (the claimed sequential process of retrieving compressed intermediate results via read operations and decompressing the retrieved results, the decompressing corresponding to the claimed back-transforming, as shown at step 610 in Fig. 6A and described in 0050-0058: FIG. 6A is a flow diagram of an exemplary computer-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decompression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation (e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.))

Regarding independent claim 13, Park teaches a calculation system, comprising: a convolutional neural network; (as depicted in Fig. 2 and Fig. 3 and described in 0038: As explained above in the discussion of FIG. 3, in a convolutional neural network each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer...; and in 0034-0036: FIG. 2 is a block diagram of an exemplary feed-forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220… While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and conceptualize neural networks…)
a processing unit configured to perform sequential calculations of the neural network; and a memory external to the processing unit configured to buffer intermediate results of the calculations in the processing unit; (operations for training neural networks and determining calculations, corresponding to the claimed sequential calculations over multiple sequential layers, as depicted in Fig. 1; and a computing system that performs computations in a distributed computing environment with external memory elements and processing units, as depicted in Fig. 5 and described in 0004-0005: As will be described in greater detail below, the instant disclosure details various systems and methods for reducing bandwidth consumption for memory accesses performed by an AI accelerator by compressing data written to memory and decompressing data read from memory after the data is received at the AI accelerator. For example, a computing system may include a memory device that stores compressed parameters for a layer of a neural network and a special-purpose hardware processing unit programmed to, for the layer of the neural network: (1) receive the compressed parameters from the memory device, (2) decompress the compressed parameters, and (3) apply the decompressed parameters in an arithmetic operation of the layer of the neural network…Additionally or alternatively, the memory device may include a dynamic memory device that is remote relative to the special-purpose hardware processing unit.)
wherein the processing unit is configured to: incrementally calculate data sections, which each represent a group of intermediate results, with the aid of a neural network; (the claimed data sections correspond to the feature maps, which represent intermediate results of the respective convolution layers, as depicted in Fig. 3 and described in 0036-0037: … FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling (e.g., pooling), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsampling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316…)
conduct a lossy compression of at least one of the data sections to obtain compressed intermediate results; and transmit the compressed intermediate results to the external memory. (compressing data written to external memory, including the results of the neural network layers, to accelerate computation and eliminate the memory bottleneck, as described in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques.)
Examiner notes that the recited units are implemented by the distributed processor executing computer instructions, as described in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.

Regarding claim 14, the rejection of claim 13 is incorporated. Park further teaches the calculation system as recited in claim 13, further comprising: a decompression unit which is configured to retrieve the compressed intermediate results from the external memory for a calculation with the aid of the neural network, to decompress the retrieved compressed intermediate results, and to carry out a further calculation as a function of the decompressed intermediate results. (the claimed sequential process of retrieving compressed intermediate results via read operations, decompressing the retrieved results, and performing a further computation, as shown at step 610 in Fig. 6A:

[Image: Park, Fig. 6A]

In 0050-0058: FIG. 6A is a flow diagram of an exemplary computer-implemented method 600 for performing compression and/or decompression within an accelerator. The steps shown in FIG. 6A may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 5, and 7…Compressed data may be written to memory at any suitable time during network training, between training and inference, and/or during inference. Also, different portions of the compressed data may be stored in a local cache, a remote memory device, both, or neither…For example, processing unit 565 may read the compressed data from memory device 580 or fetch the compressed data from system memory 716 of computing system 710. Additionally or alternatively, processor 714 of computing system 710 may read the compressed data from system memory 716. Compressed data may be read from memory at any suitable time during network training, between training and inference, and/or during inference…Decompression subsystem 575 may decompress all the retrieved compressed model data and extract the parameters needed for a particular operation and/or may decompress only a subset of the retrieved model data…For example, network-layer logical units 435 may receive the decompressed parameters from decompression subsystem 575 and may use the decompressed parameters in any suitable arithmetic operation (e.g., a multiply operation, an accumulate operation, a convolution operation, vector or matrix multiplication, etc.))

Regarding independent claim 15, Park teaches a method of using a calculation system, the method comprising: providing a calculation system, the calculation system including: a convolutional neural network; (as depicted in Fig. 2 and Fig. 3 and described in 0038: As explained above in the discussion of FIG. 3, in a convolutional neural network each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer...; and in 0034-0036: FIG. 2 is a block diagram of an exemplary feed-forward neural network 200 capable of benefiting from the accelerators described herein. Neural network 200 may include an input layer 202, an output layer 204, and a series of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220… While FIG. 2 shows one way to conceptualize a feed-forward neural network, there are a variety of other types of neural networks and ways to illustrate and conceptualize neural networks…)
a processing unit configured to perform sequential calculations of the neural network; and a memory external to the processing unit configured to buffer intermediate results of the calculations in the processing unit; (operations for training neural networks and determining calculations, corresponding to the claimed sequential calculations over multiple sequential layers, as depicted in Fig. 1; and a computing system that performs computations in a distributed computing environment with external memory elements and processing units, as depicted in Fig. 5 and described in 0004-0005: As will be described in greater detail below, the instant disclosure details various systems and methods for reducing bandwidth consumption for memory accesses performed by an AI accelerator by compressing data written to memory and decompressing data read from memory after the data is received at the AI accelerator. For example, a computing system may include a memory device that stores compressed parameters for a layer of a neural network and a special-purpose hardware processing unit programmed to, for the layer of the neural network: (1) receive the compressed parameters from the memory device, (2) decompress the compressed parameters, and (3) apply the decompressed parameters in an arithmetic operation of the layer of the neural network…Additionally or alternatively, the memory device may include a dynamic memory device that is remote relative to the special-purpose hardware processing unit.)
wherein the processing unit is configured to: incrementally calculate data sections, which each represent a group of intermediate results, with the aid of a neural network, (the claimed incrementally calculated data sections correspond to the depicted feature maps, which represent the claimed intermediate results of the respective convolution layers, as depicted in Fig. 3 and in 0036-0037: … FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling (e.g., pooling), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps. The convolution and subsampling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316…)
conduct a lossy compression of at least one of the data sections to obtain compressed intermediate results, and transmit the compressed intermediate results to the external memory. (Park discloses compressing data, including the results of the neural network layers, for writing to external memory in order to accelerate computation and eliminate the memory bottleneck, in 0040-0042: In various embodiments, memory devices for storing compressed data may include any type or form of volatile or non-volatile storage device or medium capable of storing data. In some embodiments, a memory device may be separate from (e.g., remote from) an accelerator or may be located directly on (e.g., local to) an accelerator…Embodiments of the instant disclosure may also employ lossy (irreversible) compression techniques that may use inexact approximations and/or partial data discarding techniques.)
processing, using the calculation system, image data of camera images in (i) a driver assistance system for carrying out a driver assistance function, or (ii) a system for autonomously operating the motor vehicle, to carry out an object identification method, a segmentation method or a classification method for the image data. (the claimed image data and its classification, in 0050: In a neural network, low-level layers, e.g., convolution layers, may extract low-level features (e.g., an edge or gradient of a face image) from input data or an input feature map and high-level layers, e.g., fully-connected layers, may extract or detect high-level features, i.e., classes (e.g., eyes and a nose of the face image) from the input feature map.; and in 0032: In some embodiments, server 106 may access data (e.g., data provided by computing devices 102(1)-(N)) for analysis. For example, server 106 may perform various types of machine learning tasks on data. For instance, server 106 may use machine learning algorithms to perform speech recognition (e.g., to automatically caption videos), to enable computer vision (e.g., to identify objects in images, to classify images, to identify action in video, to turn panoramic photos into interactive 360 images, etc.), in recommender systems (e.g., information filtering systems that predict user preferences), for facial recognition and human pose estimation, in document analysis, and/or to perform a variety of other tasks.)
Examiner notes that the recited claimed units are executed by the distributed processor via computer instructions, as described in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.
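For illustration only, the data flow recited in claim 15 (calculate a data section, compress it lossily, and write only the compressed result to external memory) can be sketched as follows. This is a minimal sketch that assumes uniform quantization as the lossy scheme, which is one form of the "inexact approximations" mentioned in Park, 0042, but is not stated by Park to be its scheme; the function names and the dictionary standing in for the external memory are hypothetical.

```python
from typing import Dict, List, Tuple

# (offset, step, integer codes) for one compressed data section.
Compressed = Tuple[float, float, List[int]]

# Hypothetical stand-in for the memory external to the processing unit.
external_memory: Dict[int, Compressed] = {}


def quantize(section: List[float], bits: int = 8) -> Compressed:
    """Lossy compression: map each value to one of 2**bits uniform levels."""
    lo, hi = min(section), max(section)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in section]
    return lo, step, codes


def dequantize(compressed: Compressed) -> List[float]:
    """Approximate reconstruction; the rounding error is the lossy part."""
    lo, step, codes = compressed
    return [lo + c * step for c in codes]


def store_sections(sections: List[List[float]]) -> None:
    """Compress each data section (a group of intermediate results) in turn
    and write only the compressed form to the external memory."""
    for index, section in enumerate(sections):
        external_memory[index] = quantize(section)
```

With 8 bits, each reconstructed value lies within half a quantization step of the original value; fewer bits reduce the memory footprint at the cost of a larger reconstruction error.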
Regarding independent claim 16, Park teaches a machine-readable storage medium on which is stored a computer program for operating a calculation system (in 0039: …The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs ("one or more programs of instructions") stored in the memory 140 to implement some or all of the operations described herein.)
The remaining claim limitations are similar to the claim limitations addressed above and are rejected under the same rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’).

Regarding claim 3, the rejection of claim 1 is incorporated. Park further teaches the method as recited in claim 1, wherein, for the lossy compression, the data sections are each transformed into a … domain, and the transformed data sections are filtered element-wise to obtain modified data (the claimed transformation of the data sections corresponds to Park's convolutional transformation based on filters, and the claimed element-wise filtering corresponds to the filtering that modifies the data sections to produce feature maps, in 0036-0037: ... For example, FIG. 3 shows a neural network 300 capable of benefiting from the accelerators described herein. As such in this figure, neural network 300 may include a variety of different types of layers 310 (some which may be fully connected feed-forward layers, such as those shown in FIG. 2). In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304. In some embodiments, convolution layer 312 may also include a rectification sublayer (e.g., a layer implemented via a rectified linear unit, also known as a RELU layer) with an activation function... FIG. 3 also shows that feature maps 304 output by convolution layer 312 may undergo subsampling (e.g., pooling), based on the filters and parameters of subsampling layer 314, to produce feature maps 306, which may be reduced-size feature maps; and managing memory using lossy compression, in 0051-0052: … For example, compression subsystem 577 in FIG. 5 may compress parameters for a layer in a neural network…. In addition, model data may be compressed at any suitable time during network training, between training and inference, and/or during inference.
Furthermore, all or a portion of the model data of a layer or network may be compressed, and different portions of the model data may be compressed using different compression schemes… The compression scheme (or schemes) used by compression subsystem 577 to compress model data may have been selected based on an acceptable level of loss in compression, in order to reduce a latency for decompression, and/or based on any other criteria. In some embodiments, the compression scheme may be selected to maximize performance gains from reducing memory bandwidth...)
While Park discloses the computation of neural networks using data transformation and quantization as part of the lossy compression process, as discussed above, Park does not expressly teach the limitation: the data sections are each transformed into a frequency domain.
Bar teaches the limitation: the data sections are each transformed into a frequency domain (as depicted in Fig. 6A and in 0285-0287: … include an apparatus comprising logic, at least partially including hardware logic, to implement a lossy compression algorithm which utilizes a data transform and quantization process to compress data in a convolutional neural network (CNN) layer… wherein the apparatus compresses one or more weights in a convolutional neural network (CNN) layer in a frequency domain…).
The Park and Bar references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for transforming data section in a frequency domain as disclosed by Bar with the information processing system for processing neural network computations as disclosed by Park.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park and Bar in order to implement deep neural networks in a parallel processing environment “to increase processing efficiency. The efficiency provided by parallel machine learning algorithm implementations allows the use of high capacity networks and enables those networks to be trained on larger datasets” (Bar, 0002).
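For illustration only, the claim 3 limitation (transform each data section into a frequency domain, then filter the transformed section element-wise to obtain a modified data section) can be sketched as follows. This minimal sketch assumes an orthonormal 1-D DCT-II as the frequency transform; Bar refers only to compression "in a frequency domain", so the DCT and all function names are illustrative assumptions rather than the method of Park or Bar.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: data section -> frequency coefficients."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(norm * total)
    return coeffs


def idct_ii(c: List[float]) -> List[float]:
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        total += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(total)
    return out


def filter_elementwise(coeffs: List[float], weights: List[float]) -> List[float]:
    """Element-wise filtering: scale each coefficient by its own weight."""
    return [c * w for c, w in zip(coeffs, weights)]
```

For example, filter_elementwise(dct_ii(section), [1, 1, 0, 0]) keeps the two lowest-frequency coefficients of a four-element data section and sets the remaining coefficients to zero.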

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’) and further in view of Jun-seok Park (US Pat. Pub. No. 2018/0253635, hereinafter ‘Park2’).

	Regarding claim 4, the rejection of claim 3 is incorporated. Park further teaches the method as recited in claim 3, wherein the element-wise filtering includes a multiplication of a portion of elements of the transformed data sections with 0 to obtain modified data sections including a number of "0" elements which is … than a number of "0" elements in the data sections of the input feature map. (the claimed filtering corresponds to Park's processing that introduces the claimed zeros via an activation function to obtain the claimed modified data sections, in 0035-0036: … As shown, each value from the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum to activation layer 214… For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304. In some embodiments, convolution layer 312 may also include a rectification sublayer (e.g., a layer implemented via a rectified linear unit, also known as a RELU layer) with an activation function.)
	Bar teaches ReLU activation, which thresholds elements of the data sections at 0, as claimed (in 0170: … Several types of non-linear activation functions may be used. One particular type is the rectified linear unit (ReLU), which uses an activation function defined as f(x)=max(0,x), such that the activation is thresholded at zero.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park and Bar for the same reasons disclosed above.
	
While Park and Bar teach the use of data transformations for processing feature maps, the references do not expressly disclose the limitation: … the transformed data sections with 0 to obtain 
	Park2 does expressly teach the limitation: the transformed data sections with 0 to obtain modified data sections including a number of "0" elements which is greater than a number of "0" elements in the data sections of the input feature map. (Park2 uses zero-padding to obtain the claimed modified data sections including a greater number of 0 elements, as depicted in Fig. 11A:

[Image: Park2, Fig. 11A]

In 0105-0108: … the neural network device 130 may generate an input feature list, which includes an index and data with respect to each of input features having a non-zero value, from an input feature map in matrix… FIG. 11A is a diagram of an example in which zero-padding is applied to an input feature map IFM in a neural network…. Zero-padding in a neural network is adding zeros to the input feature map IFM in all outward directions, i.e., row and column directions. When zero-padding is applied to the input feature map IFM, an input feature map with zero-padding, i.e., a zero-padded input feature map IFM_Z may be generated. When one zero is added to every outward direction of the input feature map IFM, as shown in FIG. 11A, a location, i.e., an index, of each input feature may be increased by 1…)
	
The Park, Bar, and Park2 references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for transforming data sections using zero-padding as disclosed by Park2 with the information processing system for processing neural network computations as collectively disclosed by Park and Bar.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bar, and Park2 in order to add “zeros to the input feature map IFM in all outward directions” as “[w]hen zero-padding is applied to the input feature map IFM, an input feature map with zero-padding, i.e., a zero-padded input feature map IFM_Z may be generated” (Park2, 0105-0108); doing so will help facilitate efficient processing of input features having different spatial locations of the feature map over different processing circuits (Park2, 0185).
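For illustration only, the zero-padding that Park2 describes in 0105-0108 (one zero added in every outward direction, so that each input feature's index increases by 1) can be sketched as follows, together with an element-wise multiplication by a 0/1 mask of the kind recited in claim 4. This is a minimal sketch; the function names and the example feature map are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def zero_pad(ifm: Matrix, pad: int = 1) -> Matrix:
    """Add `pad` zeros in every outward (row and column) direction."""
    width = len(ifm[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in ifm:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded.extend([0.0] * width for _ in range(pad))
    return padded


def mask_multiply(m: Matrix, mask: Matrix) -> Matrix:
    """Multiply element-wise by a 0/1 mask; each 0 entry forces a zero."""
    return [[v * k for v, k in zip(row, mrow)] for row, mrow in zip(m, mask)]


def count_zeros(m: Matrix) -> int:
    """Number of "0" elements in a matrix."""
    return sum(1 for row in m for v in row if v == 0)
```

For the 2 x 2 map [[1, 2], [3, 4]], zero_pad returns a 4 x 4 map containing 12 zeros (against none in the original), and the input feature at index (0, 1) moves to index (1, 2).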

Claims 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’) in further view of Yan et al. (US Pat. Pub. No. 2018/0218518, hereinafter ‘Yan’).

	Regarding claim 5, the rejection of claim 3 is incorporated. Park in combination with Bar further teaches the method as recited in claim 3, wherein at least one of the modified data sections is encoded together, with the aid of a predefined encoding method to obtain the compressed intermediate results, with the aid of run length encoding or entropy encoding. (the claimed encoding of the modified data sections with a predefined encoding method corresponds to the disclosed lossless compression algorithm, and the claimed entropy encoding corresponds to Huffman coding, in 0042: … Other examples of compression schemes may include lossless compression algorithms such as lookup tables with Huffman coding (e.g., an algorithm that assigns variable-length codes to inputs, where lengths of the assigned codes are based on frequencies of matrix elements) or other reversible compression techniques that allow original data to be completely reconstructed from the compressed data…)

	Yan discloses the use of a run-length encoding process. (in 0045: FIG. 3B illustrates a conceptual diagram of input data 305 and another compact data format 320, in accordance with one embodiment. A run-length encoding scheme is used to generate the compacted multi-bit data encoded in the compact data format 320. The compact data format 320 includes each non-zero value and the number of zeros between each non-zero element.)
The Park, Bar, and Yan references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for processing neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for processing multi-bit data for sparse convolution neural networks as disclosed by Yan with the information processing system for processing neural network computations as collectively disclosed by Park and Bar.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bar, and Yan in order to process multi-bit data with a processing element such that the “multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data”; doing so will enable a “method, computer program product, and system for sparse convolutional neural networks that improves efficiency” (Yan, Abstract).
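For illustration only, the run-length format described in Yan, 0045 (each non-zero value stored together with the number of zeros preceding it) can be sketched as follows. This is a minimal sketch that assumes a trailing run of zeros is stored as a separate count; Yan's bit-level layout is not reproduced, and the function names are hypothetical.

```python
from typing import List, Tuple

Pairs = List[Tuple[int, float]]


def rle_encode(data: List[float]) -> Tuple[Pairs, int]:
    """Store each non-zero value with the count of zeros preceding it.

    Returns the (zero_run, value) pairs and the count of trailing zeros.
    """
    pairs: Pairs = []
    run = 0
    for v in data:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def rle_decode(pairs: Pairs, trailing: int) -> List[float]:
    """Rebuild the original sequence; the encoding itself is lossless."""
    out: List[float] = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing)
    return out
```

The more zeros a modified data section contains, the fewer pairs are stored, which is why zero-producing filtering and run-length encoding are complementary.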

Regarding claim 6, the rejection of claim 5 is incorporated. Park in combination with Bar and Yan further teaches the method as recited in claim 5, wherein a compression matrix is applied to the transformed data sections for the element-wise filtering, the compression matrix being separately … (the disclosed sparse-matrix compression corresponds to the claimed compression matrix applied to the claimed transformed data sections, in 0042: FIG. 4 shows a data flow 400 that depicts how data may be decompressed and compressed according to aspects of the present disclosure. Either or both of DDR 402 and SRAM 404 may store compressed parameters for a neural network. These parameters (e.g., model data) may have been compressed using any of a variety of different types of compression schemes. For example, a sparse-matrix compression scheme may be used to compress data. Examples of sparse-matrix compression schemes may include a compressed-sparse-row scheme (e.g., an algorithm that creates a format that may represent a matrix with one-dimensional arrays that contain…; and applying the process sequentially over the network layers, corresponding to the claimed calculation layers of the neural network, in 0044: Network-layer logical units 435 may apply the decompressed parameters in a variety of types of arithmetic operations. For example, the parameters may be applied in filtering or convolution operations, which may be matrix operations, or other operations such as RELU operations or pooling operations. In some embodiments, these parameters may be updated during execution of the layer (e.g., during backpropagation)…; and 0048: Network-layer logical units 435 may include one or more logical units or other calculation hardware, such as matrix multipliers or general matrix-matrix multiplication (GEMM) units, tensor units, or other logical and/or arithmetic units used for performing calculations for a layer (e.g., as part of training and/or inference operations)…)

Regarding claim 7, the rejection of claim 6 is incorporated. Park in combination with Bar and Yan further teaches the method as recited in claim 6, wherein the compression matrix is matched to the encoding method of the transformed data sections in such a way that a better compression is achieved than with a direct application of the encoding method to the non-transformed data sections. (Park's use of a matrix in the convolution operation is interpreted as the claimed matrix matched relative to the non-transformed data sections in the fully connected layers for making inferences, in 0036-0037: … In convolution layer 312, an input 302 may undergo convolutional transformations, which may be calculated by hardware such as hardware processing unit 160, accelerator 500, and/or processor 714. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312 to produce feature maps 304 [claimed compression matrix is matched to the encoding method of the transformed data sections in such a way that a better compression is achieved than with a direct application of the encoding]. In some embodiments, convolution layer 312 may also include a rectification sublayer (e.g., a layer implemented via a rectified linear unit, also known as a RELU layer) with an activation function… The convolution and subsampling of layers 312 and 314 may be performed a single time or multiple times before sending an output (e.g., feature maps 306) to a fully connected layer, such as fully connected layer 316. Fully connected layer 316, which FIG. 3 shows one example of, may process feature maps 306 to identify the most probable inference or classification for input 302 and may provide this classification or inference as output 320.)

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Pub. No. 2019/0190538, hereinafter ‘Park’) in view of Bar-On et al. (US Pub. No. 2018/0293758, hereinafter ‘Bar’) in further view of Yan et al. (US Pat. Pub. No. 2018/0218518, hereinafter ‘Yan’) and in further view of Thiagarajan et al. (US Pub. No. 2018/0084253, hereinafter ‘Thia’).

Regarding claim 8, the rejection of claim 6 is incorporated. Park in combination with Bar and Yan further teaches the method as recited in claim 6, wherein the compression matrix is predefined in such a way that higher-frequency components of the transformed data sections are filtered. (the claimed compression matrix filtering frequency components is predefined via a lossless compression algorithm, in 0042: FIG. 4 shows a data flow 400 that depicts how data may be decompressed and compressed according to aspects of the present disclosure. Either or both of DDR 402 and SRAM 404 may store compressed parameters for a neural network… For example, a sparse-matrix compression scheme may be used to compress data. Examples of sparse-matrix compression schemes may include a compressed-sparse-row scheme... Other examples of compression schemes may include lossless compression algorithms such as lookup tables with Huffman coding (e.g., an algorithm that assigns variable-length codes to inputs, where lengths of the assigned codes are based on frequencies of matrix elements) …)
While Park discloses compressing and quantizing matrix components for performing operations in a sequential neural network, as discussed above, Park, Bar, and Yan do not expressly disclose the processing of frequencies of matrix elements such that high-frequency components are filtered.
Thia does expressly disclose the processing of frequencies of matrix elements such that high-frequency components are filtered. (in 0047: Generally, the values inserted into the base quantization matrix are expected to be larger than the values that are replaced in the base quantization matrix. This is expected to increase the likelihood that information in the transform matrix at corresponding positions will be discarded, thereby increasing the likelihood of higher-frequency components being discarded during compression and increasing the amount of compression…; and filtering of high-frequency components by setting them to zero, in 0054: In some cases, the highest-frequency values or higher-frequency values of the quantized transform matrix may be set to zero with the techniques described in U.S. Provisional Patent App. 62/513,681, titled MODIFYING COEFFICIENTS OF A TRANSFORM MATRIX…)
The Park, Bar, Yan, and Thia references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing an information processing system for compressing data used in matrix computations by machine learning algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for processing video data using quantization and entropy coding to produce compressed video data as disclosed by Thia with the information processing system for processing neural network computations as collectively disclosed by Park, Bar, and Yan.
One of ordinary skill in the art would have been motivated to combine the disclosed methods in Park, Bar, Yan, and Thia in order to process video data using quantization and entropy coding to produce compressed video data; doing so will “increase the likelihood that information in the transform matrix at corresponding positions will be discarded, thereby increasing the likelihood of higher-frequency components being discarded during compression and increasing the amount of compression” (Thia, 0047); and “enhance compression resulting from subsequent entropy coding operations” (Thia, 0054).
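For illustration only, a compression matrix predefined so that higher-frequency components of the transformed data sections are filtered (claim 8), in the spirit of Thia, 0054, where higher-frequency values of a quantized transform matrix are set to zero, can be sketched as a 0/1 mask over an n x n block of transform coefficients. This minimal sketch assumes that index (0, 0) holds the lowest-frequency (DC) term and that frequency grows with u + v; the cutoff rule and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def low_pass_mask(n: int, cutoff: int) -> List[List[int]]:
    """1 keeps coefficient (u, v); 0 discards it when u + v >= cutoff."""
    return [[1 if u + v < cutoff else 0 for v in range(n)] for u in range(n)]


def apply_mask(coeffs: Matrix, mask: List[List[int]]) -> Matrix:
    """Element-wise filtering of a block of transform coefficients."""
    return [[c * m for c, m in zip(crow, mrow)] for crow, mrow in zip(coeffs, mask)]
```

With n = 4 and cutoff = 2, only the three lowest-frequency coefficients survive and the other 13 are set to zero, producing the long zero runs that Thia states can enhance compression from subsequent entropy coding.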

Regarding claim 9, the rejection of claim 8 is incorporated. Park in combination with Bar, Yan, and Thia further teaches the method as recited in claim 8, wherein the compression matrix is predefined in that, during a training of the neural network, matrix elements of the filter matrices for each calculation layer are trained together with neuron parameters of neurons of the neural network with the aid of a back-propagation method. (the claimed training process using back-propagation, in 0044: Network-layer logical units 435 may apply the decompressed parameters in a variety of types of arithmetic operations. For example, the parameters may be applied in filtering or convolution operations, which may be matrix operations, or other operations such as RELU operations or pooling operations. In some embodiments, these parameters may be updated during execution of the layer (e.g., during backpropagation)…)
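For illustration only, training filter-matrix elements together with neuron parameters by back-propagation (claim 9) can be reduced to the following toy sketch: a single neuron computes y = sum_k c_k * m_k * w_k over transformed coefficients c_k, where m_k are the filter-matrix elements and w_k are the neuron weights, and both sets of values are updated by gradient descent on a squared error. The model, the learning rate, and the function names are hypothetical simplifications and do not represent Park's training procedure.

```python
from typing import List, Tuple


def forward(c: List[float], m: List[float], w: List[float]) -> float:
    """Neuron output on the element-wise filtered coefficients c_k * m_k."""
    return sum(ck * mk * wk for ck, mk, wk in zip(c, m, w))


def train_step(
    c: List[float], m: List[float], w: List[float], target: float, lr: float = 0.05
) -> Tuple[List[float], List[float], float]:
    """One back-propagation step updating filter elements m and weights w jointly.

    Returns the updated m, the updated w, and the loss before the update.
    """
    err = forward(c, m, w) - target
    loss = err * err
    # Chain rule for loss = (y - target)**2 with y = sum(c_k * m_k * w_k).
    grad_m = [2.0 * err * ck * wk for ck, wk in zip(c, w)]
    grad_w = [2.0 * err * ck * mk for ck, mk in zip(c, m)]
    new_m = [mk - lr * g for mk, g in zip(m, grad_m)]
    new_w = [wk - lr * g for wk, g in zip(w, grad_w)]
    return new_m, new_w, loss
```

Calling train_step repeatedly moves both the filter elements and the neuron weights, so the filter matrix is learned jointly with the network rather than fixed in advance.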



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516.  The examiner can normally be reached on Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/O.O.A./Examiner, Art Unit 2126        
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126