DETAILED ACTION
1.	This office action is in response to Application No. 16519643, filed on 10/02/2019. Claims 1-19 and 22 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
3.	Claims 1 and 22 are objected to because of the following informalities:
	In claim 1, the limitation recites “perform select one or more operations among multiple operations including a multiplication based on the input signal, …, according to the computation mode,”. It should be “perform a selected one or more operations among multiple operations including a multiplication based on the input signal, …, according to the computation mode”.
	In claim 22, the limitation recites “an apparatus for a neural network comprising: an input processor suitable for receiving an input signal … and a computation circuit suitable for performing a lattice multiplication …”. The term “suitable” suggests that the processor and the computation circuit are merely capable of performing the operations (i.e., intended use), but it does not require the processor and the circuit to be configured or programmed to perform the operations listed in claim 22. For purposes of compact prosecution, the Examiner will interpret the processor and the circuit as being configured to perform the operations listed in claim 22.
	Appropriate correction is required.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 

4.	Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

	Step 1
	Independent claim 1 is directed to an apparatus, and falls into one of the four statutory categories.
	Step 2A, Prong 1
	Claim 1 recites the following abstract ideas:
	to decide a computation mode according to precision of an input signal, (mental process directed to observation of the precision of the input data) and 	
	change or maintain the precision of the input signal according to the decided computation mode; (mental process with the aid of pen and paper directed to rounding the input data in response to the observation)
	perform select one or more operations among multiple operations including a multiplication based on the input signal, boundary migration to rearrange multiple signals divided from the input signal, (mathematical concepts directed to calculations performed on input data) and 
	an addition of the input signal subjected to the boundary migration, according to the computation mode, (mathematical concepts directed to calculations performed on input data) and 
	perform the selected one or more operations on the input signal. (mathematical concepts directed to calculations performed on input data)  

	Step 2A, Prong 2
	Claim 1 recites the following additional elements:
	an input processor (the limitation is directed to a generic computer component that is used to observe the input data and does not integrate the abstract idea into a practical application)
	receive the input signal, (the limitation is directed to transmission of data and does not integrate the abstract idea into a practical application)
	a computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations and does not integrate the abstract idea into a practical application)
	Step 2B
	Claim 1 recites the following additional elements:
	an input processor (the limitation is directed to a generic computer component that is used to observe the input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	receive the input signal, (the limitation is directed to transmission of data and does not amount to significantly more, see MPEP 2106.05(d)(II)(i))
	a computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations and does not amount to significantly more, see MPEP 2106.05(f))

5.	Dependent claim 2 is directed to an apparatus, and falls into one of the four statutory categories.  
	Claim 2 recites the following abstract ideas:
	wherein, in changing the precision of the input signal, (mental process directed to altering the input data in response to the observation)
	divides the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal according to the computation mode (mathematical concepts directed to calculations performed on input data)
	Claim 2 recites the following additional limitations:
	the input processor (the limitation is directed to a generic computer component that is used to observe the input data and does not integrate the abstract idea into a practical application)
	transfers the multiple signals to the computation circuit. (the limitation is directed to using a generic computer component in data transmission. This does not integrate the abstract idea into a practical application)
	Claim 2 recites the following additional limitations:
	the input processor (the limitation is directed to a generic computer component that is used to observe the input data. This does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	transfers the multiple signals to the computation circuit. (the limitation is directed to transmission of data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

6.	Dependent claim 3 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 3 recites the following abstract ideas:
	wherein, in dividing the input signal, the input processor divides bits of the input signal in half. (mathematical concepts directed to calculations performed on input data)
	Claim 3 does not recite any additional elements.
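	For illustration only, the division recited in claims 2 and 3 can be sketched as a shift-and-mask calculation. The function name split_in_half and the 8-bit default width are hypothetical and do not appear in the application; this is a minimal sketch of the arithmetic, not the applicant's implementation.

```python
def split_in_half(value: int, width: int = 8) -> tuple[int, int]:
    """Split a width-bit unsigned integer into (high, low) halves.

    Hypothetical helper: the name and the default width are assumptions.
    """
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in width bits")
    half = width // 2
    low = value & ((1 << half) - 1)  # lower half of the bits
    high = value >> half             # upper half of the bits
    return high, low


# Example: an 8-bit input becomes two 4-bit signals.
high, low = split_in_half(0b10110110)
```

	Each returned signal has half as many bits as the input, and the original value is recovered as (high << 4) | low for the 8-bit example.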

7.	Dependent claim 4 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 4 recites the following abstract ideas:
	to perform a computation on the multiple signals whose precisions have been changed, according to a lattice multiplication rule; (mathematical concepts directed to a multiplication calculation performed on input data)
	to perform the boundary migration (mental process directed to rearranging input data) and 
	an addition on an output value of the plurality of first multipliers (mathematical concepts directed to addition calculation performed on input data)
	Claim 4 recites the following additional elements:
	a plurality of first multipliers (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not integrate the abstract idea into a practical application)
	a boundary migrator (this limitation is directed to transmission of input data. This does not integrate the abstract idea into a practical application)
	Claim 4 recites the following additional elements:
	a plurality of first multipliers (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	a boundary migrator (this limitation is directed to transmission of input data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

8.	Dependent claim 5 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 5 recites the following abstract ideas:
	perform a bit-wise AND operation on the multiple signals, (mathematical concepts directed to a bit-wise logical calculation performed on input data) and
	generate individual lattice values for each of the multiple signals by performing a bit-wise addition on the respective multiple signals to perform a carry update in a first direction. (mathematical concepts directed to addition calculation performed on input data)
	Claim 5 recites the following additional elements:
	the first multipliers (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not integrate the abstract idea into a practical application)
	Claim 5 recites the following additional elements:
	the first multipliers (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
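	For illustration only, a textbook bit-level lattice multiplication can be sketched as follows: each lattice cell is the AND of one bit from each operand, and the cells are summed along diagonals with carry propagation. Whether this corresponds to the "carry update in a first direction" recited in claim 5 is an assumption; the function name lattice_multiply and the 4-bit default width are hypothetical and do not appear in the application.

```python
def lattice_multiply(a: int, b: int, width: int = 4) -> int:
    """Multiply two width-bit unsigned integers with a bit-level lattice.

    Generic textbook scheme, used here only as an illustrative assumption.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in width bits")
    a_bits = [(a >> i) & 1 for i in range(width)]  # least significant bit first
    b_bits = [(b >> j) & 1 for j in range(width)]
    # Each lattice cell is the AND of one bit of a and one bit of b.
    cells = [[a_bits[i] & b_bits[j] for j in range(width)] for i in range(width)]
    result = 0
    carry = 0
    # Sum the cells on each diagonal (i + j == k) and propagate the carry.
    for k in range(2 * width):
        total = carry + sum(
            cells[i][k - i] for i in range(width) if 0 <= k - i < width
        )
        result |= (total & 1) << k  # keep one result bit per diagonal
        carry = total >> 1          # pass the remainder to the next diagonal
    return result
```

	The sketch shows that every step reduces to AND and addition operations on bits, which is the characterization relied on in the analysis above.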

9.	Dependent claim 6 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 6 recites the following abstract ideas:
	performs the boundary migration by rearranging the individual lattice values at boundary migration positions matched with the positions of the corresponding multiple signals, (mental process with the aid of pen and paper to rearrange input data) and 
	generates a result value by adding the boundary migration values in a second direction. (mathematical concepts directed to addition calculations performed on input data)
	Claim 6 recites the following additional elements:
	the boundary migrator (this limitation is directed to transmission of input data. This does not integrate the abstract idea into a practical application)
	Claim 6 recites the following additional elements:
	the boundary migrator (this limitation is directed to transmission of input data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	
10.	Dependent claim 7 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 7 recites the following abstract ideas:
	to accumulate an output value of the first flip-flop (mathematical concepts directed to addition calculations performed on output value)
	Claim 7 recites the following additional elements:
	a first flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This limitation does not integrate the abstract idea into a practical application)
	to perform a retiming operation on the result value received from the boundary migrator; (this limitation is directed to a conventional computer operation for timing data. This limitation does not integrate the abstract idea into a practical application)
	a first accumulator (this limitation is directed to a generic computer component used to perform the addition of the output value. This limitation does not integrate the abstract idea into a practical application) and
	a second flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This limitation does not integrate the abstract idea into a practical application)
	to perform a retiming operation on an output value received from the first accumulator and output the retimed result value. (this limitation is directed to a conventional computer operation for timing data. This limitation does not integrate the abstract idea into a practical application)
	Claim 7 recites the following additional elements:
	a first flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	to perform a retiming operation on the result value received from the boundary migrator; (this limitation is directed to a conventional computer operation for timing data. This does not amount to significantly more, see US7197053, col 4, lines 34-39)
	a first accumulator (this limitation is directed to a generic computer component used to perform the addition of the output value. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv)) and
	a second flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	to perform a retiming operation on an output value received from the first accumulator and output the retimed result value. (this limitation is directed to a conventional computer operation for timing data. This does not amount to significantly more, see US7197053, col 4, lines 34-39)

11.	Dependent claim 8 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 8 recites the following abstract ideas:
	to generate a result value by perform a computation on the input signal according to a lattice multiplication rule. (mathematical concepts directed to calculations performed on input data)
	Claim 8 recites the following additional elements:
	the computation circuit comprises a second computation circuit comprising a plurality of second multipliers (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations. This does not integrate the abstract idea into a practical application)
	Claim 8 recites the following additional elements:
	the computation circuit comprises a second computation circuit comprising a plurality of second multipliers (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

12.	Dependent claim 9 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 9 recites the following abstract ideas:
	to perform an addition on the result value; (mathematical concepts directed to calculations performed on input data)
	Claim 9 recites the following additional elements:
	a second accumulator (this limitation is directed to a generic computer component used to perform the addition of the result value. This does not integrate the abstract idea into a practical application)
	a third flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not integrate the abstract idea into a practical application)
	to perform a retiming operation on the result value from the second accumulator and output the retimed result value (this limitation is directed to a conventional computer operation for timing data. This does not integrate the abstract idea into a practical application)
	Claim 9 recites the following additional elements:
	a second accumulator (this limitation is directed to a generic computer component used to perform the addition of the result value. This does not amount to significantly more than the judicial exception, see MPEP 2106.05(d)(II)(iv))
	a third flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not amount to significantly more than the judicial exception, see MPEP 2106.05(d)(II)(iv))
	to perform a retiming operation on the result value from the second accumulator and output the retimed result value (this limitation is directed to a conventional computer operation for timing data. This does not amount to significantly more, see US7197053, col 4, lines 34-39)

13.	Dependent claim 10 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 10 recites the following abstract ideas:
	to perform a lattice multiplication on the first and second input signals and output a first result value; (mathematical concepts directed to a multiplication calculation performed on input data)
	to perform the boundary migration and an addition on the first result value from the third multiplier to generate a second result value; (mathematical concepts directed to addition calculations performed on input data)
	Claim 10 recites the following additional elements:
	a third multiplier (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not integrate the abstract idea into a practical application)
	an adder (this limitation is directed to a computer component used to carry out the mathematical calculations. This does not integrate the abstract idea into a practical application)
	a fourth flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not integrate the abstract idea into a practical application)
	to perform an retiming operation on the second result value and output the retimed second result value. (this limitation is directed to a conventional computer operation for timing data. This limitation does not integrate the abstract idea into a practical application)
	Claim 10 recites the following additional elements:
	a third multiplier (this limitation is directed to using a generic computer component to carry out the multiplication of the input data. This does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
	an adder (this limitation is directed to a computer component used to carry out the mathematical calculations. This does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
	a fourth flip-flop (the limitation is directed to a generic computer component used to store a binary bit of data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	to perform an retiming operation on the second result value and output the retimed second result value. (this limitation is directed to a conventional computer operation for timing data. This does not amount to significantly more, see US7197053, col 4, lines 34-39)

14.	Dependent claim 11 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 11 recites the following abstract ideas:
	performs a counting function (mathematical concepts directed to counting calculations)
	Claim 11 recites the following additional elements:
	the adder and controls computation logic for the first and second input signals to be repeatedly performed a set number of times. (this limitation is directed to a computer component used to carry out the mathematical calculations. This does not integrate the abstract idea into a practical application)
	Claim 11 recites the following additional elements:
	the adder and controls computation logic for the first and second input signals to be repeatedly performed a set number of times. (this limitation is directed to a computer component used to carry out the mathematical calculations. This does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
	
15.	Dependent claim 12 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 12 does not recite any abstract idea.
	Claim 12 recites the following additional elements:
	a fifth flip-flop configured to transfer the first input signal to a first another computation circuit adjacent thereto; (the limitation is directed to a generic computer component used for transmission of data and does not integrate the abstract idea into a practical application)
	a sixth flip-flop configured to transfer the second input signal to a second another computation circuit adjacent thereto; (the limitation is directed to a generic computer component used for transmission of data. This does not integrate the abstract idea into a practical application)
	a multiplexer configured to output any one of the second result value from the fourth flip-flop and a result value from the first another computation circuit; (the limitation is directed to a generic computer component used for transmission of data. This does not integrate the abstract idea into a practical application) and
	a seventh flip-flop configured to output the result value from the multiplexer. (the limitation is directed to a generic computer component used for transmission of data. This does not integrate the abstract idea into a practical application)
	Claim 12 recites the following additional elements:
	a fifth flip-flop configured to transfer the first input signal to a first another computation circuit adjacent thereto; (the limitation is directed to a generic computer component used for transmission of data. This does not amount to significantly more, see MPEP 2106.05(d)(II))
	a sixth flip-flop configured to transfer the second input signal to a second another computation circuit adjacent thereto; (the limitation is directed to a generic computer component used for transmission of data. This does not amount to significantly more, see MPEP 2106.05(d)(II))
	a multiplexer configured to output any one of the second result value from the fourth flip-flop and a result value from the first another computation circuit; (the limitation is directed to a generic computer component used for transmission of data.  This does not amount to significantly more, see MPEP 2106.05(d)(II)) and
	a seventh flip-flop configured to output the result value from the multiplexer. (the limitation is directed to a generic computer component used for transmission of data. This does not amount to significantly more, see MPEP 2106.05(d)(II))

16.	Dependent claim 13 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 13 recites the following abstract ideas:
	performs a multiplication on each of the multiple signals derived from the input signal, using any of lattice multiplication, Booth multiplication, Dadda multiplication and Wallace multiplication (mathematical concepts directed to a multiplication calculation performed on input data)
	Claim 13 recites the following additional elements:
	computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations and does not integrate the abstract idea into a practical application)
	Claim 13 recites the following additional elements:
	computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations. This does not amount to significantly more, see MPEP 2106.05(f))

17.	Independent claim 14 is directed to a method, and falls into one of the four statutory categories.
	Claim 14 recites the following abstract ideas:
	deciding a computation mode according to precision of an input signal; (mental process directed to observation of the precision of the input data)
	changing or maintaining the precision of the input signal according to the decided computation mode; (mental process directed to altering the input data in response to the observation)
	selecting one or more operations among multiple operations including a multiplication based on the input signal, boundary migration to rearrange multiple signals divided from the input signal, (mathematical concepts directed to calculations performed on input data) and 
	an addition of the input signal subjected to the boundary migration, according to the computation mode; (mathematical concepts directed to calculations performed on input data) and
	performing the one or more selected operations on the changed input signals. (mathematical concepts directed to calculations performed on input data)  
	Claim 14 does not recite any additional elements.

18.	Dependent claim 15 is directed to a method, and falls into one of the four statutory categories.
	Claim 15 recites the following abstract ideas:
	dividing the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal, according to the computation mode; (mathematical concepts directed to calculations performed on input data)
	Claim 15 recites the following additional limitations:
	outputting the multiple signals. (this limitation is directed to outputting input data. This does not integrate the abstract idea into a practical application)
	Claim 15 recites the following additional limitations:
	outputting the multiple signals. (this limitation is directed to outputting input data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(i))

19.	Dependent claim 16 is directed to a method, and falls into one of the four statutory categories.
	Claim 16 recites the following abstract ideas:
	dividing bits of the input signal in half. (mathematical concepts directed to calculations performed on input data)
	Claim 16 does not recite any additional limitations.

20.	Dependent claim 17 is directed to a method, and falls into one of the four statutory categories.
	Claim 17 recites the following abstract ideas:
	performing of the one or more selected operations comprises the steps of: performing a computation on the multiple signals whose precisions have been changed, according to a lattice multiplication rule; (mathematical concepts directed to a multiplication calculation performed on input data) and
	performing the boundary migration (mental process directed to rearranging input data) and 
	an addition on the computation result. (mathematical concepts directed to addition calculation performed on input data)
	Claim 17 does not recite any additional elements.

21.	Dependent claim 18 is directed to a method, and falls into one of the four statutory categories.
	Claim 18 recites the following abstract ideas:
	performing a bit-wise AND operation on the multiple signals; (mathematical concepts directed to a bit-wise logical calculation performed on input data) and
	generating individual lattice values for each of the multiple signals by performing a bit-wise addition on the respective multiple signals to perform a carry update in a first direction. (mathematical concepts directed to addition calculation performed on input data)
	Claim 18 does not recite any additional elements.

22.	Dependent claim 19 is directed to a method, and falls into one of the four statutory categories.
	Claim 19 recites the following abstract ideas:
	performing the boundary migration by rearranging the individual lattice values at boundary migration positions matched with the positions of the corresponding multiple signals; (mental process directed to rearranging input data) and
	generating a result value by adding the boundary migration values in a second direction. (mathematical concepts directed to addition calculation performed on data)
	Claim 19 does not recite any additional limitations.

23.	Independent claim 22 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 22 recites the following abstract ideas:
	performing a lattice multiplication on each of the multiple signals, (mathematical concepts directed to a multiplication calculation performed on input data) and 
	performing migration on multiplication results thereof to generate a multiplication result corresponding to the input signal. (mental process directed to rearranging input data)
	Claim 22 recites additional limitations:
	an input processor suitable for receiving an input signal corresponding to an n×n lattice, (the limitation is directed to a generic computer component that is used to retrieve data. This does not integrate the abstract idea into a practical application) and 
	processing the input signal to generate multiple signals respectively corresponding to (n/2)×(n/2) sub lattices of the n×n lattice; (this limitation is directed to analyzing data. This does not integrate the abstract idea into a practical application) and
	a computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations. This does not integrate the abstract idea into a practical application)
	Claim 22 recites additional limitations:
	an input processor suitable for receiving an input signal corresponding to an n×n lattice, (the limitation is directed to a generic computer component that is used to retrieve data. This does not amount to significantly more, see MPEP 2106.05(d)(II)(iv)) and 
	processing the input signal to generate multiple signals respectively corresponding to (n/2)×(n/2) sub lattices of the n×n lattice; (this limitation is directed to analyzing data. This does not amount to significantly more, see MPEP 2106.05(f)) and
	a computation circuit (the limitation is directed to a generic computer component that is used to carry out the mathematical calculations. This does not amount to significantly more, see MPEP 2106.05(f))
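	For illustration only, the arithmetic recited in claim 22 can be sketched as four (n/2)×(n/2) sub-products that are shifted to their bit positions and added. Reading "migration" as this positional shift is an assumption, and the function name multiply_by_sub_lattices and the default n = 8 are hypothetical; neither appears in the application.

```python
def multiply_by_sub_lattices(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit unsigned integers via four half-width sub-products.

    Hypothetical sketch: "migration" is assumed to mean placing each
    sub-product at the bit position of the halves that produced it.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n bits")
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    # One product per (n/2) x (n/2) sub lattice.
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi
    ll = a_lo * b_lo
    # Shift each sub-product to its position, then add.
    return (hh << n) + ((hl + lh) << half) + ll
```

	The identity used is a·b = hh·2^n + (hl + lh)·2^(n/2) + ll, which follows from writing each operand as hi·2^(n/2) + lo.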


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

24.	Claims 1 and 14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ould-Ahmed-Vall et al. (US20180315159, filed 10/20/2017).

	Regarding claim 1, Ould-Ahmed-Vall teaches an accelerating apparatus for a neural network (The compute framework 606 can abstract the underlying instructions provided to the GPGPU driver 608 to enable the machine learning framework 604 to take advantage of hardware acceleration via the GPGPU hardware 610 [0143]; A machine learning application 602 can be configured to train a neural network [0141]) comprising:
	an input processor configured to decide a computation mode according to precision of an input signal, (For example, the GPGPU 1306 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations. [0186]; … an operation on a pair of 8-bit integer inputs [0203]. The Examiner notes that an 8-bit integer is an input signal with low precision (e.g. INT8 or 8-bit integer value [0198])) and
	change or maintain the precision of the input signal according to the decided computation mode; (step 1: receive request to perform a numerical operation at a first precision 1702, step 2: perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision 1704, Fig. 17. The Examiner notes that this implies changing the precision of the input signal in an INT8 computation mode) and
	a computation circuit configured to receive the input signal from the input processor, (The GPGPU 700 receives commands from the host processor [0145]; The multiplier 1506 is configurable to perform a multiply or divide operation at half precision for a data type, … The multiplier 1506 can also perform an 8-bit multiply operation for an INT8 integer value [0198])
	perform select one or more operations among multiple operations including a multiplication based on the input signal, boundary migration to rearrange multiple signals divided from the input signal and an addition of the input signal subjected to the boundary migration, according to the computation mode, (In some embodiments, a pipeline select command 3113 is used when a command sequence requires the graphics processor to explicitly switch between pipelines. In some embodiments, a pipeline select command 3113 is required only once within an execution context before issuing pipeline commands unless the context is to issue commands for both pipelines [0272]; For example, the multiplier 1506 can perform … a 16-bit multiply operation for a 16-bit integer operation. The multiplier 1506 can also perform an 8-bit multiply operation for an INT8 integer value [0198]) and
	perform the selected one or more operations on the input signal (For example, an operation on a pair of 8-bit integer inputs 1605A-1605B can be performed via an operation thread 1606D via a dynamic floating-point unit 1608D to generate an 8-bit integer output 1616 [0203])

	Regarding claim 14, Ould-Ahmed-Vall teaches an operating method of an accelerating apparatus for a neural network, (The compute framework 606 can abstract the underlying instructions provided to the GPGPU driver 608 to enable the machine learning framework 604 to take advantage of hardware acceleration via the GPGPU hardware 610 [0143]; A machine learning application 602 can be configured to train a neural network [0141]) comprising:
	deciding a computation mode according to precision of an input signal; (For example, the GPGPU 1306 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations. [0186]; … an operation on a pair of 8-bit integer inputs [0203]. The Examiner notes that 8-bit integer is an input signal with low precision (e.g. INT8 or 8-bit integer value [0198]))
	changing or maintaining the precision of the input signal according to the decided computation mode; (step 1: receive request to perform a numerical operation at a first precision 1702, step 2: perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision 1704, Fig. 17. The Examiner notes that this implies changing the precision of the input signal in an INT8 computation mode)
	selecting one or more operations among multiple operations including a multiplication based on the input signal, boundary migration to rearrange multiple signals divided from the input signal, and an addition of the input signal subjected to the boundary migration, according to the computation mode; (In some embodiments, a pipeline select command 3113 is used when a command sequence requires the graphics processor to explicitly switch between pipelines. In some embodiments, a pipeline select command 3113 is required only once within an execution context before issuing pipeline commands unless the context is to issue commands for both pipelines [0272]; For example, the multiplier 1506 can perform … a 16-bit multiply operation for a 16-bit integer operation. The multiplier 1506 can also perform an 8-bit multiply operation for an INT8 integer value [0198]) and
	performing the one or more selected operations on the changed input signals. (For example, an operation on a pair of 8-bit integer inputs 1605A-1605B can be performed via an operation thread 1606D via a dynamic floating-point unit 1608D to generate an 8-bit integer output 1616 [0203])

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.





25.	Claims 2, 3, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Lim et al (US20180082400)
	
	Regarding claim 2, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 1, Ould-Ahmed-Vall teaches wherein, in changing the precision of the input signal, (step 1: receive request to perform a numerical operation at a first precision 1702, step 2: perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision 1704, Fig. 17.) and 
	transfers the multiple signals to the computation circuit (Computing the remaining bits of the result at block 1710 can be performed, in one embodiment, via overflow logic units, such as the overflow multiplier 1504, as in FIG. 15 [0206])
	Ould-Ahmed-Vall does not explicitly teach the input processor divides the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal according to the computation mode,
	Lim teaches the input processor divides the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal according to the computation mode, (Vision module 322 performs various operations to facilitate computer vision operations at CPU 208 such as facial detection in image data [0050];  computer vision applications such as image classification, scene detection, facial expression detection, human detection, object detection, scene classification, and text classification [0059]; FIG. 3 is a block diagram illustrating image processing pipelines implemented using an image signal processor, according to one embodiment. [0007]; the computation core 516 is designed to process only 8 bit data. To process input data of 16 bit, each input data is first divided into two 8 bit data portions: One data portion is 8 bit image data including 8 most significant bits (MSB) and the other data portion includes 8 least significant bits (LSB) [0088]. The Examiner notes that computer vision is a field in artificial intelligence)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Ould-Ahmed-Vall to incorporate the teachings of Lim for the benefit of performing computation based on a subset of input data or a subset of kernels in a single cycle (Lim [0069])
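	For illustration only, the division quoted from Lim [0088] (a 16-bit input split into an 8-bit MSB portion and an 8-bit LSB portion) can be sketched as follows. The function names are hypothetical and appear in neither Lim nor the application; the sketch assumes unsigned 16-bit values.

```python
from typing import Tuple


def split_16bit(value: int) -> Tuple[int, int]:
    """Split an unsigned 16-bit value into (msb, lsb) 8-bit portions."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit value")
    msb = (value >> 8) & 0xFF  # 8 most significant bits
    lsb = value & 0xFF         # 8 least significant bits
    return msb, lsb


def recombine(msb: int, lsb: int) -> int:
    """Inverse of split_16bit: rebuild the 16-bit value from its halves."""
    return (msb << 8) | lsb
```

	Each returned portion has half the bits of the input, which is the "divided in half" relationship relied on for claims 3 and 16.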
	
	Regarding claim 3, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 2, Lim teaches wherein, in dividing the input signal, the input processor divides bits of the input signal in half. (Vision module 322 performs various operations to facilitate computer vision operations at CPU 208 such as facial detection in image data [0050]; To process input data of 16 bit, each input data is first divided into two 8 bit data portions: One data portion is 8 bit image data including 8 most significant bits (MSB) and the other data portion includes 8 least significant bits (LSB) [0088]. The Examiner notes that computer vision is a field in artificial intelligence)
	The same motivation to combine dependent claim 2 applies here.

	Regarding claim 15, Ould-Ahmed-Vall teaches the operating method according to claim 14, wherein the changing or maintaining of the precision comprises: (step 1: receive request to perform a numerical operation at a first precision 1702, step 2: perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision 1704, Fig. 17.) and
	outputting the multiple signals. (Computing the remaining bits of the result at block 1710 can be performed, in one embodiment, via overflow logic units, such as the overflow multiplier 1504, as in FIG. 15 [0206])
	Ould-Ahmed-Vall does not explicitly teach dividing the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal, according to the computation mode;
	Lim teaches dividing the input signal into the multiple signals, each having a smaller number of bits than the number of bits in the input signal, according to the computation mode (Vision module 322 performs various operations to facilitate computer vision operations at CPU 208 such as facial detection in image data [0050];  computer vision applications such as image classification, scene detection, facial expression detection, human detection, object detection, scene classification, and text classification [0059]; FIG. 3 is a block diagram illustrating image processing pipelines implemented using an image signal processor, according to one embodiment. [0007]; the computation core 516 is designed to process only 8 bit data. To process input data of 16 bit, each input data is first divided into two 8 bit data portions: One data portion is 8 bit image data including 8 most significant bits (MSB) and the other data portion includes 8 least significant bits (LSB) [0088]. The Examiner notes that computer vision is a field in artificial intelligence)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Ould-Ahmed-Vall to incorporate the teachings of Lim for the benefit of performing computation based on a subset of input data or a subset of kernels in a single cycle (Lim [0069])

	Regarding claim 16, Modified Ould-Ahmed-Vall teaches the operating method according to claim 15, Lim teaches wherein the dividing of the input signal comprises dividing bits of the input signal in half. (Vision module 322 performs various operations to facilitate computer vision operations at CPU 208 such as facial detection in image data [0050]; the computation core 516 is designed to process only 8 bit data. To process input data of 16 bit, each input data is first divided into two 8 bit data portions: One data portion is 8 bit image data including 8 most significant bits (MSB) and the other data portion includes 8 least significant bits (LSB) [0088]. The Examiner notes that computer vision is a field in artificial intelligence)
	The same motivation to combine dependent claim 15 applies here.


26. 	Claims 4, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Lim et al (US20180082400) in view of Ghasemi et al (US11106968 filed 05/24/2018) and further in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.)

	Regarding claim 4, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 2, wherein the computation circuit comprises a first computation circuit comprising: a plurality of first multipliers configured to perform a computation on the multiple signals whose precisions have been changed, (In one embodiment the dynamic precision multiplier 1418 includes a multiplier 1506 and an overflow multiplier 1504. The multiplier 1506 is configurable to perform a multiply or divide operation at half precision for a data type [0198]) and
	Ould-Ahmed-Vall does not explicitly teach a boundary migrator configured to perform the boundary migration and an addition on an output value of the plurality of first multipliers, according to a lattice multiplication rule;
	Ghasemi teaches a boundary migrator configured to perform the boundary migration (the pixel iterator 102 provides an out-of-bounds signal 110 to a multiplexer 106 and the multiplexer 106 selects an input having a constant value (e.g., zero), which can be an out-of-bounds address, and provides the constant value to an application 112. If the traversal is in-bounds, the data value(s) (e.g., indexed pixels) stored in the buffer circuit 104 at the address(es) generated by the pixel iterator 102 is provided to an application 112, such as an array of MAC circuits, col 4, lines 43-51) and
	an addition on an output value of the plurality of first multipliers (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280 and adder 248 that performs the summation of the output of multiplier 228, col 6, lines 36-40)
	The same motivation to combine independent claim 11 applies here.
	Ritter teaches according to a lattice multiplication rule; (For a biological (or traditional) interpretation of negative weights, simply observe that these correspond to large positive weights in the equivalent multiplicative lattice 
	[Image: media_image1.png], pg. 284, right col, last para.)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Modified Ould-Ahmed-Vall to incorporate the teachings of Ritter for the benefit of providing extremely fast neural computation and easy hardware implementation (Ritter, pg. 284, left col, first para.)

	Regarding claim 17, Ould-Ahmed-Vall teaches the operating method according to claim 15, wherein the performing of the one or more selected operations comprises the steps of: performing a computation on the multiple signals whose precisions have been changed, (In one embodiment the dynamic precision multiplier 1418 includes a multiplier 1506 and an overflow multiplier 1504. The multiplier 1506 is configurable to perform a multiply or divide operation at half precision for a data type [0198])
	Ould-Ahmed-Vall does not explicitly teach performing the boundary migration and an addition on the computation result, according to a lattice multiplication rule; 
	Ghasemi teaches performing the boundary migration (the pixel iterator 102 provides an out-of-bounds signal 110 to a multiplexer 106 and the multiplexer 106 selects an input having a constant value (e.g., zero), which can be an out-of-bounds address, and provides the constant value to an application 112. If the traversal is in-bounds, the data value(s) (e.g., indexed pixels) stored in the buffer circuit 104 at the address(es) generated by the pixel iterator 102 is provided to an application 112, such as an array of MAC circuits, col 4, lines 43-51) and 
	an addition on the computation result (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280 and adder 248 that performs the summation of the output of multiplier 228, col 6, lines 36-40)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Ould-Ahmed-Vall to incorporate the teachings of Ghasemi for the benefit of multiply-and-accumulate (MAC) circuits of a CNN which reduces the area overhead and improves scalability (col. 4, lines 28-31) and is scalable for any CNN or application (Ghasemi, col 4, lines 15-16)
	Ritter teaches according to a lattice multiplication rule; (For a biological (or traditional) interpretation of negative weights, simply observe that these correspond to large positive weights in the equivalent multiplicative lattice 
	[Image: media_image1.png], pg. 284, right col, last para.)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Modified Ould-Ahmed-Vall to incorporate the teachings of Ritter for the benefit of providing extremely fast neural computation and easy hardware implementation (Ritter, pg. 284, left col, first para.)

	Regarding claim 19, Modified Ould-Ahmed-Vall teaches the operating method according to claim 17, Ghasemi teaches wherein the performing of the boundary migration and the addition comprises: performing the boundary migration by rearranging the individual lattice values at boundary migration positions matched with the positions of the corresponding multiple signals; and generating a result value by adding the boundary migration values in a second direction. (the pixel iterator 102 provides an out-of-bounds signal 110 to a multiplexer 106 and the multiplexer 106 selects an input having a constant value (e.g., zero), which can be an out-of-bounds address, and provides the constant value to an application 112. If the traversal is in-bounds, the data value(s) (e.g., indexed pixels) stored in the buffer circuit 104 at the address(es) generated by the pixel iterator 102 is provided to an application 112, such as an array of MAC circuits, col 4, lines 43-51; As described above, the traversal order associated with the controller 300 causes the pixel iterator 102 to generate the addresses for all the elements of the IFM 500 for one convolution operation before generating the addresses for all the elements of the IFM 500 for another convolution operation. Accordingly, the arrow shown in FIG. 7A that represents the address generation pattern travels down a column before traveling down the next column. The address generation pattern associated with the controller 300 is element 0, element 1, element 4, element 5, element 1, element 2, element 5, …, FIG. 7B shows an exemplary address generation pattern of the pixel iterator 102 of FIG. 2 in response to the control signals generated by the controller 400 of FIG. 4. The arrow shown in FIG. 7B that represents the address generation pattern travels across a row before traveling across the next row. 
The address generation pattern associated with the controller 400 is element 0, element 1, element 2, element 4, element 5, element 6, element 8, element 9, element 10, col 9, lines 49-67, col 10, lines 1-3. The Examiner notes that pixel iterator is the boundary migrator that arranges the signal according to column (first direction) and row (second) directions) 
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Ould-Ahmed-Vall to incorporate the teachings of Ghasemi for the benefit of multiply-and-accumulate (MAC) circuits of a CNN which reduces the area overhead and improves scalability (col. 4, lines 28-31) and is scalable for any CNN or application (Ghasemi, col 4, lines 15-16)
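	For illustration only, the Ghasemi passages relied on above (the address formed by multiplier 228 and adder 248 at col 6, lines 36-40, and the constant value selected by multiplexer 106 on an out-of-bounds traversal at col 4, lines 43-51) can be sketched as follows. The function name, argument names, and the use of four bounds comparisons are assumptions made for this sketch and are not taken from Ghasemi.

```python
from typing import Sequence


def fetch_element(ifm: Sequence[int], ifm_h: int, ifm_w: int,
                  h: int, w: int, pad_value: int = 0) -> int:
    """Return the IFM element at traversal location (h, w), or pad_value
    when the location falls outside the feature map."""
    # Four comparisons combined by a logical AND (assumed to mirror the
    # comparators feeding the AND gate that drives the out-of-bounds signal).
    in_bounds = (h >= 0) and (h < ifm_h) and (w >= 0) and (w < ifm_w)
    if not in_bounds:
        return pad_value          # multiplexer selects the constant input
    address = h * ifm_w + w       # one multiply followed by one add
    return ifm[address]
```

	Under these assumptions, a row-major 3 x 4 map stored as a flat list returns its stored element for in-bounds locations and the constant (zero) for padded locations.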

27.	Claims 8 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.)

	Regarding claim 8, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 1, Ould-Ahmed-Vall teaches wherein the computation circuit comprises a second computation circuit (compute execution units (e.g., GPGPU core 337A-337B) [0081]) comprising 
	a plurality of second multipliers configured to generate a result value by performing a computation on the input signal (in one embodiment the exponent block 1406 can include an additional multiplier 1436. The multiplier can be a fixed 8-bit multiplier to enable simultaneous dual 8-bit multiply operations using the exponent block 1406 and the significand block 1408 [0201])
	Ould-Ahmed-Vall does not explicitly teach according to a lattice multiplication rule;
	Ritter teaches according to a lattice multiplication rule; (For a biological (or traditional) interpretation of negative weights, simply observe that these correspond to large positive weights in the equivalent multiplicative lattice 
	[Image: media_image1.png], pg. 284, right col, last para.)
	The same motivation to combine dependent claim 4 applies here.

	Regarding claim 13, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 1, Ould-Ahmed-Vall does not explicitly teach wherein the computation circuit performs a multiplication on each of the multiple signals derived from the input signal, using any of lattice multiplication, Booth multiplication, Dadda multiplication and Wallace multiplication.
	Ritter teaches wherein the computation circuit performs a multiplication on each of the multiple signals derived from the input signal, using any of lattice multiplication, Booth multiplication, Dadda multiplication and Wallace multiplication. (Morphological neural networks use lattice operations

	[Images: media_image2.png, media_image3.png], pg. 283, right col, last para. to pg. 284, right col, first para.)
	The same motivation to combine dependent claim 4 applies here.

28. 	Claims 5, 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Lim et al (US20180082400) in view of Ghasemi et al (US11106968 filed 05/24/2018) in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.) and further in view of Tawel (US6199057)

	Regarding claim 5, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 4, Ghasemi teaches wherein the first multipliers perform a bit-wise AND operation on the multiple signals, (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280, col 6, lines 36-40; The outputs of the comparators 254, 256, 258, and 260 are input to AND gate 262, col 6, lines 58-60; The output 279 of the AND gate 262 can be analogous to the out-of-bounds signal 110 shown in FIG. 1., col 9, lines 1-2) and 
	generate individual lattice values for each of the multiple signals by performing a bit-wise addition on the respective multiple signals (and adder 248 that performs the summation of the output of multiplier 228 and the width traversal location 277, col 6, lines 39-40)
	Modified Ould-Ahmed-Vall does not explicitly teach to perform a carry update in a first direction
	Tawel teaches to perform a carry update in a first direction (The bit-serial adder is made up of a single full adder and a flip-flop to store the carry bit …, At each clock cycle, the accumulator sums the bit from the input data stream with both the contents of the last flip-flop on the chain as well as the carry bit, if any, generated from the last addition operation a clock cycle before. This value is subsequently stored into the first element of the chain, col 6, lines 61-67, col 7, lines 1-3)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Modified Ould-Ahmed-Vall to incorporate the teachings of Tawel for the benefit of achieving maximum computational performance for the required task. (Tawel, col 2, lines 5-6)
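	For illustration only, the bit-serial adder quoted from Tawel (a single full adder plus a flip-flop that holds the carry bit from one clock cycle to the next, col 6, lines 61-67) can be sketched as follows. The function names and the least-significant-bit-first ordering are assumptions made for this sketch and are not taken from Tawel.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def bit_serial_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Add two unsigned integers one bit per simulated clock cycle.

    Returns (result truncated to `width` bits, final carry bit).
    """
    carry = 0    # models the flip-flop that stores the carry bit
    result = 0
    for cycle in range(width):          # one loop pass per clock cycle
        a = (x >> cycle) & 1
        b = (y >> cycle) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << cycle
    return result, carry
```

	In this sketch the carry produced in one cycle is consumed in the next, so the carry update proceeds in a single direction, from the least significant bit toward the most significant bit.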

	Regarding claim 6, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 5, Ghasemi teaches wherein the boundary migrator performs the boundary migration by rearranging the individual lattice values at boundary migration positions matched with the positions of the corresponding multiple signals, and generates a result value by adding the boundary migration values in a second direction.
 (the pixel iterator 102 provides an out-of-bounds signal 110 to a multiplexer 106 and the multiplexer 106 selects an input having a constant value (e.g., zero), which can be an out-of-bounds address, and provides the constant value to an application 112. If the traversal is in-bounds, the data value(s) (e.g., indexed pixels) stored in the buffer circuit 104 at the address(es) generated by the pixel iterator 102 is provided to an application 112, such as an array of MAC circuits, col 4, lines 43-51; As described above, the traversal order associated with the controller 300 causes the pixel iterator 102 to generate the addresses for all the elements of the IFM 500 for one convolution operation before generating the addresses for all the elements of the IFM 500 for another convolution operation. Accordingly, the arrow shown in FIG. 7A that represents the address generation pattern travels down a column before traveling down the next column. The address generation pattern associated with the controller 300 is element 0, element 1, element 4, element 5, element 1, element 2, element 5, …, FIG. 7B shows an exemplary address generation pattern of the pixel iterator 102 of FIG. 2 in response to the control signals generated by the controller 400 of FIG. 4. The arrow shown in FIG. 7B that represents the address generation pattern travels across a row before traveling across the next row. The address generation pattern associated with the controller 400 is element 0, element 1, element 2, element 4, element 5, element 6, element 8, element 9, element 10, col 9, lines 49-67, col 10, lines 1-3. The Examiner notes that pixel iterator is the boundary migrator that arranges the signal according to their column and row positions) 
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 18, Modified Ould-Ahmed-Vall teaches the operating method according to claim 17, Ghasemi teaches wherein the performing of the computation on the multiple signals comprises: performing a bit-wise AND operation on the multiple signals; (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280, col 6, lines 36-40; The outputs of the comparators 254, 256, 258, and 260 are input to AND gate 262, col 6, lines 58-60; The output 279 of the AND gate 262 can be analogous to the out-of-bounds signal 110 shown in FIG. 1., col 9, lines 1-2) and
	generating individual lattice values for each of the multiple signals by performing a bit-wise addition on the respective multiple signals (and adder 248 that performs the summation of the output of multiplier 228 and the width traversal location 277, col 6, lines 39-40)
	Modified Ould-Ahmed-Vall does not explicitly teach to perform a carry update in a first direction 
	Tawel teaches to perform a carry update in a first direction (The bit-serial adder is made up of a single full adder and a flip-flop to store the carry bit …, At each clock cycle, the accumulator sums the bit from the input data stream with both the contents of the last flip-flop on the chain as well as the carry bit, if any, generated from the last addition operation a clock cycle before. This value is subsequently stored into the first element of the chain, col 6, lines 61-67, col 7, lines 1-3)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Modified Ould-Ahmed-Vall to incorporate the teachings of Tawel for the benefit of achieving maximum computational performance for the required task. (Tawel, col 2, lines 5-6)
	
29. 	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Lim et al (US20180082400) in view of Ghasemi et al (US11106968 filed 05/24/2018) in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.) in view of Tawel (US6199057) and further in view of Deisher et al (US20180121796)

	Regarding claim 7, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 6, Ould-Ahmed-Vall teaches wherein the first computation circuit (compute execution units (e.g., GPGPU core 336A-336B) [0081]) further comprises: 
	Modified Ould-Ahmed-Vall does not explicitly teach a first flip-flop configured to perform a retiming operation on the result value received from the boundary migrator; a first accumulator configured to accumulate an output value of the first flip-flop; and a second flip-flop configured to perform an retiming operation on an output value received from the first accumulator and output the retimed result value.
	Deisher teaches a first flip-flop configured to perform a retiming operation on the result value received from the boundary migrator; (the 16 bit weight or the scaled 8-bit weight that also now is 16 bits, is passed on to a flip-flop 416 that defines the end of the first stage of the NN data path (indicated by the “1st” above the flip-flop 416 [0086]; The flip-flops control the timing of the data path and provide clock synchronized flow through the data path [0087]) 
	a first accumulator configured to accumulate an output value of the first flip-flop; (The result is stored in the accumulator 510 between iterations [0093]; Multiple sums may be stored by an accumulator 510 to allow alternating execution among groups in an interleaved input array as described above [0089]) and 
	a second flip-flop configured to perform an retiming operation on an output value received from the first accumulator and output the retimed result value. (The input Agi may be passed to a flip-flop 418 that aligns with flip-flop 416 also to define the end of the first stage of the MAC data path. The flip-flops control the timing of the data path and provide clock synchronized flow through the data path [0087]. The Examiner notes that flip-flop 418 is the second flip-flop)
	It would have been obvious for a person having ordinary skill in the art before the effective filing date to have modified the method of Modified Ould-Ahmed-Vall to incorporate the teachings of Deisher for the benefit of a flexible, configurable accelerator that provides highly parallel integer math logic that substantially reduces latency, processor usage, and power consumption (Deisher [0032])

30.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.) and further in view of Deisher et al (US20180121796)

	Regarding claim 9, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 8, Ould-Ahmed-Vall teaches wherein the second computation circuit (compute execution units (e.g., GPGPU core 337A-337B) [0081]) further comprises:
	Modified Ould-Ahmed-Vall does not explicitly teach a second accumulator configured to perform an addition on the result value; and a third flip-flop configured to perform a retiming operation on the result value from the second accumulator and output the retimed result value.
	Deisher teaches a second accumulator configured to perform an addition on the result value; and a third flip-flop configured to perform a retiming operation on the result value from the second accumulator and output the retimed result value. (In the second stage, the input Agi and the weight (whether scaled or not) are multiplied at multiplier 420 so that a weighted input is then passed to flip-flop 422 before entering the accumulator section 426 [0088]; The flip-flops control the timing of the data path and provide clock synchronized flow through the data path [0087]. The Examiner notes that flip-flop 422 is third flip-flop)
	The same motivation to combine dependent claim 7 applies here.
	
31.	Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017) in view of Ghasemi et al (US11106968 filed 05/24/2018) in view of Ritter et al. ("Lattice algebra approach to single-neuron computation." IEEE Transactions on Neural Networks 14.2 (2003): 282-295.) and further in view of Deisher et al (US20180121796)

	Regarding claim 10, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 1, Ould-Ahmed-Vall teaches wherein the input signal comprises a first input signal (In some implementations, N-bit features can be processed at low precision [0221]) and 
	a second signal (and weights for a neural network can be processed at low precision [0221])
	
	Ould-Ahmed-Vall does not explicitly teach wherein the computation circuit comprises: a third multiplier configured to perform a lattice multiplication on the first and second input signals and output a first result value; an adder configured to perform the boundary migration and an addition on the first result value from the third multiplier to generate a second result value; and a fourth flip-flop configured to perform an retiming operation on the second result value and output the retimed second result value.
	Ghasemi teaches wherein the computation circuit comprises: a third multiplier (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280, col 6, lines 36-38) configured 
	an adder configured to perform the boundary migration and an addition on the first result value (At least one specific implementation includes a circuit, hereinafter referred to a pixel iterator, that uses limited hardware resources and includes, for example, one multiplier, six adders, col 3, lines 60-64; Vertical traversal of the two-dimensional IFM includes summation of a value stored in the height register 214 and a value stored in the height_offset register 222 via adder 224. The result of the summation is the height traversal location 275 of an element of the IFM and is stored in height padded register 226, col 5, lines 37-42)
	from the third multiplier to generate a second result value; (The address generation circuit 206 includes multiplier 228 that performs the multiplication of the height traversal location 275 with the width, ifm_w, of the IFM 280 and adder 248 that performs the summation of the output of multiplier 228 and the width traversal location 277. The result of the adder 248 is stored in register 250, col 6, lines 36-39) and
	Ritter teaches to perform a lattice multiplication on the first and second input signals and output a first result value; (Morphological neural networks use lattice operations [images media_image2.png and media_image3.png reproduced from Ritter], pg. 283, right col., last para. to pg. 284, right col., first para.)
	Deisher teaches a fourth flip-flop configured to perform an retiming operation on the second result value and output the retimed second result value. (flip-flops 430 and 432 are shown (there are actually 6 such flip flops for 48 logic blocks) to define the end of the 3rd stage and synchronize the logic flow. The accumulation continues for another adder operation, and then an adder 436 is provided before the sums up to this point are provided to a final adder 429 [0089]; The flip-flops control the timing of the data path and provide clock synchronized flow through the data path [0087]. The Examiner notes that flip-flop 430 is the fourth flip-flop)
	The same motivation to combine set forth for dependent claim 7 applies here.
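
	For illustration only, the multiply-then-add data path cited from Ghasemi above (multiplier 228 followed by adder 248, and adder 224 for the height traversal location) can be sketched in software as follows. This is a minimal sketch and not Ghasemi's circuit: the function names ifm_address and padded_height are hypothetical, and only the arithmetic described in the cited passages (col 5, lines 37-42; col 6, lines 36-39) is modeled.

```python
def padded_height(height: int, height_offset: int) -> int:
    """Height traversal location: sum of the height and height_offset values.

    Models the role of adder 224 in the cited passage (hypothetical helper).
    """
    return height + height_offset


def ifm_address(height_loc: int, width_loc: int, ifm_w: int) -> int:
    """Element offset computed with one multiply followed by one add.

    height_loc * ifm_w models the role of multiplier 228; adding width_loc
    models the role of adder 248 (hypothetical helper).
    """
    product = height_loc * ifm_w
    return product + width_loc


if __name__ == "__main__":
    row = padded_height(height=3, height_offset=-1)
    print(ifm_address(height_loc=row, width_loc=3, ifm_w=10))
```

	The sketch only shows that the cited adder operates on the output of the cited multiplier; it does not model registers, timing, or the lattice operations attributed to Ritter.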
	
	Regarding claim 12, Modified Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 10, Ould-Ahmed-Vall teaches wherein the computation circuit (compute execution units (e.g., GPGPU core 336A-336B) [0081]) further comprises:
	Modified Ould-Ahmed-Vall does not explicitly teach a fifth flip-flop configured to transfer the first input signal to a first another computation circuit adjacent thereto; a sixth flip-flop configured to transfer the second input signal to a second another computation circuit adjacent thereto; a multiplexer configured to output any one of the second result value from the fourth flip-flop and a result value from the first another computation circuit; and a seventh flip-flop configured to output the result value from the multiplexer.
	Deisher teaches a fifth flip-flop configured to transfer the first input signal to a first another computation circuit adjacent thereto; a sixth flip-flop configured to transfer the second input signal to a second another computation circuit adjacent thereto; (flip-flops 430 and 432 are shown (there are actually 6 such flip flops for 48 logic blocks) to define the end of the 3rd stage and synchronize the logic flow. The accumulation continues for another adder operation, and then an adder 436 is provided before the sums up to this point are provided to a final adder 429. The weighted input sum output (or sum output) from adder 429 is provided to a flip-flop 434 as the end of the 4th stage [0089]. The Examiner notes that fifth flip-flop is flip-flop 432 and sixth flip-flop is flip-flop 434)
	a multiplexer configured to output any one of the second result value (the NN sum (referred to as a Tmp_sum) may bypass the activation function 504 and may be provided through an multiplexer 516) 
	from the fourth flip-flop (flip-flops 430 [0087]) and
	a result value from the first another computation circuit; (the final weighted input (sum output) of the MAC circuit 319 may be described as being the result of an affine transform [0077]) and 
	a seventh flip-flop configured to output the result value from the multiplexer. (The resulting 16-bit Zj output value from the activation function unit may be provided through mux 516, mux 518, and flip-flop 520 [0097]. The Examiner notes that the seventh flip-flop is flip-flop 520)
	The same motivation to combine set forth for dependent claim 7 applies here.

32.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al. (US20180315159 filed 10/20/2017) in view of Ghasemi et al. (US11106968 filed 05/24/2018).

	Regarding claim 11, Ould-Ahmed-Vall teaches the accelerating apparatus according to claim 10, Ould-Ahmed-Vall teaches wherein the adder performs a counting function controls computation logic for the first and second input signals (For a fused operation (e.g., multiply-add, multiply-subtract) the product of the operation can be added to a third input via an adder and/or stored in an accumulator register [0196]; The neurons compute a dot product between the weights of the neurons and the region in the local input to which the neurons are connected [0161])
	to be repeatedly performed a set number of times (The training process occurs repeatedly as the weights of the network are adjusted to refine the output generated by the neural network [0169])
	Ould-Ahmed-Vall does not explicitly teach that the adder performs a counting function.
	Ghasemi teaches that the adder performs a counting function (incrementing an OFM height counter value by a first adder circuit, claim 15; The controller 300 includes counters for the parameters of the expansion of an input feature map IFM, col 7, lines 6-7).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ould-Ahmed-Vall to incorporate the teachings of Ghasemi for the benefit of multiply-and-accumulate (MAC) circuits of a CNN with reduced area overhead and improved scalability (Ghasemi, col. 4, lines 28-31) that are scalable for any CNN or application (Ghasemi, col. 4, lines 15-16).

33.	Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Jampani et al (US20190147302 filed 05/22/2018) in view of Ould-Ahmed-Vall et al (US20180315159 filed 10/20/2017)

	Regarding claim 22, Jampani teaches an apparatus for a neural network (FIG. 6 is a block diagram of a computing system 600 within which the GPU or method introduced herein may be embodied or carried out [0012]; An artificial neural network architecture for machine vision is disclosed that directly operates on a point cloud represented as a sparse set of samples in a high-dimensional lattice [0005]) comprising:
	an input processor suitable for receiving an input signal (The CPU 602 receives user input from the input devices 608, executes programming instructions [0074]; The network 500 utilizes convolutions on sparse lattices while receiving unordered point clouds as input. The use of BCLs in the network 500 enables easy specification of lattice spaces via input lattice features and also the lattice scale via a scaling matrix [0062]; Each of the BCLs operate on a 3D lattice (s=3) constructed using 3D positional features at input points, L_in = L_out ∈ R^(n×3) [0059]) 
	corresponding to an n×n lattice, (The lattice space features may be N×L, where L are lattice dimensions, such as <x,y,z>.[0039]) and 
	processing the input signal to generate multiple signals respectively corresponding to (n/2) × (n/2) sub lattices of the n×n lattice; (Subsequent lattice scales are determined by dividing the previous lattice scale by a factor of 2 (λ_t = λ_{t−1}/2) until the lattice scale is 2 for Tth BCL. In other words, the 3D network 504 with T BCLs use the following lattice feature scales: (λ_0, λ_0/2, . . . , λ_0/2^(T−1)) [0059]) and
	a computation circuit suitable for performing a lattice multiplication on each of the multiple signals, (the computing device 700 will typically include operating system logic [0094]; The input points may be n×3 in shape. The input transform 106 utilizes a T-Net (transform network) and a matrix multiplier to transform the input, … The feature transform 110 may have a T-Net and a matrix multiplier [0024]) and
	 Jampani does not explicitly teach performing migration on multiplication results thereof to generate a multiplication result corresponding to the input signal.
	Ould-Ahmed-Vall teaches performing migration on multiplication results thereof to generate a multiplication result corresponding to the input signal. (For example, the multiplier 1506 can perform … a 16-bit multiply operation for a 16-bit integer operation. The multiplier 1506 can also perform an 8-bit multiply operation for an INT8 integer value [0198]; For a fused operation (e.g., multiply-add, multiply-subtract) the product of the operation can be added to a third input via an adder and/or stored in an accumulator register [0196])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Jampani to incorporate the method of Ould-Ahmed-Vall for the benefit of using dedicated circuitry/logic for efficiently processing the commands/instructions. (Ould-Ahmed-Vall [0042])
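
	For illustration only, the lattice scale schedule cited from Jampani [0059] above (each scale obtained by dividing the previous scale by a factor of 2, for T BCLs) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name lattice_scales and its parameter names are hypothetical and do not appear in Jampani, and the example initial scale of 64 is chosen only so that the last of six scales equals 2.

```python
from typing import List


def lattice_scales(initial_scale: float, num_layers: int) -> List[float]:
    """Return the per-layer lattice scales for num_layers layers.

    Each scale is the previous scale divided by 2, so the t-th entry
    (0-based) equals initial_scale / 2**t. Hypothetical helper.
    """
    scales: List[float] = []
    scale = float(initial_scale)
    for _ in range(num_layers):
        scales.append(scale)
        scale = scale / 2  # halve the scale for the next layer
    return scales


if __name__ == "__main__":
    print(lattice_scales(64, 6))
```

	The sketch models only the scale schedule described in the cited paragraph; it does not model the generation of the (n/2) × (n/2) sub lattices recited in claim 22.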

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121          


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121