DETAILED ACTION
This action is in response to the claims filed March 3, 2018. Claims 1-18 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following minor informalities:
¶124: the phrase “out 1 through out n,” which occurs twice, does not have any particular meaning in the art. The Examiner interprets this to mean “of quantity at least equal to 1.”
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ji et al., “Reducing Weight Precision of Convolutional Neural Networks Towards Large-Scale on-Chip Image Recognition,” hereinafter Ji, in view of Lin et al., “PredictiveNet: An Energy-Efficient Convolutional Neural Network via Zero Prediction,” hereinafter Lin.

Regarding Claim 1
Ji teaches an electronic apparatus comprising: a first memory (Introduction, “fast and compact chip”; Ji teaches a chip that implements the operations described, and the chip necessarily includes a memory) configured to store a first artificial intelligence (AI) model comprising a plurality of first elements, each comprising a plurality of bits (Section 3.1, “On the server cloud, we first train Convolutional Neural Network [artificial intelligence model] with full bit resolution”; the Examiner notes that a CNN is a species of AI model. Further, a neural network consists of a plurality of elements, such as nodes and weights, represented by bits); and a processor comprising a second memory that is configured to store a second AI model comprising a plurality of second elements (Section 3.1, “Then we perform a new quantization method…After that, the network configuration and quantized filters [second AI model] are loaded to the client hardware device [processor]”; the Examiner notes that the quantized model is considered a new AI model), and the processor is configured to acquire output data from input data based on the second AI model (Section 3.1, “To this end, the convolutional neural network with reduced weight precision… receiving quantized image input… feeding final low-bit feature representation to classification model for recognition output.”), wherein the first AI model is trained through an AI algorithm (Section 3.1, “On the server cloud, we first train Convolutional Neural Network with full bit resolution… using back propagation”).
Ji does not appear to explicitly teach wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits of a respective element from among the plurality of first elements.
However, Lin, when addressing issues related to efficient computation of neural networks, teaches wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits of a respective element from among the plurality of first elements (Introduction ¶002, “PredictiveNet first evaluates the most significant bit (MSB) part of the convolution to predict whether the nonlinear layer output corresponding to the current convolution is zero, and then decides if the remaining least significant (LSB) part’s computation can be skipped or not”; the Examiner notes that PredictiveNet implemented by the “second AI model” would eliminate LSBs and keep MSBs, which corresponds to “at least one higher bit”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the sparse computation of a neural network by optionally ignoring least significant bits, as taught by Lin, in the disclosed invention of Ji.
One of ordinary skill in the art would have been motivated to make this modification in order to implement a neural network with reduced complexity and thereby reduce the computational cost without degrading accuracy (Lin, Conclusion).
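For illustration only, the relationship between a “first element” and a “second element” that retains “at least one higher bit” can be sketched as below. This is a minimal sketch under stated assumptions: the 8-bit unsigned weights, the 4-bit retained portion, and the name split_msb_lsb are hypothetical and are not drawn from Ji, Lin, or the application under examination.

```python
from typing import Tuple


def split_msb_lsb(weight: int, total_bits: int = 8, msb_bits: int = 4) -> Tuple[int, int]:
    """Split an unsigned weight into its most and least significant parts.

    Hypothetical helper: bit widths are assumptions made for illustration.
    """
    lsb_bits = total_bits - msb_bits
    msb = weight >> lsb_bits               # keep only the higher bits
    lsb = weight & ((1 << lsb_bits) - 1)   # the lower bits that are dropped
    return msb, lsb


# "First elements": full-precision weights (assumed 8-bit, unsigned).
first_elements = [0b10110110, 0b00011111, 0b11100001]

# "Second elements": only the higher 4 bits of each respective first element.
second_elements = [split_msb_lsb(w)[0] for w in first_elements]
```

Each second element here is a strict prefix of the bits of its respective first element, so the original weight is recoverable only when the dropped lower bits are also available.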

Regarding Claim 10
Ji teaches a method of controlling an electronic apparatus, the method comprising: storing in a first memory (Introduction, “fast and compact chip”; Ji teaches a chip that implements the operations described, and the chip necessarily includes a memory) a first artificial intelligence (AI) model comprising a plurality of first elements, wherein each of the plurality of elements comprises a plurality of bits (Section 3.1, “On the server cloud, we first train Convolutional Neural Network [artificial intelligence model] with full bit resolution”; the Examiner notes that a CNN is a species of AI model. Further, a neural network consists of a plurality of elements, such as nodes and weights, represented by bits); storing, in a second memory of a processor, a second AI model comprising a plurality of second elements (Section 3.1, “Then we perform a new quantization method…After that, the network configuration and quantized filters [second AI model] are loaded to the client hardware device”; the Examiner notes that the quantized model is considered a new AI model); acquiring, by the processor, output data from input data based on the second AI model (Section 3.1, “To this end, the convolutional neural network with reduced weight precision… receiving quantized image input… feeding final low-bit feature representation to classification model for recognition output.”), wherein the first AI model is trained through an AI algorithm (Section 3.1, “On the server cloud, we first train Convolutional Neural Network with full bit resolution… using back propagation”).
Ji does not appear to explicitly teach wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits comprised in a respective element from among the plurality of first elements.
However, Lin, when addressing issues related to efficient computation of neural networks, teaches wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits comprised in a respective element from among the plurality of first elements (Introduction ¶002, “PredictiveNet first evaluates the most significant bit (MSB) part of the convolution to predict whether the nonlinear layer output corresponding to the current convolution is zero, and then decides if the remaining least significant (LSB) part’s computation can be skipped or not”; the Examiner notes that PredictiveNet implemented by the “second AI model” would eliminate LSBs and keep MSBs, which corresponds to “at least one higher bit”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the sparse computation of a neural network by optionally ignoring least significant bits, as taught by Lin, in the disclosed invention of Ji.
One of ordinary skill in the art would have been motivated to make this modification in order to implement a neural network with reduced complexity and thereby reduce the computational cost without degrading accuracy (Lin, Conclusion).

Regarding Claim 18
Ji teaches a non-transitory computer readable medium storing computer-readable instructions for performing an operation method of an electronic apparatus, the operation method comprising: storing, in a first memory (Introduction, “fast and compact chip”; Ji teaches a chip that implements the operations described, and the chip necessarily includes a memory), a first artificial intelligence (AI) model comprising a plurality of first elements, wherein each of the plurality of first elements comprises a plurality of bits (Section 3.1, “On the server cloud, we first train Convolutional Neural Network [artificial intelligence model] with full bit resolution”; the Examiner notes that a CNN is a species of AI model. Further, a neural network consists of a plurality of elements, such as nodes and weights, represented by bits); storing, in a second memory of a processor of the electronic apparatus, a second AI model comprising a plurality of second elements (Section 3.1, “Then we perform a new quantization method…After that, the network configuration and quantized filters [second AI model] are loaded to the client hardware device”; the Examiner notes that the quantized model is considered a new AI model); acquiring output data from input data based on the second AI model (Section 3.1, “To this end, the convolutional neural network with reduced weight precision… receiving quantized image input… feeding final low-bit feature representation to classification model for recognition output.”), wherein the first AI model is trained through an AI algorithm (Section 3.1, “On the server cloud, we first train Convolutional Neural Network with full bit resolution… using back propagation”).
Ji does not appear to explicitly teach wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits of a respective element from among the plurality of first elements.
However, Lin, when addressing issues related to efficient computation of neural networks, teaches wherein each of the plurality of second elements comprises at least one higher bit of the plurality of bits of a respective element from among the plurality of first elements (Introduction ¶002, “PredictiveNet first evaluates the most significant bit (MSB) part of the convolution to predict whether the nonlinear layer output corresponding to the current convolution is zero, and then decides if the remaining least significant (LSB) part’s computation can be skipped or not”; the Examiner notes that PredictiveNet implemented by the “second AI model” would eliminate LSBs and keep MSBs, which corresponds to “at least one higher bit”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the sparse computation of a neural network by optionally ignoring least significant bits, as taught by Lin, in the disclosed invention of Ji.
One of ordinary skill in the art would have been motivated to make this modification in order to implement a neural network with reduced complexity and thereby reduce the computational cost without degrading accuracy (Lin, Conclusion).

Claims 2-5, 7, 11-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ji/Lin in view of Malaya et al., US Publication No. US20190171420A1, hereinafter Malaya.
 
Regarding Claim 2
Ji/Lin teaches the apparatus of claim 1.
Ji further teaches wherein the processor is further configured to acquire a plurality of first output elements by applying the input data to the second AI model (Section 3.1, “To this end, the convolutional neural network with reduced weight precision… receiving quantized image input… feeding final low-bit feature representation to classification model for recognition output”).
Ji/Lin does not appear to explicitly teach that the processor is further configured to acquire one of the plurality of first output elements as the output data based on sizes of the plurality of first output elements.
However, Malaya, when addressing issues related to implementing AI models on hardware such as FPGAs, teaches that the processor is further configured to acquire one of the plurality of first output elements as the output data (¶0017, “the computational units 210 and 220 generate multiple outputs [plurality of first output elements] 203(i) over multiple iterations i”) based on sizes of the plurality of first output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate adjustment logic that controls the output based on the output precision or size, as taught by Malaya, into the disclosed invention of Ji/Lin.
One of ordinary skill in the art would have been motivated to make this modification in order to implement the described adjustment logic to change “the number representations 202(i) until a target power consumption is reached” or change “precision of the calculations performed by the computational units 210 and 220 to optimize for speed of execution” (Malaya ¶0020).

Regarding Claim 3
Ji/Lin/Malaya teaches the apparatus of claim 2.
Malaya further teaches, based on a difference (¶0021, “the adjustment logic 230 may compare subsequent output values”; the Examiner interprets difference to be a comparison between values by means of subtraction (e.g., A - B > 0 or A - B < 0)) between said one of the first output elements having a first largest size and another first output element having a second largest size (¶0021, “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the Examiner notes that a comparison between only two subsequent values will naturally include a first and second “largest size”) being larger than or equal to a preset size, the processor is further configured to acquire said one of the first output elements having the first largest size as the output data (¶0020, “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached”; the Examiner notes that increasing the precision to reach a power consumption corresponds to “being larger than or equal to a preset size”).
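For illustration only, the selection rule recited in claim 3 (accept the largest first output element when its difference from the second largest is at least a preset size) can be sketched as below. This is a minimal sketch of the claim language as the Examiner reads it, not of Malaya's adjustment logic; the function name select_output and the example values are hypothetical.

```python
from typing import Optional, Sequence


def select_output(outputs: Sequence[float], preset_size: float) -> Optional[int]:
    """Return the index of the largest output if it leads the runner-up
    by at least preset_size; otherwise return None (no output acquired yet).

    Hypothetical helper: name and return convention are assumptions.
    """
    if len(outputs) < 2:
        raise ValueError("at least two output elements are required")
    # Indices ordered from largest to smallest output value.
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    largest, second = ranked[0], ranked[1]
    if outputs[largest] - outputs[second] >= preset_size:
        return largest
    return None


confident = select_output([1.0, 7.0, 2.0], preset_size=3.0)   # margin 5.0
ambiguous = select_output([4.0, 4.5, 1.5], preset_size=3.0)   # margin 0.5
```

In the second call the margin falls below the preset size, which is the condition under which claim 4 recites continuing with a third AI model.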

Regarding Claim 4
Ji/Lin/Malaya teaches the apparatus of claim 2.
Malaya further teaches wherein the second memory is further configured to store a third AI model comprising a plurality of third elements (¶0017, “The components illustrated in FIG. 2 can represent circuitry contained within one FPGA device, or may be distributed over multiple FPGA devices”; the Examiner notes that a third model could be implemented by an additional FPGA that shares the memory of another FPGA, or two models can be implemented by a single FPGA with a multitude of computational units), wherein, based on the difference between the first output elements being smaller than the preset size (¶0019, “comparing [by taking the difference] the output 203(1) to a reference value 204, and based on this determination, outputs the next number representations”), the processor is further configured to store the plurality of first output elements in the second memory (¶0023, “In general, the LUT 130 can be used to store any information that can be used for determining which number representations”; the Examiner notes that the LUT corresponds to a memory), acquire a plurality of first middle elements by applying the input data to the third AI model (¶0017, “the set of computational units 210 and 220 [AI model] generates an output 203(i) based on input 201(i)”; the Examiner notes that the output 203(i) includes a plurality of outputs), acquire a plurality of second output elements based on the first output elements and the first middle elements (¶0017, “the set of computational units 210 and 220 [AI model] generates an output 203(i) based on input 201(i)”), and acquire one of the plurality of second output elements as the output data (¶0017, “the computational units 210 and 220 generate multiple outputs [plurality of first output elements] 203(i) over multiple iterations i”) based on sizes of the plurality of second output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”), and wherein each of the plurality of third elements comprises at least one higher bit of the plurality of bits in each of the plurality of first elements except for the at least one bit comprised in each of the plurality of second elements (¶0013, “the reconfigurable nature of field programmable gate array (FPGA) devices in a computing system allows the system to support a wide range of numerical precisions and to dynamically vary the precision for key computations at run time.”; the Examiner notes that, in disclosing a computing system that supports a wide range of numerical precisions, the reference teaches a configuration that includes “at least one higher bit of the plurality of bits in each of the plurality of first elements”).
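For illustration only, the arrangement recited in claim 4 (third elements taken from the next-higher bits of the first elements, and second output elements formed from the first output elements and the first middle elements) can be sketched as below. This is a minimal sketch under stated assumptions: 8-bit unsigned weights, a 4-bit second element, a 2-bit third element, and a single dot-product output shown for brevity. The helper names bit_slice and partial_dot are hypothetical and do not come from Malaya or the application.

```python
from typing import List


def bit_slice(value: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive, bit 0 = least significant) of value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def partial_dot(inputs: List[int], slices: List[int], shift: int) -> int:
    """Dot product of inputs with weight slices, scaled to the slices' bit position."""
    return sum(x * s for x, s in zip(inputs, slices)) << shift


first_elements = [0b10110110, 0b00011111, 0b11100001]  # assumed 8-bit weights
inputs = [3, 1, 2]

# Second elements: bits 7..4. Third elements: the next-higher bits 3..2,
# i.e. higher bits of the first elements excluding those already in the second elements.
second_elements = [bit_slice(w, 7, 4) for w in first_elements]
third_elements = [bit_slice(w, 3, 2) for w in first_elements]

first_output = partial_dot(inputs, second_elements, shift=4)   # coarse result
first_middle = partial_dot(inputs, third_elements, shift=2)    # refinement term
second_output = first_output + first_middle                    # refined result
```

Under these assumptions the refined result reuses the stored coarse result rather than recomputing it, and it approaches the full-precision dot product as further bit slices are added.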

Regarding Claim 5
Ji/Lin/Malaya teaches the apparatus of claim 4.
Malaya further teaches, based on a difference (¶0021, “the adjustment logic 230 may compare subsequent output values”) between said one of the plurality of second output elements having a first largest size and another second output element from the plurality of second output elements having a second largest size (¶0021, “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the Examiner notes that a comparison between only two subsequent values will naturally include a first and second “largest size”) being larger than or equal to the preset size, the processor is further configured to acquire said one second output element having the first largest size as the output data (¶0020, “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached”; the Examiner notes that increasing the precision to reach a power consumption corresponds to “being larger than or equal to a preset size”).

Regarding Claim 7
Ji/Lin/Malaya teaches the apparatus of claim 5.
Malaya further teaches wherein the processor is further configured to acquire one of the plurality of second output elements as the output data based on at least one selected from among a size, a gradient, a moving average, and softmax of each of the plurality of second output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”).

Regarding Claim 11
Ji/Lin teaches the method of claim 10.
Ji further teaches that the acquiring comprises: acquiring a plurality of first output elements by applying the input data to the second AI model (Section 3.1, “To this end, the convolutional neural network with reduced weight precision… receiving quantized image input… feeding final low-bit feature representation to classification model for recognition output”).
Ji/Lin does not appear to explicitly teach acquiring one of the plurality of first output elements as the output data based on sizes of the plurality of first output elements.
However, Malaya, when addressing issues related to implementing AI models on hardware such as FPGAs, teaches acquiring one of the plurality of first output elements as the output data (¶0017, “the computational units 210 and 220 generate multiple outputs [plurality of first output elements] 203(i) over multiple iterations i”) based on sizes of the plurality of first output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate adjustment logic that controls the output based on the output precision or size, as taught by Malaya, into the disclosed invention of Ji/Lin.
One of ordinary skill in the art would have been motivated to make this modification in order to implement the described adjustment logic to change “the number representations 202(i) until a target power consumption is reached” or change “precision of the calculations performed by the computational units 210 and 220 to optimize for speed of execution” (Malaya ¶0020).

Regarding Claim 12
Ji/Lin/Malaya teaches the method of claim 11.
Malaya further teaches that the acquiring the one of the plurality of first output elements as the output data comprises: based on a difference (¶0021, “the adjustment logic 230 may compare subsequent output values”; the Examiner interprets difference to be a comparison between values by means of subtraction (e.g., A - B > 0 or A - B < 0)) between said one first output element having a first largest size and another first output element from among the plurality of output elements having a second largest size (¶0021, “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the Examiner notes that a comparison between only two subsequent values will naturally include a first and second “largest size”) being larger than or equal to a preset size, acquiring said one first output element having the first largest size as the output data (¶0020, “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached”; the Examiner notes that increasing the precision to reach a power consumption corresponds to “being larger than or equal to a preset size”).

Regarding Claim 13
Ji/Lin/Malaya teaches the method of claim 12.
Malaya further teaches storing, in the second memory, a third AI model comprising a plurality of third elements (¶0017, “The components illustrated in FIG. 2 can represent circuitry contained within one FPGA device, or may be distributed over multiple FPGA devices”; the Examiner notes that a third model could be implemented by an additional FPGA that shares the memory of another FPGA, or two models can be implemented by a single FPGA with a multitude of computational units); based on the difference between the first output elements being smaller than the preset size (¶0019, “comparing [by taking the difference] the output 203(1) to a reference value 204, and based on this determination, outputs the next number representations”), storing, in the second memory, the plurality of first output elements (¶0023, “In general, the LUT 130 can be used to store any information that can be used for determining which number representations”; the Examiner notes that the LUT corresponds to a memory); acquiring, by the processor, a plurality of first middle elements by applying the input data to the third AI model (¶0017, “the set of computational units 210 and 220 [AI model] generates an output 203(i) based on input 201(i)”; the Examiner notes that the output 203(i) includes a plurality of outputs); acquiring, by the processor, a plurality of second output elements based on the first output elements and the first middle elements (¶0017, “the set of computational units 210 and 220 [AI model] generates an output 203(i) based on input 201(i)”); acquiring one of the plurality of second output elements as the output data (¶0017, “the computational units 210 and 220 generate multiple outputs [plurality of first output elements] 203(i) over multiple iterations i”) based on sizes of the plurality of second output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”), wherein each of the plurality of third elements comprises at least one higher bit of the plurality of bits in the respective element from among the plurality of first elements, except for at least one bit of a respective second element from among the plurality of second elements (¶0013, “the reconfigurable nature of field programmable gate array (FPGA) devices in a computing system allows the system to support a wide range of numerical precisions and to dynamically vary the precision for key computations at run time.”; the Examiner notes that, in disclosing a computing system that supports a wide range of numerical precisions, the reference teaches a configuration that includes “at least one higher bit of the plurality of bits in each of the plurality of first elements”).

Regarding Claim 14
Ji/Lin/Malaya teaches the method of claim 13.
Malaya further teaches wherein the acquiring said one of the plurality of second output elements as the output data comprises: based on a difference (¶0021, “the adjustment logic 230 may compare subsequent output values”) between said one of the plurality of second output elements having a first largest size and other second output element from among the plurality of second output elements having a second largest size (¶0021, “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the Examiner notes that a comparison between only two subsequent values will naturally include a first and second “largest size”) being larger than or equal to the preset size, acquiring said one of the plurality of second output elements having the first largest size as the output data (¶0020, “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached”; the Examiner notes that increasing the precision to reach a power consumption corresponds to “being larger than or equal to a preset size”).

Regarding Claim 16
Ji/Lin/Malaya teaches the method of claim 13.
Malaya further teaches wherein the acquiring the one of the plurality of second output elements as the output data comprises acquiring the one of the plurality of second output elements as the output data based on at least one selected from among a size, a gradient, a moving average, and softmax of each of the plurality of second output elements (¶0020, “the adjustment logic 230 [output controller] also determines the next number representations 202(i) based on the power consumption 205(i) for calculating the output 203(i), since increased precision of the calculation corresponds to increased power consumption”; the Examiner notes that selecting output based on precision corresponds to “sizes”).

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ji/Lin/Malaya in view of Tann et al., “Flexible Deep Neural Network Processing,” hereinafter Tann.

Regarding Claim 6
Ji/Lin/Malaya teaches the apparatus of claim 5.
Malaya further teaches, based on a difference (¶0021, “the adjustment logic 230 may compare subsequent output values”; the Examiner interprets difference to be a comparison between values by means of subtraction (e.g., A - B > 0 or A - B < 0)) between the second output elements being smaller than the preset size, the second memory is configured to store the plurality of second output elements (¶0020, “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached. In one embodiment, this adjustment can be constrained by a minimum acceptable accuracy, so that the power consumption can be decreased as long as the minimum level of accuracy can be maintained”; the Examiner notes that if the desired target power and accuracy are reached, the adjustment logic may not change the number representation), acquire a plurality of second middle elements by performing an arithmetic operation on the acquired other first elements and the input data (¶0032, “each neuron receives inputs [input data and first elements] via connections [through arithmetic operations] at its left side and transmits output signals [a plurality of second middle elements] from its right side.”), acquire a plurality of third output elements based on the second output elements and the second middle elements (¶0032, “each neuron receives inputs [second output elements and the second middle elements] via connections at its left side and transmits output signals [a plurality of third output elements] from its right side.”), and acquire the third output element having a largest size among the plurality of third output elements as the output data (¶0021, “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the Examiner notes that a comparison enables one to select the output in a given set of output values with the “largest size” or largest “number representation”).
Ji/Lin/Malaya does not appear to explicitly teach that the processor is further configured to acquire others of the plurality of first elements, excluding the plurality of second elements and the plurality of third elements, from the first memory.
However, Tann, when addressing issues related to efficient neural network inference, teaches that the processor is further configured to acquire others of the plurality of first elements, excluding the plurality of second elements and the plurality of third elements, from the first memory (Section 3.1 ¶001, “After independently training multiple DNNs [first, second, and third elements] of the same architecture”; the Examiner notes that the original network on the first memory is a collection of DNNs; because they are independently trained, they would each have unique “elements”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique of splitting a network structure into multiple DNNs for flexible inference, as taught by Tann, into the disclosed invention of Ji/Lin/Malaya.
One of ordinary skill in the art would have been motivated to make this modification in order to realize “a flexible execution methodology that lessens DNN ensemble computation and latency overheads while still maintaining much of the inference accuracy” (Tann, Section 4.2, Results and Discussions).

Regarding Claim 15
Ji/Lin/Malaya teaches the machine of claim 14.
	Malaya further teaches, based on a difference (¶0021 “the adjustment logic 230 may compare subsequent output values”; the examiner interprets a difference to be a comparison between values by means of subtraction, e.g., A-B > 0 or A-B < 0) between the second output elements being smaller than the preset size, storing, in the second memory, the plurality of second output elements (¶0020 “Accordingly, the adjustment logic 230 can increase or decrease the precision of the calculation in computational units 210 and 220 by changing the number representations 202(i) until a target power consumption is reached. In one embodiment, this adjustment can be constrained by a minimum acceptable accuracy, so that the power consumption can be decreased as long as the minimum level of accuracy can be maintained”; the examiner notes that if the desired target power and accuracy are reached, the adjustment logic may not change the number representation); acquiring a plurality of second middle elements by performing an arithmetic operation on the acquired other first elements and the input data (¶0032 “each neuron receives inputs [input data and first elements] via connections [through arithmetic operations] at its left side and transmits output signals [a plurality of second middle elements] from its right side.”); acquiring a plurality of third output elements based on the second output elements and the second middle elements (¶0032 “each neuron receives inputs [second output elements and the second middle elements] via connections at its left side and transmits output signals [a plurality of third output elements] from its right side.”); and acquiring the third output element having a first largest size among the plurality of third output elements as the output data (¶0021 “the adjustment logic 230 may compare subsequent output values (e.g., 203(1) and 203(2)) and based on the comparison, adjust the next number representations 202(2)”; the examiner notes that a comparison enables one to select the output in a given set of output values with the “largest size” or largest “number representation”).
	Ji/Lin/Malaya does not appear to explicitly teach acquiring, from the first memory, others of the plurality of first elements, excluding the plurality of second elements and the plurality of third elements.
However, Tann, when addressing issues related to efficient neural network inference, teaches acquiring, from the first memory, others of the plurality of first elements, excluding the plurality of second elements and the plurality of third elements (Section 3.1 ¶001 “After independently training multiple DNNs [first, second, and third elements] of the same architecture”; the examiner notes the original network on the first memory is a collection of DNNs, and because they are independently trained, each would have unique “elements”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the technique of splitting a network structure into multiple DNNs for flexible inference, as taught by Tann, into the disclosed invention of Ji/Lin/Malaya.
One of ordinary skill in the art would have been motivated to make this modification in order to realize “a flexible execution methodology that lessens DNN ensemble computation and latency overheads while still maintaining much of the inference accuracy” (Section 4.2 Results and Discussions, Tann).

Claims 8-9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ji/Lin in view of Bong et al. “A Low-Power Convolutional Neural Network Face Recognition Processor and a CIS Integrated with Always-on Face Detector,” hereinafter Bong.

Regarding Claim 8
	Ji/Lin teaches the machine of claim 1.
Ji/Lin does not appear to explicitly teach that the processor comprises at least one multiplier configured to apply the input data to the second AI model.
However, Bong, when addressing issues related to low-power neural network processing, teaches the processor comprises at least one multiplier (Section B ¶3 “With weights represented by the floating point, which is composed of 1-bit sign, 4-bit exponent, and 3-bit mantissa, the MAC unit is implemented by using a shifter and adders instead of a multiplier”) configured to apply the input data to the second AI model (Figs. 2 & 3; the MAC unit is implemented in a CNN which includes an input image).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the efficient neural network processor that uses a multiplier, as taught by Bong, into the disclosed invention of Ji/Lin.
One of ordinary skill in the art would have been motivated to make this modification in order to implement an efficient neural network processor that improves energy efficiency and reduces the workload of neural network layers to lower the power consumption (Bong, Conclusion).

Regarding Claim 9
Ji/Lin/Bong teaches the machine of claim 8.
Bong further teaches a shifter configured to receive one of a plurality of elements comprised in the input data (Section B ¶02 “convolutional unit is updated by loading a row of an input feature map [input data]”; the shifter is shown in Fig. 9), and shift and output the received element according to a cycle (Fig. 9 depicts a shifter, which necessarily operates according to a cycle, as computer hardware is known to operate based on a clock cycle); a First-In First-Out (FIFO) memory (Section B ¶01 “Fig. 8 shows the block diagram of the CNNP PE. The PE consists of local SRAM, a weight buffer [FIFO]”) configured to receive the second element of the plurality of second elements corresponding to the element input into the shifter and output at least one bit of the received second element according to the cycle (Fig. 9; the weight comes from a buffer and is input into the shifter, which “output at least one bit of the received second element according to the cycle”); and an accumulator configured to receive and accumulate a result of an arithmetic operation performed on the received element output from the shifter and the second element output from the FIFO memory according to the cycle (Section B ¶02 “the output partial sums are accumulated in the accumulation registers at the same column”; it can be noted from Fig. 9 that the “output partial sum” is a result of the weight buffer and the shifter).
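For illustration only, the shifter, FIFO memory, and accumulator arrangement recited above can be sketched as a bit-serial multiplication in which one weight bit is consumed per cycle. This sketch is an assumption made for explanatory purposes; it is not taken from Bong or from the application, and the function name, the least-significant-bit-first ordering, and the restriction to non-negative integers are all hypothetical.

```python
from collections import deque


def bit_serial_multiply(element: int, weight: int, n_bits: int) -> int:
    """Multiply a non-negative element by an n_bits-wide non-negative weight,
    consuming one weight bit per cycle (hypothetical illustration)."""
    # Hypothetical FIFO holding the weight bits, least significant bit first.
    fifo = deque((weight >> i) & 1 for i in range(n_bits))
    shifted = element   # value currently held by the shifter
    accumulator = 0     # running partial sum
    for _cycle in range(n_bits):
        weight_bit = fifo.popleft()   # the FIFO outputs one bit per cycle
        if weight_bit:
            accumulator += shifted    # accumulate the shifted element
        shifted <<= 1                 # the shifter moves left one position per cycle
    return accumulator


print(bit_serial_multiply(5, 6, 4))  # prints 30
```

In this sketch, the product is formed using only shifts and additions, which is one way a shifter and an accumulator operating on a common cycle could together perform a multiplying operation.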

Regarding Claim 17
	Ji/Lin teaches the machine of claim 10.
Ji/Lin does not appear to explicitly teach wherein the acquiring comprises applying the input data to the second AI model through at least one multiplying operation.
However, Bong, when addressing issues related to low-power neural network processing, teaches wherein the acquiring comprises applying the input data to the second AI model (Figs. 2 & 3; the MAC unit is implemented in a CNN which includes an input image) through at least one multiplying operation (Section B ¶3 “With weights represented by the floating point, which is composed of 1-bit sign, 4-bit exponent, and 3-bit mantissa, the MAC unit is implemented by using a shifter and adders instead of a multiplier”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the efficient neural network processor that uses a multiplier, as taught by Bong, into the disclosed invention of Ji/Lin.
One of ordinary skill in the art would have been motivated to make this modification in order to implement an efficient neural network processor that improves energy efficiency and reduces the workload of neural network layers to lower the power consumption (Bong, Conclusion).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Albericio et al. “Bit-pragmatic Deep Neural Network Computing,” teaches a method for efficient neural network calculation that is optimized for sparse matrices.
Teerapittayanon et al. “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” teaches a neural network with optional early exit.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached on Monday-Friday 7:30 am – 5:00 pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amir Mehramanesh, can be reached at telephone number (571)270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
	
/J.R.G./Examiner, Art Unit 4172
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122