DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 11/30/2021.
Applicant arguments/remarks made in amendment filed 11/30/2021. 

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/30/2021 has been entered.

Claims 1-4, 6, 8-11, 13, 15-18, and 20 are amended.
Claims 1-20 are presented for examination.
Response to Arguments
Applicant’s arguments have been fully considered but they are not persuasive. Each is presented below.
a)	Applicant argues that independent claim 1 as amended “recites the feature of ‘a neural task manager circuit coupled to the data reader circuit and the data buffer, the neural task manager circuit configured to send task information to the first and second rasterizer circuits to program the first and second rasterizer circuits according to a configuration of the input data, the task information indicating at least how the input data is segmented into the work units.’ Shafiee fails to disclose, teach or suggest this feature.” (Remarks, p. 10, par. 6.)  The argument is moot in view of new grounds of rejection.  See detailed rejection below.
b)	Applicant argues that “nowhere in Shafiee does it disclose that the mapping of layers to IMAs in Shafiee indicate how input data is segmented into smaller units.” (Remarks, p. 11, par. 2, ln. 5.) The argument is moot in view of new grounds of rejection.  See detailed rejection below. 
c)	Applicant argues that “Shafiee does not disclose, teach or suggest enabling flexible operations of varying input and output configurations.” (Remarks, p. 11, par. 2, ln. 8.) However, the claims do not require “enabling flexible operations of varying input and output configurations.” Therefore, the argument is unpersuasive.
d)	Applicant argues that “Hence, amended claim 1 and its dependent claims are patentably distinguishable over Shafiee or its combination with Moeskops.” (Remarks, p. 11, par. 3.)  However, claim 1 remains rejected.  Therefore, the dependent claims of claim 1 remain rejected.
e)	Applicant argues that “Independent claims 8 and 15, as amended, also recite similar features as amended claim 1. Hence, claim 8, 15 and their dependent claims are also patentably distinguishable over Shafiee or its combination with Moeskops.” (Remarks, p. 11, par. 4.) However, claim 1 remains rejected, and independent claims 8 and 15 recite similar features.  Therefore, independent claims 8 and 15, as well as their dependent claims, remain rejected.
Claim Objections
Claim 1 recites “data reader” in line 11.  There is insufficient antecedent basis for this limitation in the claim.  For the purpose of examination, Examiner is interpreting “data reader” as “data reader circuit”.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shafiee et al. (“ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” herein Shafiee) and Ma et al. (“An Automatic RTL Compiler for High-Throughput FPGA Implementation of Diverse Deep Convolutional Neural Networks,” herein Ma).
Regarding claim 1, 
	Shafiee teaches a neural processor circuit, comprising: 
	a data reader circuit comprising a first rasterizer circuit configured to track a segment of input data received from a source external to the neural processor circuit (Shafiee, Fig. 2, and p. 14, col. 2, par. 2, ln. 3 “A tile implements a neural functional unit (NFU) that has parallel digital arithmetic units; these units are fed with data from nearby SRAM buffers and eDRAM banks.” And, p. 17, col. 1, par. 1, ln. 3 “Control vectors are also loaded into each tile to drive the finite state machines that steer inputs and outputs correctly after every cycle. During inference, inputs are provided to ISAAC through an [external] I/O interface [implicitly discloses input data read from a source external to the neural processor circuit] and routed to the tiles implementing the first layer of the CNN. A finite state machine in the tile sends these inputs to appropriate IMAs [in-situ multiply-accumulate units].” And, p. 20, col. 1, par. 2, and Figs. 3-4, “The outputs of layer i-1 are stored in the eDRAM buffer for layer i’s tile.  As described in Figure 3, when a new set of inputs (Ni 16-bit values) shows up, it allows layer i to proceed with its next operation. This operation is itself pipelined (shown in detail in Figure 4b), with the cycle time (100 ns) dictated by the slowest stage, which is the crossbar read. In the first cycle, an eDRAM read is performed to read out 256 16-bit inputs [read a segment of input data].”  Examiner notes that there is no mention of “data reader circuit” in the specification of the instant application. In addition, the only mention of “data reader” is in the first paragraph of the Summary. “The neural processor circuit may include a data reader, a data buffer, and neural engines.” (Specification, par. [0003].) For the purpose of examination, Examiner is interpreting both “data reader circuit” and “data reader” as the same, both described as the “data reader” in the specification. 
In addition, “rasterizer circuit” is only mentioned in the Summary of the instant application. The specification recites “The data reader includes a rasterizer circuit that instructs the data reader to read a segment of input data from a source external to the neural processor circuit.” (Specification, par. [0003].)  The specification then recites “The data buffer includes another rasterizer circuit that generate and send work units” (Specification, par. [0003], ln. 6.) and, “In one or more embodiments, each of the neural engines include additional rasterizer circuits instructing each of the neural engines to shift portions of the work unit to be multiplied with the kernel at different cycles.”  It is clear from the specification that the term “rasterizer circuit” is being used generically.  For the purpose of examination, Examiner is interpreting “rasterizer circuit” as a generic term. In other words, the node is the neural processor circuit, the external I/O interface implies a source external to the neural processor circuit, the control vector is the first rasterizer circuit, and the eDRAM (within a tile) is the data reader circuit comprising a first rasterizer circuit configured to track a segment of input data received from a source external to the neural processor circuit.)



a data buffer separate from the data reader circuit and configured to store the received segment of the input data from the data reader circuit and send work units, the data buffer comprising a second rasterizer circuit configured to track, independent from the first rasterizer, the segment of the input data and work units stored in the data buffer, each of the work units corresponding to a portion of the input data segment stored in the data buffer; and (Shafiee, p. 14, col. 2, par. 2, ln. 3 “A tile implements a neural functional unit (NFU) that has parallel digital arithmetic units; these units are fed with data from nearby SRAM buffers and eDRAM banks.” And, p. 17, Section IV and Fig. 3, “To understand how results are passed from one stage to the next, consider the following example, also shown in Figure 3. Assume that in layer i, a 6×6 input feature map is being convolved with a 2×2 kernel to produce an output feature map of the same size. Assume that a single column in an IMA has the four synaptic weights used by the 2×2 kernel. The previous layer i−1 produces outputs 0, 1, 2, ..., 6, 7, shown in blue in Figure 3a. All of these values are placed in the input buffer for layer i. At this point, we have enough information to start the operations for layer i. So inputs 0, 1, 6, 7 are fed to the IMA and they produce the first output for layer i. When the previous layer i−1 produces output 8, it gets placed in the input buffer for layer i. Value 0, shown in green in Figure 3b, is no longer required and can be removed from the input buffer.” And, p. 20, Section VI and Fig. 4, “The outputs of layer i−1 are stored in the eDRAM buffer [a data buffer configured to store the received segment of the input data] for layer i’s tile. As described in Figure 3, when a new set of inputs (Ni 16-bit values) shows up, it allows layer i to proceed with its next operation. 
This operation is itself pipelined (shown in detail in Figure 4b), with the cycle time (100 ns) dictated by the slowest stage, which is the crossbar read. In the first cycle, an eDRAM read is performed to read out 256 16-bit inputs [the data buffer comprising a second rasterizer circuit configured to generate and send work units, each work unit corresponding to a portion of the input data segment stored in the data buffer]. These values are sent over the shared bus to the IMA for layer i and recorded in the input register (IR).”


In other words, eDRAM buffer is data buffer separate from the data reader circuit, and is configured to store the received segment of the input data and send work units, and the data buffer is a second rasterizer circuit, independent from the first rasterizer circuit, configured to generate and send work units, the segment of the input data and the work units stored in the data buffer.)
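For illustration only, the buffering behavior quoted above from Shafiee's Figure 3 example (a 6×6 input feature map convolved with a 2×2 kernel) can be sketched in Python. This is a minimal sketch under stated assumptions and is not Shafiee's implementation: the function name simulate_input_buffer is hypothetical, values are assumed to arrive in raster order, any padding needed to keep the output map the same size as the input is omitted, and a value is evicted as soon as the last window that needs it has been consumed.

```python
def simulate_input_buffer(width=6, height=6):
    """Model the layer-i input buffer for a 2x2 kernel with stride 1.

    Returns the list of 2x2 windows (as tuples of input indices) in the
    order they become computable, plus the peak buffer occupancy.
    All names here are hypothetical and chosen for illustration.
    """
    buffer = []    # indices of layer i-1 outputs currently held
    windows = []   # windows handed to the multiply-accumulate unit
    peak = 0
    for idx in range(width * height):
        buffer.append(idx)              # layer i-1 produces output idx
        peak = max(peak, len(buffer))
        top_left = idx - (width + 1)    # window completed by this arrival
        if top_left >= 0:
            # Skip windows that would wrap from one row into the next.
            if top_left % width != width - 1:
                windows.append((top_left, top_left + 1,
                                top_left + width, top_left + width + 1))
            buffer.remove(top_left)     # this value is no longer required
    return windows, peak


windows, peak = simulate_input_buffer()
print(windows[0])   # first window uses inputs 0, 1, 6, 7
print(peak)         # largest number of values held at once
```

Under these assumptions the first window is formed from inputs 0, 1, 6 and 7, matching the quoted example, and the buffer never needs to hold more than one row plus two values.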
neural engines separate from the data reader and the data buffer, and comprising: a first neural engine configured to receive a first work unit of the work units from the data buffer and generate a first output based on multiply-accumulate operations performed on a portion of the first work unit using a first kernel and (Shafiee, pp. 19-20, Section VI and Fig. 4, “This is best explained with the example shown in Figure 4. In this example, layer i is performing a convolution with a 4×4 shared kernel. The layer receives 16 input filters and produces 32 output filters (see Figure 4a). These output filters are fed to layer i+1 that performs a max-pool operation on every 2×2 grid. The 32 down-sized filters are then fed as input to layer i+2. For this example, assume that kernel strides (Sx and Sy) are always 1. We assume that one IMA has four crossbar arrays, each with 128 rows and 128 columns. Layer i performs a dot-product operation with a 4×4×16 matrix, i.e., we need 256 multiply-add operations, or a crossbar with 256 rows. … Once the input values have been copied to the IR, the IMA will be busy with the dot-product operation [multiply-accumulate operations] for the next 16+ cycles. In the next 16 cycles, the eDRAM is ready to receive other inputs and deal with other IMAs [neural engines configured to receive the work units] in the tile, i.e., it context-switches to handling other layers that might be sharing that tile while waiting for the result of one IMA. Over the next 16 cycles, the IR feeds 1 bit at a time for each of the 256 input values to the crossbar arrays. The first 128 bits are sent to crossbars 0 and 1, and the next 128 bits are sent to crossbars 2 and 3 [each crossbar performing multiply-accumulate operations performed on a portion of a work unit using a corresponding kernel]. At the end of each 100 ns cycle, the outputs are latched in the Sample & Hold circuits. In the next cycle, these outputs are fed to the ADC units. 
The results of the ADCs are then fed to the shift-and-add units, where the results are merged with the output register (OR) in the IMA.”  In other words, the tiles are neural engines configured to receive work units, and performing the dot-product operation in the IMAs is generating a first output based on multiply-accumulate operations performed on a portion of the first work unit using a first kernel.)
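For illustration only, the bit-serial dot product described in the quoted passage (one input bit per cycle over 16 cycles, with partial results merged by shift-and-add) can be sketched in Python. This is a simplified sketch, not Shafiee's circuit: the function name bit_serial_dot is hypothetical, inputs are assumed to be unsigned 16-bit integers fed least-significant bit first, and the analog crossbar, sample-and-hold, and ADC stages are collapsed into an ordinary integer sum.

```python
def bit_serial_dot(inputs, weights, bits=16):
    """Compute sum(x * w) by feeding one bit of every input per cycle.

    Assumes unsigned inputs below 2**bits, fed least-significant bit
    first (an assumption made for this sketch).
    """
    acc = 0  # plays the role of the output register
    for cycle in range(bits):
        # One bit of each input value is applied in this cycle.
        bit_slice = [(x >> cycle) & 1 for x in inputs]
        # Stand-in for the crossbar read: sum of weights on active rows.
        partial = sum(b * w for b, w in zip(bit_slice, weights))
        # Shift-and-add: weight the partial sum by the bit position.
        acc += partial << cycle
    return acc


inputs = [3, 5, 65535]
weights = [2, -1, 4]
print(bit_serial_dot(inputs, weights) == sum(x * w for x, w in zip(inputs, weights)))
```

The sketch shows only why feeding shifted bit slices over successive cycles reproduces an ordinary multiply-accumulate result.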
a second neural engine configured to receive a second work unit of the work units from the data buffer and generate a second output using a second kernel; and (Shafiee, Fig. 2, In other words, each of the tiles is a neural engine, eDRAM buffer is data buffer, and from Fig. 2 - multiple neural engines is a second neural engine configured to receive a second work unit of the work units from the data buffer and generate a second output using a second kernel.)
a neural task manager circuit coupled to the data reader circuit and the data buffer, the neural task manager circuit configured to send task information to the first and second rasterizer circuits [to program the first and second rasterizer circuits according to a configuration of the input data, the task information indicating at least how the input data is segmented into the work units.] (Shafiee, p. 17, col. 1, par. 1, ln. 1 “After training has determined the weights of every neuron, the weights are appropriately loaded into memristor cells with a programming step.  Control vectors are also loaded into each tile to drive the finite state machines that steer inputs and outputs correctly after every cycle.” In other words, finite state machine is neural task manager, which is coupled to the data reader circuit and the data buffer, control vectors loaded into each tile to drive the finite state machines is neural task manager is configured, and finite state machines steer inputs and outputs correctly after every cycle is send task information.)
Thus far, Shafiee does not explicitly teach to program the first and second rasterizer circuits according to a configuration of the input data, the task information indicating at least how the input data is segmented into the work units.
Ma teaches to program the first and second rasterizer circuits according to a configuration of the input data, the task information indicating at least how the input data is segmented into the work units. (The specification of the instant application provides insight. (Specification, paragraph [0055], line 3 “A compiler executed by CPU 208 analyzes the hierarchy and nodes of the neural network and determines how the input data is to be segmented based on the hardware constraints of the neural processor circuit 218.  One of the functions of the compiler is to determine how input data is to be split into smaller data units for processing at the neural engines 314, and how the processing is to be iterated in loops to produce the results for tasks.” In other words, an external compiler, executed by the CPU, determines how the input data is to be segmented before it is submitted to the neural processor circuit in the claimed invention. There is no further description of the compiler in the specification.) Ma, Fig. 2, and p. 2, col. 1, par. 2, ln. 1 “In this work, a library-based CNN RTL compiler is proposed as shown in Fig. 2, where the user only need to input high-level CNN model information without touching low-level hardware design. It enables fast and automatic mapping of various deep CNN algorithms from software deep learning frameworks, e.g. Caffe [10], onto FPGAs with high efficiency and performance.  By this means, we can benefit from the reconfigurability of FPGA and finer optimization of RTL implementation. As CNNs are assembled by highly iterative computing primitives or layers, scalable RTL building block modules are designed for different types of layers and reused by different CNNs.  The RTL compiler configures these modules with CNN parameters, and it also scales the sizes of Processing Engines (PEs) and on-chip buffers based on the user-specific hardware design variables.  The main contributions of this work are as follows.” And, p. 2, col. 1, par. 7, ln. 1 “The dimensions and connections of CNN layers and pretrained kernel weights are derived from Caffe [10] as the input to the CNN compiler.”


In other words, RTL compiler is neural task manager circuit (the finite state machine is the counterpart in Shafiee), the FPGA contains the first and second rasterizer circuits, configures these modules is programs the first and second rasterizer circuits, and the RTL compiler configures these modules with CNN parameters, and it also scales the sizes of Processing Engines (PE) and on-chip buffers is task information indicating how the input data is segmented into the work units.)
	Both Ma and Shafiee are directed to hardware accelerators for CNNs, among other things.  Shafiee teaches a CNN accelerator with analog arithmetic in crossbars but does not explicitly teach a neural network manager that programs/configures components by sending task information that describes how input data is segmented.  Ma teaches an automatic RTL compiler that programs/configures an FPGA (hardware components) by sending task information that describes how input data is segmented, among other things.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ma into Shafiee.  This would result in a CNN accelerator that can be configured automatically based on input requirements.
	One of ordinary skill in the art would be motivated to do this in order to reduce the work of manually customizing hardware to execute a particular CNN. (Ma, p. 1, col. 1, par. 1, ln. 7 “Without a general compiler to automate the implementation, however, significant efforts and expertise are still required to customize the design for each CNN model.”)
Regarding claim 2, 
	The combination of Shafiee and Ma teaches the neural processor circuit of claim 1, wherein 
	each of the neural engines comprises a third rasterizer circuit instructing each of the neural engines to shift portions of a corresponding one of the work units to be multiplied with a corresponding kernel at different cycles, each of the work units of a size that results in an output of a size that fits in an accumulator of each of the neural engines. (Shafiee, p. 20, Section VI and Fig. 4, “Over the next 16 cycles, the IR [a third rasterizer circuit] feeds 1 bit at a time for each of the 256 input values to the crossbar arrays. The first 128 bits are sent to crossbars 0 and 1, and the next 128 bits are sent to crossbars 2 and 3. At the end of each 100 ns cycle [where shifted portions of the work unit are multiplied with the kernel at different cycles], the outputs are latched in the Sample & Hold circuits. In the next cycle, these outputs are fed to the ADC units. The results of the ADCs are then fed to the shift-and-add units, where the results are merged with the output register (OR) in the IMA.” In other words, IR is third rasterizer circuit, IMA is accumulator, bits sent to crossbar is shift portions of the work units to be multiplied, crossbar is where the work units are multiplied with a corresponding kernel, and crossbar multiplication is results in a size that fits in an accumulator of each of the neural engines.)
Regarding claim 3, 
	The combination of Shafiee and Ma teaches the neural processor circuit of claim 2, wherein 
	the third rasterizer circuit is configured to track one or more of a convolution group, a work unit, output channel group, an input channel group and a output channel corresponding to the portion of work unit being multiplied with the corresponding kernel (Shafiee, p. 20, Section VI and Fig. 4, “Over the next 16 cycles, the IR feeds 1 bit at a time [suggesting that the third rasterizer circuit is configured to track at least a work unit corresponding to the portion of work unit being multiplied with the kernel] for each of the 256 input values to the crossbar arrays. The first 128 bits are sent to crossbars 0 and 1, and the next 128 bits are sent to crossbars 2 and 3.” In other words, IR is third rasterizer circuit, and IR feeds 1 bit at a time is rasterizer circuit is configured to track at least a work unit corresponding to the portion of work unit being multiplied with the corresponding kernel.)
Regarding claim 4,
The combination of Shafiee and Ma teaches the neural processor circuit of claim 1, wherein 
	the task information further indicates a dimension of the input data, a dimension of the output data and dimensions of the first and second kernels. (Ma, p. 2, col. 1, par. 7, ln. 1 “The dimensions and connections of CNN layers and pretrained kernel weights are derived from Caffe [10] as the input to the CNN compiler.  Given the CNN parameters, the accelerator design variables, e.g. loop unrolling and tiling size shown in Fig. 3 and described in Section III, can be tuned by the user to balance the performance and required hardware resources.” In other words, dimensions and connections of CNN layers and pretrained kernel weights is dimension of input data and dimension of output data and dimensions of the first and second kernels. See mapping of claim 1.)
Regarding claim 5, 
	The combination of Shafiee and Ma teaches the neural processor circuit of claim 1, wherein 
	the segment of the input data is part of a slice of the input data, slices of the input data dividing the input data vertically and segments dividing each of the slices horizontally (Shafiee, p. 17, Section IV and Fig. 3, in the pipeline described in Figure 3, the segment of the input data stored in the buffer comprises multiple rows, or segments dividing each of the slices horizontally, and multiple columns, or slices of the input data dividing the input data vertically, of the depicted input feature map. In other words, segment is segment of input data, rows are horizontal slices, and columns are vertical slices.)
Regarding claims 8-12, claims 8-12 are directed to a method of operating a neural processor circuit, comprising elements and steps recited in claims 1-5, respectively. Therefore, the rejections made to claims 1-5 are applied to claims 8-12.
Regarding claims 15-19, claims 15-19 are directed to an integrated circuit system comprising a neural processor circuit, the neural processor circuit comprising elements and steps recited in claims 1-5, respectively. Therefore, the rejections made to claims 1-5 are applied to claims 15-19.
Claims 6-7, 13-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shafiee, Ma, and Moeskops et al. (“Automatic Segmentation of MR Brain Images With a Convolutional Neural Network,” herein Moeskops).
Regarding claim 6, 
	The combination of Shafiee and Ma teaches the neural processor circuit of claim 1.
	Thus far, the combination of Shafiee and Ma does not explicitly teach the circuit, wherein the data reader is further configured to retrieve another segment of the input data for neural network processing from a source, the other segment having a shape different from the segment.
	Moeskops teaches the circuit, wherein the data reader is further configured to retrieve another segment of the input data for neural network processing from a source, the other segment having a shape different from the segment (Moeskops, p. 1255, Section IV-B and Fig. 1, “For all voxels within the brain mask, three in-plane patches [a source] with sizes of 25x25, 51x51 and 75x75 voxels are extracted, where the voxel of interest is in the centre. … A CNN with multiple convolution layers is used; a schematic of the network is shown in Fig. 1. In the first layers 24 kernels are trained for each patch size. For the patches of 25x25 voxels, kernels of 5x5 voxels are used, for the patches of 51x51 voxels, kernels of 7x7 voxels are used, and for the patches of 75x75 voxels, kernels of 9x9 voxels are used.” Moeskops discloses the retrieval of segments of different shapes from the input data for neural network processing. Moeskops, p. 1261, Acknowledgement discloses the use of a Tesla K40 GPU, which may disclose a data reader. In other words, retrieval of segments is further configured to receive another segment, and patches of  25x25 voxels, 51x51 voxels and 75x75 voxels is segments having different shapes.)
	Both Moeskops and the combination of Shafiee and Ma are directed to implementing convolutional neural networks, among other things. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the segment of input data in the combination of Shafiee and Ma to include another segment of the input data having a shape different from the segment, as disclosed in Moeskops. 
	One of ordinary skill in the art would be motivated to do this because doing so allows for the use of “multi-scale information” regarding input image data, and this “multi-scale approach allows the network to incorporate local details as well as global spatial consistency” (Moeskops, p. 1253, Section II).
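For illustration only, the multi-scale patch extraction quoted from Moeskops (three in-plane patches of 25x25, 51x51 and 75x75 voxels centred on the voxel of interest) can be sketched in Python. This is a minimal sketch and not Moeskops's code: the function names extract_patch and multi_scale_patches are hypothetical, the image is assumed to be a 2-D list of numbers, and zero padding outside the image boundary is an assumption made here for completeness.

```python
def extract_patch(image, row, col, size):
    """Return a size x size patch centred on (row, col).

    Positions outside the image are filled with 0 (an assumption of
    this sketch). `size` is expected to be odd.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    return [
        [image[r][c] if 0 <= r < height and 0 <= c < width else 0
         for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]


def multi_scale_patches(image, row, col, sizes=(25, 51, 75)):
    """Extract one patch per size, all centred on the same voxel."""
    return {size: extract_patch(image, row, col, size) for size in sizes}


# Usage: a synthetic 100x100 image whose value encodes its position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
patches = multi_scale_patches(image, 50, 50)
for size, patch in patches.items():
    print(size, len(patch), len(patch[0]), patch[size // 2][size // 2])
```

Each extracted patch has a different shape while sharing the same centre voxel, which is the property relied on in the mapping above.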
Regarding claim 7, 
	The combination of Shafiee and Ma teaches the neural processor circuit of claim 1.
	Thus far, the combination of Shafiee and Ma does not explicitly teach the circuit, wherein at least two of the work units have different shapes.
	Moeskops teaches the circuit, wherein at least two of the work units have different shapes (Moeskops, p. 1255, Section IV-B and Fig. 1, “For all voxels within the brain mask, three in-plane patches with sizes of 25x25, 51x51 and 75x75 voxels are extracted, where the voxel of interest is in the centre. … A CNN with multiple convolution layers is used; a schematic of the network is shown in Fig. 1. In the first layers 24 kernels are trained for each patch size. For the patches of 25x25 voxels, kernels of 5x5 voxels are used, for the patches of 51x51 voxels, kernels of 7x7 voxels are used, and for the patches of 75x75 voxels, kernels of 9x9 voxels are used.” In other words, because Moeskops discloses three input feature maps of varying sizes with kernels of varying sizes, at least two of the work units have different shapes.).
	Both Moeskops and the combination of Shafiee and Ma are directed to implementing convolutional neural networks, among other things. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the work units in Shafiee to include work units having different shapes, as disclosed in Moeskops. 
One of ordinary skill in the art would be motivated to do this because doing so allows for the use of “multi-scale information” regarding input image data, and this “multi-scale approach allows the network to incorporate local details as well as global spatial consistency” (Moeskops, p. 1253, Section II).
Regarding claim 13, 
	The combination of Shafiee and Ma teaches the method of claim 8, further comprising: 
…
storing the other segment of the input data in the data buffer of the neural processor circuit for processing by the neural engines (Shafiee, p. 20, Section VI and Fig. 4, “The outputs of layer i−1 are stored in the eDRAM buffer [the data buffer configured to store a segment of the input data for the current layer] for layer i’s tile. As described in Figure 3, when a new set of inputs (Ni 16-bit values) shows up, it allows layer i to proceed with its next operation. This operation is itself pipelined (shown in detail in Figure 4b), with the cycle time (100 ns) dictated by the slowest stage, which is the crossbar read. In the first cycle, an eDRAM read is performed to read out 256 16-bit inputs. These values are sent over the shared bus to the IMA for layer i and recorded in the input register (IR).” In other words, the data buffer configured to store a segment of the input data for the current layer is storing the other segment of the input data in the data buffer.)
	Thus far, the combination of Shafiee and Ma does not explicitly teach the method, where the stored segment of the input data buffer of the neural processor circuit for processing by the neural engines is based on the step of retrieving another segment of the input data for neural network processing from the source, the other segment having a shape different from the segment (emphasis added).
	Moeskops teaches the method, further comprising retrieving another segment of the input data for neural network processing from the source, the other segment having a shape different from the segment (Moeskops, p. 1255, Section IV-B and Fig. 1, “For all voxels within the brain mask, three in-plane patches [a source] with sizes of 25x25, 51x51 and 75x75 voxels are extracted, where the voxel of interest is in the centre. … A CNN with multiple convolution layers is used; a schematic of the network is shown in Fig. 1. In the first layers 24 kernels are trained for each patch size. For the patches of 25x25 voxels, kernels of 5x5 voxels are used, for the patches of 51x51 voxels, kernels of 7x7 voxels are used, and for the patches of 75x75 voxels, kernels of 9x9 voxels are used.” Moeskops discloses the retrieval of segments of different shapes from the input data for neural network processing. Moeskops, p. 1261, Acknowledgement discloses the use of a Tesla K40 GPU, which may disclose a data reader. In other words, patches of different sizes (25x25, 51x51, 75x75) is the other segment having a shape different from the segment.)
	Both Moeskops and the combination of Shafiee and Ma are directed to implementing convolutional neural networks, among other things. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the segment of input data in Shafiee to include another segment of the input data having a shape different from the segment, as disclosed in Moeskops. 
	One of ordinary skill in the art would be motivated to do this because doing so allows for the use of “multi-scale information” regarding input image data, and this “multi-scale approach allows the network to incorporate local details as well as global spatial consistency” (Moeskops, p. 1253, Section II).
Regarding claim 14, claim 14 is directed to a method of operating a neural processor circuit, comprising elements and steps recited in claim 6. Therefore, the rejection made to claim 6 is applied to claim 14.
Regarding claim 20, claim 20 is directed to an integrated circuit system comprising a neural processor circuit, the neural processor circuit comprising elements and steps recited in claim 6. Therefore, the rejection made to claim 6 is applied to claim 20.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124