Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is in response to applicant’s amendment filed on January 20, 2021. Claims 1-2, 5-10 and 13-16 are pending and under consideration. 

Response to Arguments
	Applicant’s amendments have overcome the previous claim objections, which have been withdrawn.
	Applicant’s arguments with respect to the rejection over Sakaguchi have been fully considered but are not deemed persuasive in distinguishing over Sakaguchi as applied in the rejections below.
	On page 8 of applicant’s response, Applicant argues that: “Sakaguchi does not teach or suggest the specific mode of staggered delivery among adjacent groups of processing elements that is required by amended claim,” referring to the newly recited limitation of “the shift register is configured to deliver the one or more segments of the input data to groups of the processing elements such that in any given processing cycle of the processing elements, each segment of the input data is passed from one group of the processing elements to an adjacent group of the processing elements in the succession, and adjacent groups of the processing elements in the succession process the input data in different, respective windows, which are staggered so that any given line of the input data appears in different locations in the respective window of each of two or more adjacent groups of the processing elements” of amended independent claim 1. 
However, as further explained in the rejection below, FIG. 12 of Sakaguchi teaches offset regions (+1, 0) and (0, 0), corresponding respectively to adjacent convolution operation circuits 230 (i.e., adjacent groups of processing elements). As shown below, the windows of these regions are staggered so that lines of input data (e.g., the data in registers 1, 5, and 9, and the data in registers 2, 6, and 10) occupy different locations in the respective windows. 
[Image: window for region (0, 0) and window for region (+1, 0) outlined; arrows mark two lines of input data.]
FIG. 12 of Sakaguchi (with annotations)
Furthermore, as shown in FIG. 8 of Sakaguchi, the data is shifted from right to left, thereby passing the data between adjacent groups of processing elements. 

Therefore, the independent claims remain rejected over Sakaguchi.  

Claim Interpretation
Claims 1-2, 5 and 7-8 invoke 35 U.S.C. 112(f): in these claims, the limitation “processing elements” is interpreted under 35 U.S.C. 112(f) for the reasons stated in the previous Office Action. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 6-10 and 14-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sakaguchi et al. (US 2019/0205780 A1) (“Sakaguchi”).
As to claim 1, Sakaguchi teaches a computational apparatus, comprising:
an input buffer configured to hold a first array of input data; [working memory 105, as shown in FIGS. 5-6 and 8. [0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.”]
an output buffer configured to hold a second array of output data computed by the apparatus; [FIG. 6: Pooling process circuit 250 (as described in [0073]), which receives and holds the output data computed by the two-dimensional convolution operation circuits 230. Additionally, as shown in FIG. 6 (top of the figure), the two-dimensional convolution operation results are stored in the working memory 105, which may also correspond to the output buffer of the instant claim (see [0067], which states that “the working memory 105 holds…the feature maps…output in each layer”).] 
a plurality of processing elements, each processing element configured to compute a convolution of a respective kernel comprising a matrix of coefficients with a set of the input data that are contained within a respective window [FIG. 6: two-dimensional convolution operation circuits 230, each configured to compute a convolution of a respective kernel using weights stored in coefficient memory 232. See [0077]: “convolution operation of two-dimensional filters in the kernel size of kw×kh and with the same weight coefficient is performed at the same time”; [0079]: “convolution operation of two-dimensional filters…is performed at the same time”; see also [0043]-[0044] regarding convolutional filters. With respect to the limitation of “comprising a matrix of coefficients,” [0048] teaches generally “weight coefficient w (i, j, k)” and FIG. 11 illustrates a set of weight coefficients. As shown in FIGS. 6-8, the convolution is performed with a set of the input data supplied from the input buffer 210 (specifically, data from the feature map from memory 105). The input data is contained within a respective window, as illustrated in FIG. 2, top left portion.] and to write a result of the convolution to a corresponding location in a respective plane of the output data; [As illustrated in FIG. 2 (bottom right portion), the results of the convolution are written to a plane on the feature map ([0049]: “The data output from the convolutional layer 40 is a three-dimensional feature map z (x, y, m).”). This feature map may be processed by the pooling process circuit 250 of FIG. 6 (see [0073]) and/or stored in working memory 105 (as stated in FIG. 6, top portion, and described in [0067]).]
one or more data fetch units, each coupled to read one or more segments of the input data from the input buffer; [Input buffer 210 shown in FIG. 8. The input buffer 210 includes input FIFOs 211, which read input data from the working memory 105 (as shown in FIGS. 6 and 9). [0083]: “The input FIFO 211 is a memory in a FIFO (First-In First-Out) structure for holding data input to the bottom row of the two-dimensional shift register 220. The input FIFO 211 includes at least one stage of registers.” Input buffer 210 also includes registers 212 that hold (and thus also read) “the data input to each row of the two-dimensional shift register,” as described in [0084], or, as shown in FIG. 9, input buffers 213 that perform a similar function (as described in [0087]).] and
a shift register, [Shift register including two-dimensional shift register 220 (FIGS. 6-9). [0070]: “The two-dimensional shift register 220 is a shift register that holds the data supplied from the input buffer 210 in two-dimensional regions.”] which is coupled to receive the one or more segments of the input data from the one or more data fetch units and to deliver the one or more segments of the input data in succession to each of the processing elements in an order selected so that the respective window of each processing element slides in turn over a sequence of window positions covering the first array, [FIGS. 6-8: Registers 221 of the shift register 220 receive segments of data from input FIFOs 211 and deliver the segments of data in succession to the two-dimensional convolution operation circuits 230 via selectors 222/223 (as shown in FIG. 7) in an order selected so that the window of each two-dimensional convolution operation circuit 230 slides in turn over a sequence of window positions covering the first array. See [0080]-[0081] regarding the delivery of data to the convolution operation circuits via the selectors 222/223. The “slides in turn” limitation is illustrated in FIG. 2, which shows the window moving in the x-direction before moving in the y-direction; x and y are coordinate directions on the input and output feature maps (as described in [0047] and [0049]) and are incremented as described in [0050]. Furthermore, as shown in FIG. 8, each row of registers 221, 212 holds an amount of data corresponding to the width of the input array, and the data is shifted from right to left (for movement of windows in the x-direction) and from bottom rows to top rows. See [0084]-[0085].] whereupon the result of the convolution for each window position is written by each processing element to the location corresponding to the window position in the respective plane in the output buffer. [FIG. 6: The results of the two-dimensional convolution operation circuits 230 are written onto a feature map, as illustrated in FIG. 2. As shown in FIG. 6, the pooling process circuit 250 and/or working memory 105 receive the feature map from the convolution operation circuits 230. See also [0053], describing that the data input to the pooling layer is “a three-dimensional feature map a (x, y, m) [that is] output from the previous layer,” and [0077].] 
wherein the shift register is configured to deliver the one or more segments of the input data to groups of the processing elements [Convolution operation circuits 230 (see FIG. 6) corresponding to regions (0, +1), (0, 0), (+1, 0), and (+1, +1) shown in FIG. 12 are considered to be groups, noting that according to page 8 of the specification, a “group” may consist of only one member. See [0096]: “FIG. 12 is a diagram illustrating an example of the parallel processing between the offset regions according to the embodiments of the present technique…” Here, the convolution operation circuit for offset region (+1, 0) constitutes one “group of the processing elements,” and the convolution operation circuit for offset region (0, 0) constitutes an adjacent group. See [0097]: “A two-dimensional convolution operation result cnv_00 is output in relation to the region based on the coordinate position (+0, +0)… A two-dimensional convolution operation result cnv_10 is output in relation to the region based on the coordinate position (+1, +0).” Similarly, the circuits for regions (0, +1) and (+1, +1) are adjacent groups.] such that in any given processing cycle of the processing elements, each segment of the input data is passed from one group of the processing elements to an adjacent group of the processing elements in the succession [See [0082]-[0089]. In particular, [0085] teaches: “the shift-in operation to the left is performed all at once every time the two-dimensional convolution operation of kw×kh pixels (3×3 pixels) is performed.” As shown in FIG. 8, the data is moved from right to left (see also [0078]: “registers are shifted from right to left”). In doing so, the data is shifted from one group, such as the circuit for region (+1, 0), to another group, such as the circuit for region (0, 0). With respect to the limitation of “processing cycle,” the instant claim does not specifically define which operations constitute a cycle. Therefore, any repetition, such as the repetitions shown in FIG. 13, may be regarded as a processing cycle, including any repetition that includes a shift-in.] and adjacent groups of the processing elements in the succession [As noted above, the convolution operation circuit for offset region (+1, 0) is one group and the convolution operation circuit for offset region (0, 0) is an adjacent group. Similarly, the circuits for regions (0, +1) and (+1, +1) are adjacent groups.] process the input data in different, respective windows, which are staggered so that any given line of the input data appears in different locations in the respective window of each of two or more adjacent groups of the processing elements. [See FIG. 12, as described in [0096] (“FIG. 12 is a diagram illustrating an example of the parallel processing between the offset regions according to the embodiments of the present technique…”) et seq. As shown in the annotated figure below, the window of region (0, 0) corresponds to register #s 0, 1, 2, 4, 5, 6, 8, 9, and 10, and the window of region (+1, 0) corresponds to register #s 1, 2, 3, 5, 6, 7, 9, 10, and 11. 
[Image: window for region (0, 0) and window for region (+1, 0) outlined; arrows mark two lines of input data.]
FIG. 12 of Sakaguchi (with annotations)
These two windows are “staggered” because they are offset from each other by one register column and thus only partly overlap. Accordingly, a line of data, such as the data in register #s 1, 5, and 9 or the data in register #s 2, 6, and 10, appears in different locations in the respective windows. The windows for regions (0, +1) and (+1, +1) are staggered in the same way. The convolution operation circuits for these regions “process input data” in accordance with the operations disclosed in relation to FIG. 10. That is, the data in the registers are subject to respective multipliers 231 and accumulators 233, as described in [0090] and [0093].]
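For illustration only, the register-to-window mapping relied on above can be checked with the following minimal Python sketch. It assumes, without relying on any further disclosure of Sakaguchi, that registers 0-11 of FIG. 12 form a row-major grid of three rows and four columns and that each convolution operation circuit sees a 3×3 window beginning at its x-offset; all function names are hypothetical.

```python
# Illustrative model only (assumptions, not Sakaguchi's disclosure):
# registers 0-11 of FIG. 12 form a row-major grid of 3 rows x 4 columns,
# and each convolution circuit sees a 3 x 3 window starting at its x-offset.
ROWS, COLS, K = 3, 4, 3


def window(offset_x):
    """Register numbers covered by the K x K window of region (offset_x, 0)."""
    return [r * COLS + (c + offset_x) for r in range(K) for c in range(K)]


def column_in_window(win, line):
    """Column (0..K-1) at which a vertical line of registers sits in a window."""
    cols = {win.index(reg) % K for reg in line}
    assert len(cols) == 1, "line must occupy one column of the window"
    return cols.pop()


def shift_left(values, fill=0):
    """Shift every register row one place to the left (FIG. 8 direction)."""
    out = []
    for r in range(ROWS):
        row = values[r * COLS:(r + 1) * COLS]
        out.extend(row[1:] + [fill])
    return out


def convolve(values, win, kernel):
    """Multiply-accumulate the windowed register values with a flat kernel."""
    return sum(values[reg] * k for reg, k in zip(win, kernel))


w00, w10 = window(0), window(1)
print(w00)  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
print(w10)  # [1, 2, 3, 5, 6, 7, 9, 10, 11]

# The same line of data sits one column further left in window (+1, 0).
for line in ([1, 5, 9], [2, 6, 10]):
    print(line, column_in_window(w00, line), column_in_window(w10, line))
# [1, 5, 9] 1 0
# [2, 6, 10] 2 1

values = list(range(12))  # register n initially holds value n
print(convolve(values, w00, [1] * 9), convolve(values, w10, [1] * 9))  # 45 54
shifted = shift_left(values)
print(shifted[2])  # 3: data seen only by region (+1, 0) now also reaches (0, 0)
```

Under these assumptions, the sketch reproduces the register lists identified above, places each line of data at a different column of the two windows, and shows that one left shift moves the value in register 3, which only the window of region (+1, 0) covers, into register 2, which both windows cover.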

As to claim 2, Sakaguchi teaches the apparatus according to claim 1, wherein the processing elements are configured to compute a respective line of the output data in the second array for each traversal of the first array by the respective window, [As shown in FIG. 2, the convolution operation implemented by the convolution operation circuits 230 generates a line of output data (e.g., a line in the x-direction, which is incremented as shown at the bottom right portion of FIG. 2 and in the formula in [0050]).] and wherein the one or more data fetch units and the shift register are configured so that each of the one or more segments of the input data is read from the input buffer no more than once per line of the output data [Since Sakaguchi teaches that the data fetch units are first-in, first-out (see [0083], which describes the “FIFO (First-In First-Out) structure for holding data input to the bottom row of the two-dimensional shift register 220”), and since FIG. 8 shows a unidirectional passage of data from the input FIFOs 211, each of the segments of the input data is read from the input buffer no more than once per line of the output data.] and then delivered by the shift register to all of the processing elements in the succession. [Since the data is delivered through the set of registers 221 in the two-dimensional shift register 220, as shown in FIG. 8, the data is delivered in succession to a plurality of two-dimensional convolution operation circuits 230 that would read on “all of the processing elements.” See also FIG. 12. In particular, [0096]-[0097] describe adjacent sets of convolution operation circuits, as noted in the rejection of the parent claim, which receive input data in succession.]
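For illustration only, the read-once-per-line reasoning above can be modeled with the following minimal Python sketch. The function and variable names are hypothetical, and, as a simplification, every processing element (one kernel each) reads the same shift-register contents rather than receiving them in succession; the sketch counts buffer reads per input column to show that each three-row column segment is fetched once while one line of output is computed for every kernel.

```python
from collections import deque


def conv_output_line(image, kernels, y, k=3):
    """Compute output line y for every kernel (one kernel per processing element).

    Each k-tall column segment of the input is fetched from the buffer exactly
    once and pushed into a k-column shift register; every processing element
    then reads its operands from the shift register rather than the buffer.
    """
    width = len(image[0])
    shift_reg = deque(maxlen=k)   # oldest column falls off the left end
    reads = [0] * width           # buffer reads per input column
    out = [[] for _ in kernels]   # one output-plane row per processing element
    for x in range(width):
        segment = [image[y + r][x] for r in range(k)]  # the only buffer read
        reads[x] += 1
        shift_reg.append(segment)
        if len(shift_reg) == k:   # window complete at x0 = x - k + 1
            for pe, kernel in enumerate(kernels):
                acc = sum(kernel[r][c] * shift_reg[c][r]
                          for r in range(k) for c in range(k))
                out[pe].append(acc)
    return out, reads


image = [[r * 5 + c for c in range(5)] for r in range(4)]
ones = [[1] * 3 for _ in range(3)]
centre = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
out, reads = conv_output_line(image, [ones, centre], y=0)
print(out)    # [[54, 63, 72], [6, 7, 8]]
print(reads)  # [1, 1, 1, 1, 1]
```

Calling the function once per output line (y = 0, 1, ...) traverses the whole input array, with each column segment read once per output line.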

As to claim 6, Sakaguchi teaches the apparatus according to claim 1, wherein each processing element comprises one or more multipliers, which multiply the input data by weights in the respective kernel, [FIG. 7: multipliers 231; [0081]: “The multiplier 231 is configured to multiply coefficient data stored in a coefficient memory 232 by the data selected by the selector 223.” Note that the data selected by the selector corresponds to the input data from the shift register 220. Additionally, the coefficient data stored in coefficient memory 232 are the weights for a kernel. See [0081]: “The coefficient memory 232 is a memory that stores coefficient data (weight coefficient).”] and an accumulator, which sums products output by the one or more multipliers. [FIG. 7: accumulators 233 (for each convolution operation circuit 230). [0081]: “The accumulators 233 are provided corresponding to the multipliers 231, respectively, and are configured to accumulate multiplication results of the multipliers 231 to output a two-dimensional convolution operation result.” Each of the accumulators 233 sums the respective products, as described in [0094]-[0095] (and also in [0105] and [0107]), which disclose summation operations such as “(D0×Coef[9]) + (D1×Coef[10]).” See also FIG. 11, rightmost column, which shows the summation process.]  

As to claim 7, Sakaguchi teaches the apparatus according to claim 1, wherein the input data held by the input buffer comprise pixels of an image. [[0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.” As shown in FIG. 1, the input image 10 may be fed into a first convolutional layer, which is implemented by the circuit shown in FIG. 6. Furthermore, [0078] teaches “pixels in the image.”]

As to claim 8, Sakaguchi teaches the apparatus according to claim 1, wherein the input data held by the input buffer comprise intermediate results, corresponding to feature values computed by a preceding layer of convolution. [[0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.” As shown in FIG. 1, the convolution and pooling operations may be repeated. Therefore, the second convolutional layer 40, which is implemented by the circuit shown in FIG. 6, would receive intermediate results corresponding to feature values computed by a preceding layer of convolution.]

As to claim 9, Sakaguchi teaches a method for computation, comprising:
receiving a first array of input data in an input buffer; [FIGS. 5-6 and 8: working memory 105, corresponding to an input buffer. [0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.”]
transferring successive segments of the input data from the input buffer [FIG. 8: Input FIFOs 211, which read input data from the working memory 105 (as shown in FIGS. 6 and 9) and transfer them to a shift register. [0083]: “The input FIFO 211 is a memory in a FIFO (First-In First-Out) structure for holding data input to the bottom row of the two-dimensional shift register 220. The input FIFO 211 includes at least one stage of registers.”] into a shift register; [Sakaguchi teaches a shift register including two-dimensional shift register 220 (FIGS. 6-8). [0070]: “The two-dimensional shift register 220 is a shift register that holds the data supplied from the input buffer 210 in two-dimensional regions.” It is noted that shift register 220 receives data via input buffer 210, which also includes registers 212 that hold (and thus also read) “the data input to each row of the two-dimensional shift register,” as described in [0084], or, as shown in FIG. 9, input buffers 213 that perform a similar function (as described in [0087]).]
delivering the segments of the input data from the shift register in succession to each of a plurality of processing elements, in an order selected so that a respective window of each processing element slides in turn over a sequence of window positions covering the first array; [FIGS. 6-8: Registers 221 of the shift register 220 receive segments of data from input FIFOs 211 and deliver the segments of data in succession to the two-dimensional convolution operation circuits 230 via selectors 222/223 (as shown in FIG. 7) in an order selected so that the window of each two-dimensional convolution operation circuit 230 slides in turn over a sequence of window positions covering the first array. See [0080]-[0081] regarding the delivery of data to the convolution operation circuits via the selectors 222/223. The “slides in turn” limitation is illustrated in FIG. 2, which shows the window moving in the x-direction before moving in the y-direction; x and y are coordinate directions on the input and output feature maps (as described in [0047] and [0049]) and are incremented as described in [0050]. Furthermore, as shown in FIG. 8, each row of registers 221, 212 holds an amount of data corresponding to the width of the input array, and the data is shifted between the rows of registers. See [0084]-[0085].]
computing in each processing element a convolution of a respective kernel comprising a matrix of coefficients with a set of the input data that are contained within the respective window, as the respective window slides over the sequence of window positions, [FIG. 6: two-dimensional convolution operation circuits 230, each configured to compute a convolution of a respective kernel using weights stored in coefficient memory 232. See [0077]: “convolution operation of two-dimensional filters in the kernel size of kw×kh and with the same weight coefficient is performed at the same time”; [0079]: “convolution operation of two-dimensional filters…is performed at the same time”; see also [0043]-[0044] regarding convolutional filters. With respect to the limitation of “comprising a matrix of coefficients,” [0048] teaches generally “weight coefficient w (i, j, k)” and FIG. 11 illustrates a set of weight coefficients. As shown in FIGS. 6-8, the convolution is performed with a set of the input data supplied from the input buffer 210 (specifically, data from the feature map from memory 105). The input data is contained within a respective window, as illustrated in FIG. 2, top left portion, which also shows the window sliding over a sequence of window positions in the x-direction on the feature map.] and writing a result of the convolution for each window position to a corresponding location in a respective plane in a second array of output data [With respect to the operation of “writing a result…,” FIG. 2 (bottom right portion) illustrates that the results of the convolution are written to a plane of the feature map. This feature map may be processed by the pooling process circuit 250 of FIG. 6, in the manner shown in FIG. 3. See also [0053], describing that the data input to the pooling layer is “a three-dimensional feature map a (x, y, m) [that is] output from the previous layer,” and [0077]. Additionally, as shown in FIG. 6 (top of the figure), the two-dimensional convolution operation results are stored in the working memory 105, which may also correspond to the output buffer of the instant claim.] in an output buffer. [FIG. 6: Pooling process circuit 250 (as described in [0073]), which receives and holds the output data computed by the two-dimensional convolution operation circuits 230. Additionally, as shown in FIG. 6 (top of the figure), the two-dimensional convolution operation results are stored in the working memory 105, which may also correspond to the output buffer of the instant claim (see [0067], which states that “the working memory 105 holds…the feature maps…output in each layer”).]
wherein delivering the segments of the input data comprises passing the segments of the input data to groups of the processing elements [Convolution operation circuits 230 (see FIG. 6) corresponding to regions (0, +1), (0, 0), (+1, 0), and (+1, +1) shown in FIG. 12 are considered to be groups, noting that according to page 8 of the specification, a “group” may consist of only one member. See [0096]: “FIG. 12 is a diagram illustrating an example of the parallel processing between the offset regions according to the embodiments of the present technique…” Here, the convolution operation circuit for offset region (+1, 0) constitutes one “group of the processing elements,” and the convolution operation circuit for offset region (0, 0) constitutes an adjacent group. See [0097]: “A two-dimensional convolution operation result cnv_00 is output in relation to the region based on the coordinate position (+0, +0)… A two-dimensional convolution operation result cnv_10 is output in relation to the region based on the coordinate position (+1, +0).” Similarly, the circuits for regions (0, +1) and (+1, +1) are adjacent groups.] such that in any given processing cycle of the processing elements, each segment of the input data is passed by the shift register from one group of the processing elements to an adjacent group of the processing elements in the succession, [FIG. 8, as described in [0082]-[0089]. In particular, [0085] teaches: “the shift-in operation to the left is performed all at once every time the two-dimensional convolution operation of kw×kh pixels (3×3 pixels) is performed.” As shown in FIG. 8, the data is moved from right to left (see also [0078]: “registers are shifted from right to left”). In doing so, the data is shifted from one group, such as the circuit for region (+1, 0), to another group, such as the circuit for region (0, 0). With respect to the limitation of “processing cycle,” the instant claim does not specifically define which operations constitute a cycle. Therefore, any repetition, such as the repetitions shown in FIG. 13, may be regarded as a processing cycle, including a repetition that includes a shift-in.] and adjacent groups of the processing elements in the succession [As noted above, the convolution operation circuit for offset region (+1, 0) is one group and the convolution operation circuit for offset region (0, 0) is an adjacent group. Similarly, the circuits for regions (0, +1) and (+1, +1) are adjacent groups.] process the input data in different, respective windows, which are staggered so that any given line of the input data appears in different locations in the respective window of each of two or more adjacent groups of the processing elements. [See FIG. 12, as described in [0096] (“FIG. 12 is a diagram illustrating an example of the parallel processing between the offset regions according to the embodiments of the present technique…”) et seq. As shown in the annotated figure in the rejection of claim 1, the window of region (0, 0) corresponds to register #s 0, 1, 2, 4, 5, 6, 8, 9, and 10, and the window of region (+1, 0) corresponds to register #s 1, 2, 3, 5, 6, 7, 9, 10, and 11. These two windows are “staggered” because they are offset from each other by one register column and thus only partly overlap. Accordingly, a line of data, such as the data in register #s 1, 5, and 9 or the data in register #s 2, 6, and 10, appears in different locations in the respective windows. The windows for regions (0, +1) and (+1, +1) are staggered in the same way. The convolution operation circuits for these regions “process input data” in accordance with the operations disclosed in relation to FIG. 10. That is, the data in the registers are subject to respective multipliers 231 and accumulators 233, as described in [0090] and [0093].]

As to claim 10, Sakaguchi teaches the method according to claim 9, wherein computing the convolution comprises computing a respective line of the output data in the second array for each traversal of the first array by the respective window, [As shown in FIG. 2, the convolution operation implemented by the convolution operation circuits 230 generates a line of output data (e.g., a line in the x-direction, which is incremented as shown at the bottom right portion of FIG. 2 and in the formula in [0050]).] and wherein fetching the successive segments comprises reading each of the segments of the input data from the input buffer no more than once per line of the output data, [Since Sakaguchi teaches that the data fetch units are first-in, first-out (see [0083], which describes the “FIFO (First-In First-Out) structure for holding data input to the bottom row of the two-dimensional shift register 220”), and since FIG. 8 shows a unidirectional passage of data from the input FIFOs 211, each of the segments of the input data is read from the input buffer no more than once per line of the output data.] and wherein delivering the segments comprises passing each of the segments of the input data from the shift register to all of the processing elements in the succession. [Since the data is delivered through the set of registers 221 in the two-dimensional shift register 220, as shown in FIG. 8, the data is delivered in succession to a plurality of two-dimensional convolution operation circuits 230 that would read on “all of the processing elements.” See also FIG. 12. In particular, [0096]-[0097] describe adjacent sets of convolution operation circuits, as noted in the rejection of the parent claim, which receive input data in succession.]

As to claim 14, Sakaguchi teaches the method according to claim 9, wherein computing the convolution comprises, in each processing element, multiplying the input data by weights in the respective kernel to give respective products, [FIG. 7: multipliers 231; [0081]: “The multiplier 231 is configured to multiply coefficient data stored in a coefficient memory 232 by the data selected by the selector 223.” Note that the data selected by the selector corresponds to the input data from the shift register 220. Additionally, the coefficient data stored in coefficient memory 232 are the weights for a kernel. See [0081]: “The coefficient memory 232 is a memory that stores coefficient data (weight coefficient).”] and summing the respective products. [FIG. 7: accumulators 233 (for each convolution operation circuit 230), as described in [0081]: “The accumulators 233 are provided corresponding to the multipliers 231, respectively, and are configured to accumulate multiplication results of the multipliers 231 to output a two-dimensional convolution operation result.” Each of the accumulators 233 sums the respective products, as described in [0094]-[0095] (and also in [0105] and [0107]), which disclose summation operations such as “(D0×Coef[9]) + (D1×Coef[10]).” See also FIG. 11, rightmost column, which shows the summation process.]  

As to claim 15, Sakaguchi teaches the method according to claim 9, wherein the input data held by the input buffer comprise pixels of an image. [[0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.” As shown in FIG. 1, the input image 10 may be fed into a first convolutional layer, which is implemented by the circuit shown in FIG. 6. Furthermore, [0078] teaches “pixels in the image.”]

As to claim 16, Sakaguchi teaches the method according to claim 9, wherein the input data held by the input buffer comprise intermediate results, corresponding to feature values computed by a preceding layer of convolution. [[0067]: “the working memory 105 holds the image data as the target of the image recognition, the feature maps input and output in each layer.” As shown in FIG. 1, the convolution and pooling operations may be repeated. Therefore, the second convolutional layer 40, which is implemented by the circuit shown in FIG. 6, would receive intermediate results corresponding to feature values computed by a preceding layer of convolution.]

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaguchi in view of Pattichis et al. (US 2018/0357744 A1) (“Pattichis”).
As to claim 5, Sakaguchi teaches the apparatus according to claim 1, as set forth in the rejection above, but does not teach that the “shift register” comprises “a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession.” 
Pattichis, in an analogous art, teaches the above limitations not taught by Sakaguchi. Pattichis pertains to image processing techniques using convolution kernels ([0003] and claim 1). Therefore, Pattichis is analogous for at least the reason of being in the same field of endeavor as the claimed invention.
In particular, Pattichis teaches “wherein the shift register comprises a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession.” [[0021]: “the image data is processed using an array of circular shift registers.” [0028]: “loading N pixels into a circular shift register in a single clock cycle… A circular right-shift is performed on the convolution kernel in a single cycle. The convolution result may be stored and the process repeats until all the convolution outputs have been computed.” With respect to the limitation of “such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession,” this limitation results from the combined teachings of Pattichis and Sakaguchi as set forth below, since a circular (cyclic) shift register connects the last output with the first input.] Pattichis further teaches the benefit of “fast and scalable architectures and methods for computing 2-D convolutions and cross-correlations as well as those that fit in new devices” ([0011]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Sakaguchi with the teachings of Pattichis by modifying the shift register of Sakaguchi to comprise a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession. One of ordinary skill in the art would have been motivated to do so in order to utilize circular convolutions for implementing fast and scalable architectures and methods for computing 2-D convolutions and cross-correlations, as suggested by Pattichis, [0011] (quoted above).

As to claim 13, Sakaguchi teaches the method according to claim 9, as set forth in the rejection above, but does not teach that the “shift register” comprises “a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession.” 
Pattichis, in an analogous art, teaches the above limitations not taught by Sakaguchi. Pattichis pertains to image processing techniques using convolution kernels ([0003] and claim 1). Therefore, Pattichis is analogous for at least the reason of being in the same field of endeavor as the claimed invention.
In particular, Pattichis teaches “wherein the shift register comprises a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession.” [[0021]: “the image data is processed using an array of circular shift registers.” [0028]: “loading N pixels into a circular shift register in a single clock cycle… A circular right-shift is performed on the convolution kernel in a single cycle. The convolution result may be stored and the process repeats until all the convolution outputs have been computed.” With respect to the limitation of “such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession,” this limitation results from the combined teachings of Pattichis and Sakaguchi as set forth below, since a circular (cyclic) shift register connects the last output with the first input.] Pattichis further teaches the benefit of “fast and scalable architectures and methods for computing 2-D convolutions and cross-correlations as well as those that fit in new devices” ([0011]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Sakaguchi with the teachings of Pattichis by modifying the shift register of Sakaguchi to comprise a cyclic shift register, such that a final processing element in the succession is adjacent, with respect to the cyclic shift register, to an initial processing element in the succession. One of ordinary skill in the art would have been motivated to do so in order to utilize circular convolutions for implementing fast and scalable architectures and methods for computing 2-D convolutions and cross-correlations, as suggested by Pattichis, [0011] (quoted above).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764.  The examiner can normally be reached on Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571) 270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124