DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities: the Summary of the Invention repeats the claim language verbatim.  Appropriate correction is required.
The incorporation of essential material in the specification [spec 0032] by reference to an unpublished U.S. application, foreign application or patent, or to a publication is improper. Applicant is required to amend the disclosure to include the material incorporated by reference, if the material is relied upon to overcome any objection, rejection, or other requirement imposed by the Office. The amendment must be accompanied by a statement executed by the applicant, or a practitioner representing the applicant, stating that the material being inserted is the material previously incorporated by reference and that the amendment contains no new matter. 37 CFR 1.57(g).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Regarding claims 1 and 10, the phrase "such that" renders the claim indefinite because it is unclear whether the limitations following the phrase are part of the claimed invention.  See MPEP § 2173.05(d).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites receiving a first input and training a kernel to be symmetric…
The limitation of receiving a first input, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting "by a processor," nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the "by a processor" language, receiving in the context of this claim encompasses the user manually receiving the input. Similarly, the limitation of training a kernel, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the "by a processor" language, training in the context of this claim encompasses the user mentally performing the training. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a processor to perform both the receiving and training steps. The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of receiving and training) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor to perform both receiving and training amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.



Regarding dependent claims 2-18, the claims do not remedy the deficiencies of claim 1 and are therefore directed to non-statutory subject matter for the same reason(s) as noted above. 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 10 and 19 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Puri USPN 7,747,070.
Regarding claims 1 and 10
Puri teaches 
receiving a first input of an image in a first orientation (column 3, line 30, the techniques described herein are implemented on a graphics processing unit. One example of a graphics processing unit is shown in FIG. 3, which illustrates a simplified overview of a traditional GPU architecture 300. In one implementation, the GPU architecture corresponds to the GPU 815 illustrated in FIG. 8. Display data 305, which describes geometry of an image to be rendered, is input into vertex shader units 310, which generate polygonal representations of the geometric forms. These geometric forms are then input into a rasterizer, which interpolates the polygons and samples them to develop a sample set of points in image space, which can then be shaded and have texture added to them. These points are then passed to a series of programmable pixel shader units 330 and which utilize parallel computing techniques to perform shading of the points, as well as adding and manipulating textures. It is this ability to perform parallel computations as well as to manipulate textures which makes the GPU, and the pixel shader units in particular, a useful platform for neural network computation. Pixel shader unit computation is frequently performed under the control of pixel shader programs, which are GPU-executable programs written to take advantage of the pixel shader units);
training a kernel to be symmetric such that an output corresponding to the first input is the same as an output corresponding to a second input of the image in a second orientation (column 3, line 60, while the fully-connected neural networks described above are able, when properly trained, to recognize handwriting, they oftentimes fail to take advantage of shape and proximity when operating on input. One reason for this is that every pixel is operated on independently, ignoring adjacent pixels. For this reason, convolutional neural networks are also used, which operate by associating an array of values with each neuron, rather than a single value. Conceptually, this array can be thought of as a small patch of an image. The transformation of a neuron value for the subsequent layer is generalized from multiplication to convolution. This implies that the connection strengths 230 are convolution kernels rather than scalar values. FIGS. 4a and 4b show two examples of convolutional kernels operating on a sample 400 of a letter "m." In FIG. 4a, the sample is combined with a convolution kernel 410 representing a vertical line. The resulting patch of pixels 420 comprises the three vertical lines which are present in the sample. Similarly, in FIG. 4b, the handwriting sample 450 is combined with a convolutional kernel 460 representing a diagonal line going up and to the right. This results in a patch of pixels 460 which contains the two diagonal lines of the input character. As FIGS. 4a and 4b show, the two result patches show different information for the character, while preserving pixel adjacency. This can result in more efficient character recognition).
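For illustration only, the kernel operation described in the cited passage of Puri (FIG. 4a, a kernel representing a vertical line applied to a handwriting sample) can be sketched as follows. This is a minimal sketch under stated assumptions, not Puri's GPU implementation: the function name `correlate2d_valid`, the 5x5 sample, and the 3x3 kernel values are hypothetical, and the sliding-window sum is written without kernel flipping, as is common in convolutional network code.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 binary sample with vertical strokes in columns 1 and 3.
sample = [[0, 1, 0, 1, 0] for _ in range(5)]

# Hypothetical 3x3 kernel that responds to a vertical line in its center column.
vertical_kernel = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Strong responses appear only where a stroke lines up with the kernel center.
response = correlate2d_valid(sample, vertical_kernel)
```

In this toy case the response map is large where a vertical stroke is centered under the kernel and zero elsewhere, while neighboring pixels stay adjacent in the output, which is the adjacency-preserving behavior the cited passage describes.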

Regarding claim 19
Puri teaches
receiving, by a neural network, a first input of an image in a first orientation (column 3, line 30, the techniques described herein are implemented on a graphics processing unit. One example of a graphics processing unit is shown in FIG. 3, which illustrates a simplified overview of a traditional GPU architecture 300. In one implementation, the GPU architecture corresponds to the GPU 815 illustrated in FIG. 8. Display data 305, which describes geometry of an image to be rendered, is input into vertex shader units 310, which generate polygonal representations of the geometric forms. These geometric forms are then input into a rasterizer, which interpolates the polygons and samples them to develop a sample set of points in image space, which can then be shaded and have texture added to them. These points are then passed to a series of programmable pixel shader units 330 and which utilize parallel computing techniques to perform shading of the points, as well as adding and manipulating textures. It is this ability to perform parallel computations as well as to manipulate textures which makes the GPU, and the pixel shader units in particular, a useful platform for neural network computation. Pixel shader unit computation is frequently performed under the control of pixel shader programs, which are GPU-executable programs written to take advantage of the pixel shader units);

generating a first loss function based on a first output associated with the first input (column 1, line 64, However, these connections can be computationally complex, as FIG. 2 illustrates. FIG. 2 is a block diagram of a method of computing neuron values based on the values found in the previous layer. It should be noted that while FIG. 2 illustrates various matrices, the indexes (or sizes) of the matrices will vary from layer to layer and network to network and various implementations may orient the matrices or map the matrices to computer memory differently. As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tan h( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. This can also be called a "squashing function." Thus, as FIG. 2 shows, the connections between each layer, and thus an entire network, can be represented as a series of matrices. Finding proper values for these matrices, then, is the problem of training a neural network);
receiving, by the neural network, a second input of the image in a second orientation (see Summary of the Invention and column 3, line 60, while the fully-connected neural networks described above are able, when properly trained, to recognize handwriting, they oftentimes fail to take advantage of shape and proximity when operating on input. One reason for this is that every pixel is operated on independently, ignoring adjacent pixels. For this reason, convolutional neural networks are also used, which operate by associating an array of values with each neuron, rather than a single value. Conceptually, this array can be thought of as a small patch of an image. The transformation of a neuron value for the subsequent layer is generalized from multiplication to convolution. This implies that the connection strengths 230 are convolution kernels rather than scalar values. FIGS. 4a and 4b show two examples of convolutional kernels operating on a sample 400 of a letter "m." In FIG. 4a, the sample is combined with a convolution kernel 410 representing a vertical line. The resulting patch of pixels 420 comprises the three vertical lines which are present in the sample. Similarly, in FIG. 4b, the handwriting sample 450 is combined with a convolutional kernel 460 representing a diagonal line going up and to the right. This results in a patch of pixels 460 which contains the two diagonal lines of the input character. As FIGS. 4a and 4b show, the two result patches show different information for the character, while preserving pixel adjacency. This can result in more efficient character recognition);
generating a second loss function based on a second output associated with the second input (column 6, line 30, in order to understand the case of a convolutional neural network, it is helpful to compare to the relatively-simple case of a fully-connected network where N, the number of layers is equal to 2. In this case, during a forward pass, where each layer is calculated from the previous one, we compute $l^{v+1}$ by: $l^{v+1} = \sigma(\phi^{v}) = \sigma(K^{v} l^{v} + b^{v})$ (4.1) for $0 \le v < N$. Here, $\sigma$ is a "squashing function" representing an element-wise application of tanh, $K^{v}$ is an $n^{v+1} \times n^{v}$ matrix representing connection strengths between the two layers, and $b^{v}$ is a vector of length $n^{v+1}$ representing the bias);
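For illustration only, the forward-pass relation (4.1) quoted from Puri, in which the next layer is the element-wise tanh of the connection-strength matrix times the current layer plus the bias, can be sketched in a few lines. This is a minimal sketch, not Puri's pixel-shader implementation: the function name `forward_layer` and all numeric values below are hypothetical.

```python
import math


def forward_layer(K, l, b):
    """Return tanh(K l + b), applied element-wise.

    K is a list of rows (n_next x n_cur), l has length n_cur, b has length n_next.
    """
    phi = [
        sum(k_ij * l_j for k_ij, l_j in zip(row, l)) + b_i
        for row, b_i in zip(K, b)
    ]
    return [math.tanh(p) for p in phi]


# Hypothetical 3x2 connection-strength matrix, 2-element layer, 3-element bias.
K0 = [[0.5, -0.5],
      [1.0, 1.0],
      [0.0, 2.0]]
l0 = [1.0, 1.0]
b0 = [0.0, -2.0, 0.0]

# Pre-activations are [0.0, 0.0, 2.0], so only the last neuron is active.
l1 = forward_layer(K0, l0, b0)
```

Note that this computes a layer output only; the cited passage does not itself define a loss function, which would be evaluated on such an output.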

training the neural network to minimize a sum of the first loss function and the second loss function (column 9, line 60, the data-parallel nature of pixel shader units makes it hard to perform summations. Since the result at each pixel cannot depend on the result at other pixels, a way to regain efficiencies is to compute a summation in several passes, where each pass sums some fixed number of horizontally adjacent patches. If A is an $n \times m$ array of $p \times p$ patches, then a function $S_r$ can be defined as: [equation EQU00017 of Puri, not legible in this text]. An example of this multi-pass summation is illustrated in FIG. 7. In FIG. 7, a matrix 710 needs to be summed across each row. One row is illustrated as an example row. In the example of FIG. 7, the matrix is summed by groups of four, so that the first four elements are summed into the first element of that row in a transitional sum matrix 720. Likewise, the second four elements are summed to the second element of the row, and so on. Then, in a second pass, the four elements of the transitional sum matrix are summed to produce a final sum for the row).
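For illustration only, the multi-pass summation that the cited passage of Puri describes with respect to FIG. 7 (sum each group of four adjacent elements into a transitional row, then sum the transitional row) can be sketched as follows. This is a minimal sketch operating on scalars rather than on patches, and it does not reconstruct the definition of the function S_r; the function names and the 16-element example row are hypothetical.

```python
def sum_pass(row, group=4):
    """One pass: replace each run of `group` adjacent elements by its sum."""
    return [sum(row[i:i + group]) for i in range(0, len(row), group)]


def multipass_row_sum(row, group=4):
    """Repeat passes until one total remains; return (total, number of passes)."""
    if not row:
        raise ValueError("row must be non-empty")
    passes = 0
    while len(row) > 1:
        row = sum_pass(row, group)
        passes += 1
    return row[0], passes


# Hypothetical example row of 16 elements, summed in groups of four.
example_row = list(range(1, 17))
transitional = sum_pass(example_row)        # first pass: four partial sums
total, passes = multipass_row_sum(example_row)
```

Each pass depends only on values produced by the previous pass, which is the property that makes the scheme suitable for units whose per-pixel results cannot depend on one another within a pass.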

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2-9, 11-18 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Puri USPN 7,747,070 in view of Seung et al USPN 9,799,098.
Regarding claims 2 and 11
Puri teaches training the kernel to be symmetric but does not teach that the training further comprises forcing weight tying within the kernel. However, Seung et al teaches this feature (column 13, line 32, FIG. 7 illustrates a comparison between the weights of the CN2 and the CRF. Each box displays one layer of weights. Boxes with oblique-line patterns (e.g., 720, 722, 724, 726, 728, 730) denote strong positive weights, and dark colored boxes denote strong negative weights. The CN2 filter has a positive center and a negative surround. Results show that the negative surround is important for good image restoration. Both the CRF and CN2$^{+}$ filter are constrained to be nonnegative, which yield inferior performance). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the weight tying of Seung et al into the kernel training of Puri. The modification would have been obvious because one of ordinary skill in the art would have been motivated to combine the teachings in a neural network in which different variants can learn and minimize the loss function. 

Regarding claims 3 and 12
Puri teaches 
 forcing weight tying further comprises averaging gradients of coordinates belonging to same values (column 5, line 1, thus, at action 530, the GPU enters a loop for each sample input. At action 540, the neural network is propagated on a forward pass to determine an output for a given sample. Next, at action 550, an error function is used to compute how far off of the expected output the neural network was. Next, at action 560, a gradient function is determined for the error function. By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method. Then, at action 570, the matrices, including the convolutional kernels and the biases, are modified according to the gradient function. The actions 550, 560, and 570 are collectively known as a "backward pass" because they take the output error information and use it to determine needed modifications for each neural network matrix. Examples of one implementation of the equations used in these forward and backward passes are described in Section 4. Finally, at action 580, the forward-pass/backward-pass steps of process 500 are repeated as long as there are more sample inputs. At the end of the sample inputs, the network has been trained over those inputs and the process ends).
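For illustration only, the claim language at issue (averaging gradients of kernel coordinates that are tied to the same value) combined with a gradient-descent update of the kind described in the cited passage can be sketched as follows. This is a minimal sketch under loudly stated assumptions and is neither the applicant's method nor Puri's: the left-right mirror grouping, the function names, the learning rate, and all numeric values are hypothetical.

```python
def mirror_groups(size):
    """Group coordinates of a size x size kernel tied under a left-right flip
    (the choice of symmetry is an assumption made for this sketch)."""
    groups = []
    for r in range(size):
        for c in range((size + 1) // 2):
            groups.append({(r, c), (r, size - 1 - c)})
    return groups


def tie_gradients(grad, groups):
    """Replace each gradient entry by the mean over its tied group."""
    tied = [row[:] for row in grad]
    for group in groups:
        mean = sum(grad[r][c] for r, c in group) / len(group)
        for r, c in group:
            tied[r][c] = mean
    return tied


def sgd_step(kernel, grad, lr):
    """Plain gradient-descent update: w <- w - lr * g."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(kernel, grad)]


# Hypothetical kernel that starts left-right symmetric, and an asymmetric gradient.
kernel = [[1.0, 0.0, 1.0],
          [2.0, 3.0, 2.0],
          [1.0, 0.0, 1.0]]
raw_grad = [[0.2, 0.1, 0.4],
            [0.0, 0.5, 1.0],
            [-0.2, 0.3, 0.2]]

tied_grad = tie_gradients(raw_grad, mirror_groups(3))
updated = sgd_step(kernel, tied_grad, lr=0.5)
```

Because tied coordinates start equal and receive the same averaged gradient, they remain equal after the update, so the kernel stays symmetric under the assumed flip.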

Regarding claims 4 and 13
Puri teaches 
 training the kernel to be symmetric further comprises adding regularization on weights in a loss function associated with outputs (column 1, line 64, However, these connections can be computationally complex, as FIG. 2 illustrates. FIG. 2 is a block diagram of a method of computing neuron values based on the values found in the previous layer. It should be noted that while FIG. 2 illustrates various matrices, the indexes (or sizes) of the matrices will vary from layer to layer and network to network and various implementations may orient the matrices or map the matrices to computer memory differently. As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tan h( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. This can also be called a "squashing function." Thus, as FIG. 2 shows, the connections between each layer, and thus an entire network, can be represented as a series of matrices. Finding proper values for these matrices, then, is the problem of training a neural network). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate weight with kernel. 
The modification would have been obvious because one of ordinary skill in the art would have been motivated to combine the teachings in a neural network that trains the kernel and minimizes the loss function associated with the weights.

Regarding claims 5-6 and 14-15 
Seung et al teaches 
training the kernel to be symmetric further comprises providing weight sharing across multiple filters (column 5, line 8, the convolutional network 120 may be constructed from multiple layers of filters 140. Typically, the architecture may include of an input layer 150 that encodes one or more input images, an output layer 170 that encodes one or more output images, and one or more intermediate layers 160 with hidden images that contain the internal computations and representations of an applied method. In one embodiment, each layer receives input from only the previous layer. The convolutional network 120 may alternate between linear filtering and nonlinear transformations to produce a transformed version of the input). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multiple layers of filters of Seung et al into the neural network of Puri. The modification would have been obvious because one of ordinary skill in the art would have been motivated to combine the teachings in a neural network in which multiple layers of filters produce at least one image in succeeding layers with at least the same resolution as an original image.



Regarding claims 7 and 16
Puri teaches 
 wherein weights are shared across the multiple filters based on an average gradient matrix of gradient matrices of the multiple filters (see fig 5 and (column 5, line 1, thus, at action 530, the GPU enters a loop for each sample input. At action 540, the neural network is propagated on a forward pass to determine an output for a given sample. Next, at action 550, an error function is used to compute how far off of the expected output the neural network was. Next, at action 560, a gradient function is determined for the error function. By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method. Then, at action 570, the matrices, including the convolutional kernels and the biases, are modified according to the gradient function. The actions 550, 560, and 570 are collectively known as a "backward pass" because they take the output error information and use it to determine needed modifications for each neural network matrix. Examples of one implementation of the equations used in these forward and backward passes are described in Section 4. Finally, at action 580, the forward-pass/backward-pass steps of process 500 are repeated as long as there are more sample inputs. At the end of the sample inputs, the network has been trained over those inputs and the process ends).



Regarding claims 8 and 17
Puri teaches 
 applying the trained symmetric kernel to a block-circulant weight matrix (column 1, line 65, however, these connections can be computationally complex, as FIG. 2 illustrates. FIG. 2 is a block diagram of a method of computing neuron values based on the values found in the previous layer. It should be noted that while FIG. 2 illustrates various matrices, the indexes (or sizes) of the matrices will vary from layer to layer and network to network and various implementations may orient the matrices or map the matrices to computer memory differently. As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tan h( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. This can also be called a "squashing function." Thus, as FIG. 2 shows, the connections between each layer, and thus an entire network, can be represented as a series of matrices. Finding proper values for these matrices, then, is the problem of training a neural network). The feature of applying the trained symmetric kernel to a block-circulant weight matrix would have been obvious for the reasons set forth in the rejection of claim 1. 

Regarding claims 9 and 18
Puri teaches 
trained symmetric kernel to a block-circulant weight matrix (column 4, line 39, FIG. 5 shows an example process 500 for training a convolutional neural network. In various implementations of the process 500, actions may be removed, combined, or broken up into sub-actions. The process begins at action 510, where the process received a neural network to train, as well as training samples. In a typical implementation, the network may be pre-set with sample convolutional kernels and biases, but each needs to be refined to give consistent and efficient results. Training samples typically involve many (on the order of tens of thousands) samples of handwritten characters, along with an indication of the correct character each should be interpreted as. Next at action 520, neural network data, such as the samples and neural network matrices, are prepared to be operated on as graphics data by the pixel shader units 330 of the GPU 300. An example process of this action is described in greater detail below with respect to FIG. 6. In one implementation, both of the actions 510 and 520 are performed by a CPU associated with the GPU 815. In another, all preparation is performed by the GPU 815).

Regarding claim 20
Seung et al teaches 
modifying the second loss function to have an extra term corresponding to added loss from the second image (column 5, line 8, the convolutional network 120 may be constructed from multiple layers of filters 140. Typically, the architecture may include of an input layer 150 that encodes one or more input images, an output layer 170 that encodes one or more output images, and one or more intermediate layers 160 with hidden images that contain the internal computations and representations of an applied method. In one embodiment, each layer receives input from only the previous layer. The convolutional network 120 may alternate between linear filtering and nonlinear transformations to produce a transformed version of the input).

Relevant Prior Art
US 10296829 B2 Mostafa et al teaches Convolution Processing Apparatus And Method
US 10599978 B2 Sekiyama teaches Weighted Cascading Convolutional Neural Networks
US 1008349 Milanfar et al teaches Methods And Apparatus To Reduce Compression Artifacts In Images

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Anil Khatri whose telephone number is (571)272-3725. The examiner can normally be reached M-F 8:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, W Zhen can be reached on 571-272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANIL KHATRI/Primary Examiner, Art Unit 2191