DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Office Action is in response to applicant’s communication filed 19 March 2021, in response to the Office Action mailed 24 December 2020.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.

The objection to claim 14 is withdrawn due to the amendment(s) filed.

The rejection of claim 9 under 35 U.S.C. 112, second paragraph, has been withdrawn due to the amendment filed.


Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: DEVICE AND METHOD FOR IMPROVING PROCESSING SPEED OF NEURAL NETWORK BY PARAMETER MATRIX DIMENSION REDUCTION (or similar).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
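A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.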

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-15 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Georgescu (US 2016/0174902) in view of Katta (US 2008/0312942).

As per claim 2, Georgescu teaches a device for improving processing speed of a neural network, comprising: a processor [a system for training deep neural networks for object detection, including a processor (abstract; fig. 28; etc.)] configured to determine, according to a predetermined processing speed improvement target, a dimension reduction amount of each of at least one parameter matrix in the neural network obtained through training [the system reduces the size of the weight matrices according to a determined degree of sparsity, in order to improve the speed (the target is to improve the speed) and training of the neural network while maintaining the classification performance (paras. 086-94, 0124, etc.)], preprocess each parameter matrix based on the columns dimension reduction amount thereof [the system is trained in a number of iterations, where a certain degree of sparsity is enforced by removing the smallest weights in the weight matrix based on a threshold, and then training the network on the remaining active weights, and repeating the process (paras. 0089-100; figs. 15-16; etc.)], by zeroing, according to the dimension reduction amount of the parameter matrix, the parameters in a column [parameters in the matrix chosen for reduction are zeroed (para. 0094, etc.)], and retrain the neural network based on a result of the preprocessing to obtain at least one dimension reduced parameter matrix to ensure performance of the neural network meets a predetermined requirement [the system is trained in a number of iterations, where a certain degree of sparsity is enforced by removing the smallest weights in the weight matrix based on a threshold, and then training the network on the remaining active weights, and repeating the process (paras. 0089-100; figs. 15-16; etc.) according to a determined degree of sparsity, in order to improve the speed  and training of the neural network while maintaining the classification performance (paras. 086-94, 0124, etc.)].
While Georgescu teaches utilizing a specific dimension reduction amount (see above) it does not explicitly teach wherein the dimension reduction amount represents a columns dimension reduction amount of each parameter matrix, and the processor is further configured to perform the pre-processing by performing operations for each parameter matrix comprising: calculating a column score of each of the columns of the parameter matrix according to the values of parameters in each column of the parameter matrix; and zeroing, according to the column dimension reduction amount of the parameter matrix, the parameters in a column where the column score meets a predetermined condition.
However, Katta teaches wherein the dimension reduction amount represents a columns dimension reduction amount of each parameter matrix [a pre-specified parameter is chosen as a threshold for reducing the matrix by a number of columns (para. 0042, etc.); using the sparsity degree of Georgescu, above, for the total amount of reduction], and the processor is further configured to perform the pre-processing by performing operations for each parameter matrix comprising: calculating a column score of each of the columns of the parameter matrix according to the values of parameters in each column of the parameter matrix [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (paras. 0037-42, etc.)]; and zeroing, according to the column dimension reduction amount of the parameter matrix, the parameters in a column where the column score meets a predetermined condition [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (paras. 0037-42, etc.); where parameters in the matrix chosen for reduction are zeroed (Georgescu: para. 0094, etc.)].
Georgescu and Katta are analogous art, as they are within the same field of endeavor, namely machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize reduction of the matrix by columns/column scores, as taught by Katta, for the reduction of the parameter matrix for the system of Georgescu.
Katta provides motivation as [reducing the matrix by columns using column relevancy scores reduces the dimensionality of the problem significantly, and can be controlled by the single parameter (para. 0042, etc.)].

As per claim 3, Georgescu/Katta teaches wherein the processor is further configured to calculate, for each parameter matrix, a sum of absolute values of the parameters in each column of the parameter matrix as the column score of the column [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (Katta: paras. 0037-42, etc.)].
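For illustration only, the column score recited in claim 3 (a sum of absolute values of the parameters in each column) may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Georgescu, Katta, or the applicant's specification: the function name column_scores, the row-major list-of-lists layout, and the example matrix W are all hypothetical.

```python
def column_scores(matrix):
    """Return, for each column, the sum of absolute values of its entries.

    `matrix` is assumed to be a non-ragged list of rows (hypothetical layout).
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    return [sum(abs(row[j]) for row in matrix) for j in range(n_cols)]


# Hypothetical 2x3 parameter matrix used only for illustration.
W = [[0.5, -0.1, 2.0],
     [-1.5, 0.2, 0.0]]
scores = column_scores(W)
```

Under this sketch, a column whose entries are all small in magnitude receives a low score regardless of sign, which is what makes the score usable as a ranking or thresholding criterion.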

As per claim 4, Georgescu/Katta teaches wherein the processor is further configured to calculate, for each parameter matrix, the column score according to loss weights associated with parameters in each column of the parameter matrix [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total weighted sum relevance scores, to remove columns with a relevance score below the threshold (Katta: paras. 0037-42, etc.)].

As per claim 5, Georgescu/Katta teaches wherein the processor is further configured to: normalize all of the parameters and the loss weights in each parameter matrix [the parameters of the matrix may be normalized (Georgescu: paras. 0094-96, etc.); and column relevancy scores may be calculated by a weighted sum (Katta: paras. 0037-42, etc.)]; and calculate, for each parameter matrix, a sum of [the parameters of the matrix may be normalized (Georgescu: paras. 0094-96, etc.); and column relevancy scores may be calculated by a weighted sum (Katta: paras. 0037-42, etc.)].

As per claim 6, Georgescu/Katta teaches wherein the processor is further configured to perform the zeroing by: determining, for each parameter matrix, a threshold based on a determined columns dimension reduction amount and calculated column scores of the columns [a desired sparsity degree may be used to determine how many weights to remove (Georgescu: paras. 0089-100; figs. 15-16; etc.) to determine predetermined threshold to compare the relevancy scores of the matrix columns against (Katta: paras. 0037-42, etc.)]; and zeroing the parameters in the column, where the column score is less than the threshold, of each parameter matrix [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (Katta: paras. 0037-42, etc.); where parameters in the columns chosen for reduction are zeroed (Georgescu: para. 0094, etc.)].
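For illustration only, one possible reading of the claim 6 limitation (a threshold derived from the column dimension reduction amount and the column scores, followed by zeroing of columns scoring below it) may be sketched as follows. The choice of threshold as the score at sorted position reduction_amount is an assumption made for this sketch, as are the function name and the example matrix; neither reference is asserted to compute the threshold this way.

```python
def zero_columns_below_threshold(matrix, reduction_amount):
    """Zero every column whose score is strictly below a derived threshold.

    Assumption: the threshold is the score found at index `reduction_amount`
    of the ascending-sorted scores, so at most `reduction_amount` columns
    fall below it (fewer when scores tie at the threshold).
    """
    n_cols = len(matrix[0])
    # Column score: sum of absolute values of the entries in the column.
    scores = [sum(abs(row[j]) for row in matrix) for j in range(n_cols)]
    if reduction_amount <= 0:
        threshold = float("-inf")   # nothing is below the threshold
    elif reduction_amount >= n_cols:
        threshold = float("inf")    # every column is below the threshold
    else:
        threshold = sorted(scores)[reduction_amount]
    zeroed = [j for j in range(n_cols) if scores[j] < threshold]
    pruned = [[0.0 if j in zeroed else value for j, value in enumerate(row)]
              for row in matrix]
    return pruned, zeroed, threshold


# Hypothetical 2x4 parameter matrix; column scores are roughly 2.0, 0.3, 2.0, 0.7.
W = [[0.5, -0.1, 2.0, 0.3],
     [-1.5, 0.2, 0.0, -0.4]]
pruned, zeroed, threshold = zero_columns_below_threshold(W, 2)
```

With a reduction amount of 2, the two lowest-scoring columns (indices 1 and 3) are zeroed in this example. A strict comparison is used, so tied scores at the threshold are retained.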

As per claim 7, Georgescu/Katta teaches wherein the processor is further configured to perform the zeroing by: ranking the column scores of the columns of each parameter matrix based on magnitudes of the column scores [a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (Katta: paras. 0037-42, etc.); where parameters in the columns chosen for reduction are zeroed (Georgescu: para. 0094, etc.)]; and zeroing, based on a determined column dimension reduction amount, the parameters in a predetermined number of columns, where the column scores are ranked one of high and low, of each parameter matrix [a desired sparsity degree may be used to determine how many weights to remove (Georgescu: paras. 0089-100; figs. 15-16; etc.) to determine predetermined threshold to compare the relevancy scores of the matrix columns against (Katta: paras. 0037-42, etc.); where parameters in the columns chosen for reduction are zeroed (Georgescu: para. 0094, etc.)].
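For illustration only, the ranking-based alternative of claim 7 (zeroing a predetermined number of columns according to their rank) may be sketched as follows. The sketch assumes that the lowest-ranked columns are the ones zeroed and that ties are broken by column index; the function name and example matrix are hypothetical and are not taken from either reference.

```python
def zero_lowest_ranked_columns(matrix, reduction_amount):
    """Zero exactly `reduction_amount` columns with the lowest scores.

    Assumption: ties are broken by column index, because Python's sort
    is stable and the indices are supplied in ascending order.
    """
    n_cols = len(matrix[0])
    # Column score: sum of absolute values of the entries in the column.
    scores = [sum(abs(row[j]) for row in matrix) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j])
    count = max(0, min(reduction_amount, n_cols))
    zeroed = sorted(ranked[:count])
    pruned = [[0.0 if j in zeroed else value for j, value in enumerate(row)]
              for row in matrix]
    return pruned, zeroed


# Hypothetical 2x4 parameter matrix; columns 0 and 2 tie with a score of 2.0.
W = [[0.5, -0.1, 2.0, 0.3],
     [-1.5, 0.2, 0.0, -0.4]]
pruned, zeroed = zero_lowest_ranked_columns(W, 3)
```

Unlike a strict threshold comparison, ranking always zeroes exactly the requested number of columns, even when scores tie; here column 0 is zeroed and column 2 is kept although both score 2.0.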

As per claim 8, Georgescu/Katta teaches wherein the processor is further configured to retrain, according to parameter matrices with corresponding columns being zeroed, the neural network to obtain one or more column dimension reduced parameter matrices [the network is retrained after the reduction (Georgescu: figs. 15-16, claim 4, etc.)].

As per claim 9, Georgescu/Katta teaches wherein the processor is further configured to determine a first columns dimension reduction amount of a first parameter matrix closer to an input layer than a second parameter matrix, where the first columns dimension reduction amount is smaller than a second column dimension reduction amount of the second parameter matrix [a desired sparsity degree may be used to determine how many weights to remove to improve processing for each layer, including iteratively reducing a layer and retraining, then adding another copy of the layer and doing the same (Georgescu: paras. 0045, 0089-100; figs. 15-16; claims 1-5, etc.) to determine predetermined threshold to compare the relevancy scores (sums) of the matrix columns against (Katta: paras. 0037-42, etc.); where the layer further from the input will thus be reduced additionally compared to the earlier reduced/trained layer closer to input], and calculate a sum of column dimension reduction amounts of all the parameter matrices meet the predetermined processing speed improvement target [the system reduces the size of the weight matrices according to a determined degree of sparsity, in order to improve the speed  and training of the neural network while maintaining the classification performance (Georgescu: paras. 086-94, 0124, etc.)].

As per claim 10, Georgescu/Katta teaches wherein the processor is further configured to: zero, according to the zeroed column of each parameter matrix, elements in a corresponding row of an input matrix corresponding to the parameter matrix; and retrain the neural network according to parameter matrices with corresponding columns being zeroed and at least one input matrix with corresponding rows being zeroed to obtain the at least one dimension reduced parameter matrix [a desired sparsity degree may be used to determine how many weights to remove (Georgescu: paras. 0089-100; figs. 15-16; etc.) to determine predetermined threshold to compare the relevancy scores of the matrix columns against (Katta: paras. 0037-42, etc.); where parameters in the columns chosen for reduction are zeroed (Georgescu: para. 0094, etc.) and the network is retrained after the reduction (Georgescu: figs. 15-16, claim 4, etc.)].

As per claim 11, Georgescu/Katta teaches wherein the processor is further configured to perform: determining, according to another predetermined processing speed improvement target, a determined dimension reduction amount of each of the at least one dimension reduced parameter matrix obtained through retraining; re-preprocessing each parameter matrix based on the determined dimension reduction amount of the parameter matrix; and retraining, based on a result of the re-preprocessing, the neural network to obtain at least one parameter matrix with dimensions being reduced again to ensure the performance of the neural network meets the predetermined requirement, wherein the determining, the re-preprocessing, and the retraining are performed repeatedly until at least one dimension reduced parameter matrix meeting a final processing speed improvement target is obtained [the system is trained in a number of iterations, where a certain degree of sparsity is enforced by removing the smallest weights in the weight matrix based on a threshold, and then training the network on the remaining active weights, and repeating the process (Georgescu: paras. 0089-100; figs. 15-16; etc.)].

As per claim 12, Georgescu/Katta teaches wherein the predetermined processing speed improvement target is determined where an effect on the performance of the neural network is within a tolerance range [optimizations focus on either minimizing the two-norm (least squares) error under identically distributed noise or the Kullback-Leibler divergence between the low rank decomposition and the target tensor (Georgescu: para. 0087, etc.)].

As per claim 13, Georgescu/Katta teaches wherein the neural network comprises a convolutional neural network (CNN) [the present invention is not limited to this particular type of deep neural network and other types of deep neural networks, such as a convolutional neural network (CNN), stacked RBM, or a sparse AE, can also be used to train a discriminative classifier; including applying the process to convolutional layers as well as fully connected filters (Georgescu: paras. 0046, 0050, 0084, etc.)].

As per claim 14, Georgescu/Katta teaches wherein in the case that the neural network is a convolutional neural network (CNN), the at least one parameter matrix represents parameter matrices of one or more convolution layers and/or a fully connected layer [the present invention is not limited to this particular type of deep neural network and other types of deep neural networks, such as a convolutional neural network (CNN), stacked RBM, or a sparse AE, can also be used to train a discriminative classifier; including applying the process to convolutional layers as well as fully connected filters (Georgescu: paras. 0046, 0050, 0084, etc.)].

As per claim 15, see the rejection of claim 2, above.

As per claim 17, see the rejection of claim 3, above.

As per claim 18, see the rejection of claim 4, above.

As per claim 19, see the rejection of claim 5, above.


Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Georgescu (US 2016/0174902), in view of Katta (US 2008/0312942), and further in view of Ito (US 2011/0091113).

As per claim 20, Georgescu/Katta teaches a device for performing an inference process in a neural network, the device comprising a processor configured to: convert a current parameter matrix into a dimension reduced parameter matrix by performing the method according to claim 15 [see above] and multiply the dimension reduced parameter matrix by the input matrix to obtain an output matrix [multiplications are performed on the input with the parameters in a sliding window (Georgescu: paras. 0123-125, etc.)].
Georgescu does not explicitly teach the processor also configured to: convert, according to the dimension reduced parameter matrix, an input matrix corresponding to a current parameter matrix into a dimension reduced input matrix; and multiply the dimension reduced parameter matrix by the dimension reduced input matrix to obtain an output matrix.
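However, Ito teaches the processor also configured to: convert, according to the dimension reduced parameter matrix, an input matrix corresponding to a current parameter matrix into a dimension reduced input matrix [a processor (para. 0167, etc.) converting the input matrix to a lower dimension based upon the dimension of the subspace data (paras. 0053-54, etc.)]; and multiply the dimension reduced parameter matrix by the dimension reduced input matrix to obtain an output matrix [matrix multiplication of the input and projection matrix is performed to produce the output (paras. 0053-54, etc.)].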
Georgescu/Katta and Ito are analogous art, as they are within the same field of endeavor, namely machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize dimensionality reduction on the input matrix, as taught by Ito, in the dimensionality reduction of the system of Georgescu.
Ito provides motivation as [the dimensionality reduction not only enables reducing the amount of data and the amount of calculation, but also improves the identification ratio (para. 0054, etc.)].
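For illustration only, the relationship between the zeroed parameter matrix and the dimension reduced matrices recited in claim 20 may be sketched as follows. The sketch assumes that dimension reduction means dropping the zeroed columns of the parameter matrix together with the same-indexed rows of the input matrix; it is not a description of Ito's projection or of the applicant's implementation, and the helper names and example values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def reduce_pair(w, x, zeroed_cols):
    """Drop zeroed columns of `w` and the same-indexed rows of `x`."""
    dropped = set(zeroed_cols)
    keep = [j for j in range(len(w[0])) if j not in dropped]
    w_reduced = [[row[j] for j in keep] for row in w]
    x_reduced = [x[j] for j in keep]
    return w_reduced, x_reduced


# Hypothetical parameter matrix whose columns 1 and 3 were already zeroed.
W_pruned = [[0.5, 0.0, 2.0, 0.0],
            [-1.5, 0.0, 0.0, 0.0]]
# Hypothetical 4x2 input matrix.
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.0]]

W_red, X_red = reduce_pair(W_pruned, X, [1, 3])
full_output = matmul(W_pruned, X)
reduced_output = matmul(W_red, X_red)
same_output = full_output == reduced_output
```

Because a zeroed column contributes nothing to the product, removing it along with the matching input row leaves the output unchanged in this example while reducing the inner dimension of the multiplication from 4 to 2.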


Response to Arguments
Applicant's arguments filed 19 March 2021 have been fully considered but they are not persuasive.

Applicant argues that Katta teaches away from using neural networks.
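However, "the prior art's mere disclosure of more than one alternative does not constitute a teaching away from any of these alternatives because such disclosure does not criticize, discredit, or otherwise discourage the solution claimed...."  In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004).  Furthermore, the only possible drawback of neural networks discussed by Katta is a lack of confidence score, whereas Georgescu teaches providing a confidence score (see, e.g., Georgescu: paras. 0049-57, etc.).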

In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

Applicant also argues that the cited art does not teach zeroing, according to the column dimension reduction amount of the parameter matrix, the parameters in a column where the column score meets a predetermined condition.
However, Katta teaches that a pre-specified parameter is set as a threshold, and the columns of the matrix are ranked according to their total sum relevance scores, to remove columns with a relevance score below the threshold (Katta: paras. 0037-42, 


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1 and 16 are cancelled; claims 2-15 and 17-20 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Li (US 2018/0046919) and Yao (US 2018/0046903) – disclose systems including pruning of the weight/parameter matrices.
Gong et al. (Compressing Deep Convolutional Networks Using Vector Quantization, Dec 2014, pgs. 1-10) – discloses compression of parameter matrices for a CNN.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769.  The examiner can normally be reached on M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/GEORGE GIROUX/Primary Examiner, Art Unit 2125