DETAILED ACTION

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 – 4, 6, 14 – 17, 19 – 23, and 25 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Garimella (US 2015/0170020).
As to claim 1, Garimella teaches an electronic processing system (paragraph [0050]...spoken language processing system), comprising: 
a processor (paragraph [0030]...one or more processors); 
memory communicatively coupled to the processor (paragraph [0030]... executable program instructions can be loaded into memory ; paragraph [0050]...data store 506); and 
a decision network (paragraph [0053]...ASR module 502 to process the utterance and transcribe what the user said) communicatively coupled to the processor and the memory, the decision network including logic to: 
apply a low rank factorization (paragraph [0029]...low-rank matrix 214) to a weight matrix (paragraph [0028]...third hidden layer 216 ; Examiner’s Note: the third hidden layer is a multiplied version of the NN weight matrix 204. Therefore, if the weights of the NN weight matrix change then the third hidden layer will also change) of the decision network to determine a first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower), 
reshape (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer. Each of the low-rank matrices may be substantially smaller than the matrix that would normally be used to produce the output layer from the last hidden layer) the first weight matrix approximation into a second weight matrix approximation (paragraph [0029]...output layer 116), and 
compress (paragraph [0029]...a first low-rank matrix of size 180×1000 and a second low-rank matrix of size 3000×180 may be used, thereby substantially reducing the number of multiplications to be computed (e.g., 720,000 vs. 3,000,000). The process ends at block 222) the decision network based on the second weight matrix approximation.
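
The multiplication counts quoted from paragraph [0029] can be checked with a short calculation. The Python sketch below is illustrative only: the dimensions (1000 inputs, rank 180, 3000 outputs) are taken from the quoted passage, while the function names are hypothetical and do not appear in Garimella or in the application.

```python
# Illustrative check of the counts quoted from paragraph [0029].
# Function names are hypothetical; only the dimensions come from the citation.

def full_layer_multiplies(n_in: int, n_out: int) -> int:
    """Multiplications for one dense (n_out x n_in) matrix-vector product."""
    return n_in * n_out


def low_rank_multiplies(n_in: int, rank: int, n_out: int) -> int:
    """Multiplications when the dense matrix is replaced by a (rank x n_in)
    matrix followed by an (n_out x rank) matrix."""
    return rank * n_in + n_out * rank


full = full_layer_multiplies(1000, 3000)       # 3000 x 1000 matrix
factored = low_rank_multiplies(1000, 180, 3000)  # 180 x 1000, then 3000 x 180
print(full, factored)  # 3000000 720000
```

Under these assumptions the two smaller products need 180,000 + 540,000 = 720,000 multiplications, against 3,000,000 for the single full-size matrix, which agrees with the figures in the quoted passage.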

As to claim 2, Garimella teaches the system, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: reshape (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer. Each of the low-rank matrices may be substantially smaller than the matrix that would normally be used to produce the output layer from the last hidden layer) the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116) to add nonlinearity (paragraph [0044]...one may estimate the average variance per node at the output of this intermediate layer 218 as shown in block 406. This estimation assumes a constant variance for any non-linear node) to the second weight matrix approximation.

As to claim 3, Garimella teaches the system, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: scatter data (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer) from the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116).

As to claim 4, Garimella teaches the system, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: alter the layout of data (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer) from the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116).

As to claim 6, Garimella teaches the system, wherein the decision network  (paragraph [0053]...ASR module 502 to process the utterance and transcribe what the user said) comprises one or more of a convolutional neural network, a deep neural network (paragraph [0002]...deep neural networks), and a recurrent neural network.

Claim 14 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above.



Claim 15 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 16 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 17 has similar limitations as claim 4. Therefore, the claim is rejected for the same reasons as above. 

Claim 19 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 

Claim 20 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 21 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 22 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 23 has similar limitations as claim 4. Therefore, the claim is rejected for the same reasons as above. 

Claim 25 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 5, 18, and 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Garimella (US 2015/0170020) in view of DEISHER et al (US 2018/0121796).
As to claim 5, Garimella teaches the system, with a first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) and a second weight matrix approximation (paragraph [0029]...output layer 116).
Garimella fails to explicitly show/teach read data from the first weight matrix approximation in a row-major order; and write data into the second weight matrix approximation in a column-major order.
However, DEISHER et al teaches read data (paragraph [0049]...read/write operations) from the first weight matrix approximation (paragraph [0034]...placing an activation function unit in a pipeline after the parallel logic structure produces a relatively high processing rate since the parallel logic may be performing computations by multiplying an input vector by values in a weight matrix for one output of a layer while the activation function is simultaneously computing a final output using a weighted input sum output from a different output (or node) of the same layer (or different layer)) in a row-major order (paragraph [0069]...weight matrix may have one row for each input to a layer and column for each output (or node) of the layer that is to be obtained. This assumes row major organization of memory); and write data (paragraph [0049]...read/write operations) into the second weight matrix approximation (paragraph [0034]...placing an activation function unit in a pipeline after the parallel logic structure produces a relatively high processing rate since the parallel logic may be performing computations by multiplying an input vector by values in a weight matrix for one output of a layer while the activation function is simultaneously computing a final output using a weighted input sum output from a different output (or node) of the same layer (or different layer)) in a column-major order (paragraph [0069]...it is possible to use the transverse with a column major organization instead).
It would have been an obvious matter of design choice to adopt DEISHER et al’s reading of data from the first weight matrix approximation in a row-major order and writing of data into the second weight matrix approximation in a column-major order, since applicant has not disclosed that reading data from the first weight matrix approximation in a row-major order and writing data into the second weight matrix approximation in a column-major order solves any stated problem or is for any particular purpose, and it appears that the invention would perform equally well with either row-major order or column-major order for the purpose of processing large computational loads.
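
The two storage orders named in the limitation can be illustrated with a short sketch. The Python below is a generic illustration only, assuming a matrix held as a flat list; the function name and the sample data are hypothetical and are not drawn from DEISHER et al, Garimella, or the application.

```python
# Generic illustration of reading in row-major order and writing in
# column-major order. The function name and sample data are hypothetical.

def row_major_to_column_major(flat, rows, cols):
    """Read a rows x cols matrix stored row by row and return the same
    elements stored column by column."""
    out = [None] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            # Row-major read index:     r * cols + c
            # Column-major write index: c * rows + r
            out[c * rows + r] = flat[r * cols + c]
    return out


# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored in row-major order.
row_major = [1, 2, 3, 4, 5, 6]
column_major = row_major_to_column_major(row_major, 2, 3)
print(column_major)  # [1, 4, 2, 5, 3, 6]
```

The elements are the same in both layouts; only the order in which they are stored in memory differs.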

Claim 18 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 

Claim 24 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 


Claims 7 – 10, 12, and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Garimella (US 2015/0170020) in view of Rakshit et al (US 2018/0053550).
As to claim 7, Garimella teaches a decision network (paragraph [0053]...ASR module 502 to process the utterance and transcribe what the user said) communicatively coupled to the processor and the memory, the decision network including logic to: apply a low rank factorization (paragraph [0029]...low-rank matrix 214) to a weight matrix (paragraph [0028]...third hidden layer 216 ; Examiner’s Note: the third hidden layer is a multiplied version of the NN weight matrix 204. Therefore, if the weights of the NN weight matrix change then the third hidden layer will also change) of the decision network to determine a first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower), reshape (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer. Each of the low-rank matrices may be substantially smaller than the matrix that would normally be used to produce the output layer from the last hidden layer) the first weight matrix approximation into a second weight matrix approximation (paragraph [0029]...output layer 116), and compress (paragraph [0029]...a first low-rank matrix of size 180×1000 and a second low-rank matrix of size 3000×180 may be used, thereby substantially reducing the number of multiplications to be computed (e.g., 720,000 vs. 3,000,000). The process ends at block 222) the decision network based on the second weight matrix approximation.
Garimella fails to explicitly show/teach a semiconductor package apparatus, comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic, the logic coupled to the one or more substrates.
However, Rakshit et al teaches a semiconductor package apparatus (paragraph [0037]...memory cell 200...a memory cell 200 may be (or may include) a field effect transistor (hereinafter `FET`) with a confined channel...the FET is a metal-oxide semiconductor field effect transistor (MOSFET)), comprising: one or more substrates (paragraph [0037]...bulk substrate 206) ; and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic (paragraph [0037]...memory cell 200 may be a programmable resistive device implementing an analog or multilevel memory. The analog or multilevel value stored in the memory cell may be used as a synaptic weight in a neural network), the logic coupled to the one or more substrates (paragraph [0037]...the analog or multilevel value stored in the memory cell may be used as a synaptic weight in a neural network).


As to claim 8, Garimella teaches the apparatus, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: reshape (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer. Each of the low-rank matrices may be substantially smaller than the matrix that would normally be used to produce the output layer from the last hidden layer) the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116) to add nonlinearity (paragraph [0044]...one may estimate the average variance per node at the output of this intermediate layer 218 as shown in block 406. This estimation assumes a constant variance for any non-linear node) to the second weight matrix approximation.

As to claim 9, Garimella teaches the apparatus, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: scatter data (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer) from the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116).

As to claim 10, Garimella teaches the apparatus, wherein the logic (paragraph [0056]...logical blocks, modules, routines, and algorithm steps) is further to: alter the layout of data (paragraph [0029]...the intermediate hidden layer 218 may then be multiplied by a second low-rank matrix 220 to produce the output layer) from the first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) into the second weight matrix approximation (paragraph [0029]...output layer 116).

As to claim 12, Garimella teaches the apparatus, wherein the decision network  (paragraph [0053]...ASR module 502 to process the utterance and transcribe what the user said) comprises one or more of a convolutional neural network, a deep neural network (paragraph [0002]...deep neural networks), and a recurrent neural network.




As to claim 13, Rakshit et al teaches the apparatus, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates (paragraph [0037] teaches... [0037] FIG. 2A is a diagram of an embodiment of a memory cell 200 according to embodiments of the present disclosure. The memory cell 200 may be a programmable resistive device implementing an analog or multilevel memory. The analog or multilevel value stored in the memory cell may be used as a synaptic weight in a neural network. A memory cell 200 may be (or may include) a field effect transistor (hereinafter `FET`) with a confined channel. For example, as shown in FIG. 2A, a memory cell may be a silicon on insulator FET, such as a fully depleted silicon on insulator FET. The insulator layer 202 isolates the channel 204 from the bulk substrate 206. In some embodiments, the FET is a metal-oxide semiconductor field effect transistor (MOSFET).). It would have been obvious for the logic coupled to the one or more substrates to include transistor channel regions that are positioned within the one or more substrates, for the same reasons as above.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Garimella (US 2015/0170020) in view of Rakshit et al (US 2018/0053550) and in further view of DEISHER et al (US 2018/0121796).
As to claim 11, Garimella teaches the apparatus, with a first weight matrix approximation (paragraph [0040]... intermediate linear hidden layer 21 ; paragraph [0017]...approximation introduces an intermediate hidden layer, linear in nature, where the dynamic range of the output is no longer restricted to be between (0, 1) and may instead be higher or lower) and a second weight matrix approximation (paragraph [0029]...output layer 116).
Garimella and Rakshit et al both fail to explicitly show/teach read data from the first weight matrix approximation in a row-major order; and write data into the second weight matrix approximation in a column-major order.
However, DEISHER et al teaches read data (paragraph [0049]...read/write operations) from the first weight matrix approximation (paragraph [0034]...placing an activation function unit in a pipeline after the parallel logic structure produces a relatively high processing rate since the parallel logic may be performing computations by multiplying an input vector by values in a weight matrix for one output of a layer while the activation function is simultaneously computing a final output using a weighted input sum output from a different output (or node) of the same layer (or different layer)) in a row-major order (paragraph [0069]...weight matrix may have one row for each input to a layer and column for each output (or node) of the layer that is to be obtained. This assumes row major organization of memory); and write data (paragraph [0049]...read/write operations) into the second weight matrix approximation (paragraph [0034]...placing an activation function unit in a pipeline after the parallel logic structure produces a relatively high processing rate since the parallel logic may be performing computations by multiplying an input vector by values in a weight matrix for one output of a layer while the activation function is simultaneously computing a final output using a weighted input sum output from a different output (or node) of the same layer (or different layer)) in a column-major order (paragraph [0069]...it is possible to use the transverse with a column major organization instead).
It would have been an obvious matter of design choice to adopt DEISHER et al’s reading of data from the first weight matrix approximation in a row-major order and writing of data into the second weight matrix approximation in a column-major order, since applicant has not disclosed that reading data from the first weight matrix approximation in a row-major order and writing data into the second weight matrix approximation in a column-major order solves any stated problem or is for any particular purpose, and it appears that the invention would perform equally well with either row-major order or column-major order for the purpose of processing large computational loads.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached on Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRANDON S COLE/Primary Examiner, Art Unit 2122