DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. EP17158959.1, filed on 03/02/2017.

Response to Amendment
This office action is in response to the amendments submitted on 05/18/2022. It is a final action based on further search and consideration and on the arguments provided. Claims 4, 5, 6, 8, 11, and 20 are amended, and claims 1-3, 7, 18-19, and 21-22 are canceled.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 4-6, 8, 11-12, 15-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20170351935 A1) (hereinafter Liu) in view of Han et al. (Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015)) (hereinafter Han) and further in view of Seo et al. (US 20190164538 A1) (hereinafter Seo).
Regarding Claim 8, Liu teaches an apparatus comprising circuitry (Para [0087], line 8-11, “Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component.”) that implements an artificial neural network training algorithm (Para [0066], line 6-9, “In general, training an artificial-neural-network comprises applying a training algorithm, sometimes referred to as a “learning” algorithm, to an artificial-neural-network in view of a training set.”) that uses weight tying (Para [0006], line 1-6, “Some embodiments are based on realization that when neural networks (i.e. an artificial neural network) are trained independently to generate a digital image, the generated digital images are not related. However, by forcing, e.g., during the joint training, a weight sharing (i.e. weight tying) constraint on the neural networks, the neural networks can be trained to generate a multimodal digital image.”).
	Liu is silent with regard to:
wherein the circuitry is configured to compute a weight-tied weight matrix based on an index matrix and based on a value vector;
update the weight tying using a predefined number of iterations of a clustering algorithm; and
quantize the values of the value vector after updating the weight tying.
	Han teaches wherein the circuitry is configured to compute a weight-tied weight matrix (Fig 3, Section 3, line 2-5, “We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights. Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 x 4 matrix.”) based on an index matrix (Fig 2, Section 3, line 6-8, “The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights”) and based on a value vector (Section 6.3, page 10, 3rd paragraph, line 1-3, “In real time processing when batching is not allowed, the input activation is a single vector (i.e. value vector) and the computation is matrix-vector multiplication”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate computing a weight matrix based on an index matrix and a value vector, as taught by Han, into the neural network of Liu, since the technique of Han is applied to artificial neural networks. This clustering technique would facilitate weight allocation, minimize memory usage, and eventually reduce the overall processing resources and cost (Han, Section 3).
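For illustration only, the relationship between an index matrix, a value vector, and the resulting weight-tied weight matrix can be sketched as below. This is a minimal sketch and is not taken from Liu, Han, or the claims as filed. The function name `build_weight_tied_matrix` and all numeric values are assumptions chosen for the example, mirroring the 4 x 4 layer with 4 shared values mentioned in the Han passages quoted above.

```python
def build_weight_tied_matrix(index_matrix, value_vector):
    """Return W where W[i][j] = value_vector[index_matrix[i][j]].

    Every position that carries the same index shares the same value,
    so only the small indices and the short value vector need storing.
    """
    return [[value_vector[k] for k in row] for row in index_matrix]


# Hypothetical 4 x 4 layer whose 16 weights share 4 values.
index_matrix = [
    [3, 0, 2, 1],
    [1, 1, 0, 3],
    [0, 3, 1, 0],
    [3, 1, 2, 2],
]
value_vector = [-1.0, 0.0, 1.5, 2.0]

weights = build_weight_tied_matrix(index_matrix, value_vector)
for row in weights:
    print(row)
```

In this sketch the full matrix is never stored independently of the two smaller structures, which is the memory saving the cited passages refer to.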
The combination of Liu and Han is silent with regard to:
update the weight tying using a predefined number of iterations of a clustering algorithm; and
quantize the values of the value vector after updating the weight tying.
Seo teaches update the weight tying using a predefined number of iterations of a clustering algorithm (Para [0037], “Throughout the entire process of DNN training for a DNN only the selected number of active weight blocks in each of the M rows is updated, while the inactive weight blocks remain at zero.”); quantize (Para [0034], “quantized weight sharing”) the values of the value vector (Para [0035], “fully connected weight matrix”) after updating the weight tying (Para [0037], “Throughout the entire process of DNN training for a DNN only the selected number of active weight blocks in each of the M rows is updated, while the inactive weight blocks remain at zero.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the updating of the weight tying, as taught by Seo, into the neural network of Liu as modified by Han, since the technique of Han is applied to artificial neural networks for better calculation using a weight tying process. This technique would facilitate resetting the weight tying with the quantized value vector and enhance the calculation process using the updated vector (Seo, Para [0035]-[0037]).
Regarding claim 4, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Han further teaches wherein the predefined number of iterations of the clustering algorithm used to update the weight tying (Page 3, section 3, line 8-9, “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.” Based on equation 2 it is evident that the iteration can be done only one time as i=1).
	Han does not explicitly teach that the number of iterations is one (emphasis added).
However, equation 2 at page 4 of Han shows that i starts from 1, so it would have been obvious to a person of ordinary skill in the art to set the number of iterations to one. It would likewise have been obvious to one of ordinary skill in the art to run the iteration at least one time in order to make an initial update in the weight tying process and update the weight values accordingly, so that the system is refreshed with updated data and gives correct results.
Regarding claim 5, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Seo further teaches wherein the circuitry is configured to, in each iteration of the clustering algorithm, update a value vector according to
[Equation image: media_image1.png]
where W(l) is a full-precision weight matrix for layer l of the neural network... and I(l) is the index matrix, i and j denote rows and columns, respectively, of W(l) and I(l), and k denotes a value index (Para [0054], equation 2).
Regarding claim 6, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Han further teaches wherein the circuitry is configured to update, in each iteration of the clustering algorithm, an index matrix according to
[Equation image: media_image2.png]
where W(l) is a full-precision weight matrix for layer l of the neural network and I(l) is the index matrix, i and j denote rows and columns, respectively, of W(l) and I(l), and k denotes a value index (Page 4, section 3.1, equation 2).
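For illustration only, one iteration of a clustering algorithm that updates an index matrix and then a value vector can be sketched as below. This sketch assumes a standard k-means form (nearest-value assignment followed by a per-cluster mean). It is not a reproduction of the equations recited in claims 5 and 6, which are not reproduced in this action, nor of the cited equations of Seo or Han. The function names and numeric values are hypothetical.

```python
def assign_indices(weights, values):
    """Assignment step: for each weight, the index of the nearest value."""
    return [
        [min(range(len(values)), key=lambda k: abs(w - values[k])) for w in row]
        for row in weights
    ]


def update_values(weights, indices, values):
    """Centroid step: each value becomes the mean of the weights assigned to it.

    A value with no assigned weights keeps its previous setting.
    """
    sums = [0.0] * len(values)
    counts = [0] * len(values)
    for w_row, i_row in zip(weights, indices):
        for w, k in zip(w_row, i_row):
            sums[k] += w
            counts[k] += 1
    return [sums[k] / counts[k] if counts[k] else values[k]
            for k in range(len(values))]


def run_clustering(weights, values, num_iterations=1):
    """Run a predefined number of assignment/centroid iterations."""
    values = list(values)
    indices = assign_indices(weights, values)
    for _ in range(num_iterations):
        indices = assign_indices(weights, values)
        values = update_values(weights, indices, values)
    return indices, values


# Hypothetical full-precision weights and initial value vector.
full_precision = [[0.2, 0.8], [1.2, -0.4]]
indices, values = run_clustering(full_precision, [0.0, 1.0], num_iterations=1)
print(indices, values)
```

With `num_iterations=1` the loop body runs exactly once, which corresponds to the single-iteration case discussed for claim 4.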
Regarding Claim 11, the combination of Liu, Han and Seo teaches the limitations of claim 8.
	Liu is silent with regards to the circuitry is configured to compute a weight-tied weight matrix based on an index matrix and based on a value vector comprising more than three values.
	Han teaches wherein the circuitry is configured to compute a weight-tied weight matrix (Fig 3, Section 3, line 2-5, “We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights. Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 x 4 matrix.”) based on an index matrix (Fig 2, Section 3, line 6-8, “The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights”) and based on a value vector comprising more than three values (Section 6.3, page 10, 3rd paragraph, line 1-3, “In real time processing when batching is not allowed, the input activation is a single vector (i.e. value vector) and the computation is matrix-vector multiplication”. Section 3 discusses a 4 x 4 matrix (i.e. more than three values)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate computing a weight matrix based on an index matrix and a value vector, as taught by Han, into the neural network of Liu, since the technique of Han is applied to artificial neural networks. This clustering technique would facilitate weight allocation, minimize memory usage, and eventually reduce the overall processing resources and cost (Han, Section 3).

Regarding claim 12, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Liu is silent with regard to wherein the circuitry is configured to update full precision weights based on gradients.
	Han teaches wherein the circuitry is configured to update full precision weights (Page 3, section 3, line 6-8, “The weights are quantized to 4 bins (denoted with 4 colors), all the weights (i.e. full precision weight) in the same bin share the same value”. Also based on page 5, section 3.3, line 3-5, “During back-propagation, the gradient for each shared weight is calculated and used to update the shared weight. This procedure is shown in Figure 3.”) based on gradients (Page 3, section 3, line 8-10, “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the circuitry is configured to update full precision weights based on gradients, as taught by Han, into the neural network of Liu, since this provides an efficient update of the weight matrix. This technique of gradient-based updating would facilitate better accuracy in the update process (Han, Fig 3, page 3).
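For illustration only, the update described in the quoted Han passage (gradients grouped by shared index, summed, multiplied by the learning rate, and subtracted from the shared values) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `update_shared_values`, the learning rate, and the numeric values are hypothetical and do not come from Han or from the claims.

```python
def update_shared_values(values, indices, gradients, learning_rate):
    """Apply one grouped-gradient step to the shared values.

    Gradients of all weights that carry the same index are summed,
    scaled by the learning rate, and subtracted from that shared value.
    """
    grouped = [0.0] * len(values)
    for i_row, g_row in zip(indices, gradients):
        for k, g in zip(i_row, g_row):
            grouped[k] += g
    return [v - learning_rate * g for v, g in zip(values, grouped)]


# Hypothetical 2 x 2 layer with two shared values.
shared = [0.0, 1.0]
index_matrix = [[0, 1], [1, 0]]
grads = [[0.5, -1.0], [2.0, 0.5]]

updated = update_shared_values(shared, index_matrix, grads, learning_rate=0.1)
print(updated)
```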
Regarding claim 15, the combination of Liu, Han, and Seo teaches the limitations of claim 12.
	Liu is silent with regards to wherein the circuitry is configured to compute the gradients based on a backward pass function.
	Han further teaches wherein the circuitry is configured to compute the gradients based on a backward pass function (Page 5, section 3.3, line 3-6, “During back-propagation (i.e. backward pass), the gradient for each shared weight is calculated and used to update the shared weight.” Equation 3 represents the function 1(.)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the circuitry is configured to compute the gradients based on a backward pass function, as taught by Han, into the neural network of Liu, in order to improve the overall calculation speed of the neural network. This technique of using a backward pass function would facilitate better convergence of the process without trying every possible weight (Han, page 5).

Regarding Claim 16, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Liu further teaches wherein the training algorithm is a stochastic gradient descent training algorithm (Para [0074], line 13-16, “implementation uses adaptive moment stochastic-gradient descent (ADAM) method to train the CoGAN for 25000 iterations.”).

Regarding Claim 17, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	Liu further teaches wherein the artificial neural network is a deep convolutional neural network (Para [0076], line 14-16, “In this example, the generative and discriminative subnetworks were both seven layers deep convolutional neural networks.”).

Regarding Claim 20, Liu teaches circuitry that implements an artificial neural network training algorithm (Para [0066], line 6-9, “In general, training an artificial-neural-network comprises applying a training algorithm, sometimes referred to as a “learning” algorithm, to an artificial-neural-network in view of a training set.”) that uses weight tying (Para [0006], line 1-6, “Some embodiments are based on realization that when neural networks (i.e. an artificial neural network) are trained independently to generate a digital image, the generated digital images are not related. However, by forcing, e.g., during the joint training, a weight sharing (i.e. weight tying) constraint on the neural networks, the neural networks can be trained to generate a multimodal digital image.”).
	Liu is silent with regard to:
computing a weight-tied matrix based on an index matrix and based on a value vector;
updating the weight tying using a predefined number of iterations of a clustering algorithm; and
quantizing the values of the value vector after updating the weight tying.
	Han teaches computing a weight-tied matrix (Fig 3, Section 3, line 2-5, “We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights. Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 x 4 matrix.”) based on an index matrix (Fig 2, Section 3, line 6-8, “The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights”) and based on a value vector (Section 6.3, page 10, 3rd paragraph, line 1-3, “In real time processing when batching is not allowed, the input activation is a single vector (i.e. value vector) and the computation is matrix-vector multiplication”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate computing a weight matrix based on an index matrix and a value vector, as taught by Han, into the neural network of Liu, since the technique of Han is applied to artificial neural networks. This clustering technique would facilitate weight allocation, minimize memory usage, and eventually reduce the overall processing resources and cost (Han, Section 3).
The combination of Liu and Han is silent with regard to:
updating the weight tying using a predefined number of iterations of a clustering algorithm; and
quantizing the values of the value vector after updating the weight tying.
	Seo teaches update the weight tying using a predefined number of iterations of a clustering algorithm (Para [0037], “Throughout the entire process of DNN training for a DNN only the selected number of active weight blocks in each of the M rows is updated, while the inactive weight blocks remain at zero.”); quantize (Para [0034], “quantized weight sharing”) the values of the value vector (Para [0035], “fully connected weight matrix”) after updating the weight tying (Para [0037], “Throughout the entire process of DNN training for a DNN only the selected number of active weight blocks in each of the M rows is updated, while the inactive weight blocks remain at zero.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the updating of the weight tying, as taught by Seo, into the neural network of Liu as modified by Han, since the technique of Han is applied to artificial neural networks for better calculation using a weight tying process. This technique would facilitate resetting the weight tying with the quantized value vector and enhance the calculation process using the updated vector (Seo, Para [0035]-[0037]).

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Han and further in view of Seo and further in view of Weston et al. (US 20170200077 A1) (hereinafter Weston).
Regarding claim 9, the combination of Liu, Han, and Seo teaches the limitations of claim 8.
	However, the combination is silent with regards to wherein the circuitry is configured to quantize the values of the value vector to the nearest power of two.
	Weston teaches wherein the circuitry is configured to quantize the values of the value vector (Fig 11, block 1120, “Cluster the words in the predetermined dictionary into multiple buckets corresponding to word clusters, by running a vector quantization on vectors of the embedding matrix”).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the circuitry is configured to quantize the values of the value vector, as taught by Weston, into the neural network of Liu as modified by Han, in order to reduce distortion in the neural network. This technique of vector quantization would help with proper and reliable reconstruction of data (Weston, Fig 11, abstract).
	However, the combination does not explicitly teach quantizing the values of the value vector to the nearest power of two.
	However, Weston teaches: “For clustering word embedding, the memory network takes the trained embedding matrix U0 and runs a vector quantization (e.g., K-means clustering) to cluster” (Para [0070]).
	Weston further teaches in Para [0046]: “The output feature map component 440 and the response component 450 are responsible for handling the major part of the inference. The output feature map component 440 produces the output feature vector by first finding multiple (k number of) supporting memory slots that relate to input feature vector x. In the illustrated embodiment, two supporting memory slots are used (thus, k=2)” (i.e. power of two).
Therefore, it would have been obvious to one of ordinary skill in the art to apply Weston’s teaching of using k=2 in the quantization to reach a power of two, for the purpose of dense quantization and reduced memory use.

Regarding claim 10, the combination of Liu, Han, Seo, and Weston teaches the limitations of claim 9.
	Weston further teaches wherein the circuitry is configured to quantize the values of the value vector (Fig 11,1120).
However, the combination is silent with regard to quantizing according to the quantization scheme:
[Equation image: media_image3.png]
where s = sign(x) and b = log2 |x|, and where x is the value which is to be quantized and xq is the quantized value.
However, using a logarithmic relation is a common method of quantization, which can be combined with the prior art described above to enhance the quantization effect. Therefore, it would have been obvious to combine this teaching with the prior art to further improve the computation.
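For illustration only, a logarithmic quantization to a power of two using the quantities s = sign(x) and b = log2 |x| can be sketched as below. The equation recited in claim 10 is not reproduced in this action, so the rounding rule used here (rounding b to the nearest integer and returning s times 2 raised to that integer) is an assumption, as are the function name `quantize_to_power_of_two` and the handling of zero. Rounding in the logarithmic domain does not always select the power of two that is closest on a linear scale.

```python
import math


def quantize_to_power_of_two(x):
    """Map x to s * 2**round(b), with s = sign(x) and b = log2(|x|).

    Assumed rounding rule: b is rounded to the nearest integer.
    Zero has no logarithm, so it is returned unchanged (an assumption).
    """
    if x == 0:
        return 0.0
    s = math.copysign(1.0, x)
    b = math.log2(abs(x))
    return s * 2.0 ** round(b)


# Hypothetical value vector quantized element by element.
value_vector = [0.3, -5.0, 1.0, 0.0]
quantized = [quantize_to_power_of_two(v) for v in value_vector]
print(quantized)
```

A value restricted to a signed power of two can be stored as a sign and a small integer exponent, which is one reason logarithmic schemes are used to reduce memory.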

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Han and further in view of Seo and further in view of Song et al. (Song, Yang, Alexander Schwing, and Raquel Urtasun. "Training deep neural networks via direct loss minimization." International Conference on Machine Learning. PMLR, 2016.) (hereinafter Song).
Regarding claim 13, the combination of Liu, Han and Seo teaches the limitations of claim 12.
	The combination is silent with regards to wherein the circuitry is configured to compute the gradients based on a cost function and based on the weight-tied weight matrix.
	Song teaches wherein the circuitry is configured to compute the gradients based on a cost function and based on the weight-tied weight matrix (Page 7, and Page 4 equation 6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the circuitry is configured to compute the gradients based on a cost function and based on the weight-tied weight matrix, as taught by Song, in the combination of Liu, Han, and Seo, for the purpose of evaluating the accuracy of data reconstruction. This technique of cost function analysis would keep track of the loss and facilitate minimization of the loss value (Song, abstract).
Regarding claim 14, the combination of Liu, Han, and Seo teaches the limitations of claim 12.
	The combination is silent with regards to wherein the circuitry is configured to compute the cost function based on a loss function and based on a forward pass function.
	Song teaches wherein the circuitry is configured to compute the cost function (Page 4, Section 3, Observation 1, Para-3, line 4 and equation 6) based on a loss function (Page 4, Section 3, Observation 1, Para-3, line 4 and equation 6, term LAP represents the loss function according to the equation 4) and based on a forward pass function (Page 4, Section 3, Observation 1, Para-3, line 4 and equation 6, term [Equation image: media_image4.png] represents forward pass function F (x, y, w) at page 4, COL1, Para-1, line 9. Also “Forward pass function” is mentioned at Page 3, COL1, Para-4, line 1-3 with term ‘F’: “First we use a standard forward pass to evaluate F”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the circuitry is configured to compute the cost function based on a loss function and based on a forward pass function, as taught by Song, in the combination of Liu, Han, and Seo, for the purpose of evaluating the accuracy of data reconstruction. This technique of cost function analysis would keep track of the loss and facilitate minimization of the loss value (Song, abstract).

Response to Arguments
Applicant's arguments filed on 05/18/2022 have been fully considered.
With regard to the remarks about “Claim rejection 112(b)”, the arguments are persuasive. The amended claims overcome the previous rejection.
With regard to “Prior art rejection”, the arguments are not persuasive. Examiner’s response is as follows:
Applicant argues: “So, the Examiner would be permitted to rely on Seo as of the date of its provisional application only if the provisional application describes the subject matter relied upon in Seo for making the rejections. To assert otherwise would not be supported by the statute or the procedures outlined in the M.P.E.P. That is, the Examiner has not shown that Seo’s provisional application properly supports the subject matter relied upon to make the rejection. Absent such detailed evidence, the Examiner is not permitted to rely on Seo’s U.S. provisional application filing date as the critical reference date for use in the rejections.”
Examiner respectfully disagrees for the following reason:
The prior art of Seo has a PCT filing date of July 27, 2017. However, Seo also claims the benefit of provisional application 62/368365, filed on July 29, 2016. A review of the provisional application shows proper support for the subject matter relied upon in the rejection. The specification of provisional application 62/368365 teaches the following:
Page 9, Para [0027] – “For the trained DNN, the weights were first quantized with floating point neurons down to the precision where the DNN accuracy is acceptable. With this reduced precision on the weights, a reduced precision was chosen for the neurons, again while still achieving acceptable accuracy.”
Page 6, Para [0020] – “Here N is the size of the output layer, y_i is the ith output node and t_i is the ith target value or label. The mini-batch stochastic gradient method [17] is used to train the network. The weights are updated using Eq. (2).”
Page 4, Para [0008] – “Because the weights that are dropped are different each time, the trained weights cannot actually be reduced, and the fully connected matrix weights are required for the classification phase”
The above-cited paragraphs of the provisional application support the subject matter relied upon in the rejection. Thus, the rejection is maintained.


Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Julian et al. (US 20150317557 A1): This reference teaches an artificial nervous system that spikes at a time based on delays.

	Conclusion
Applicant's amendment necessitated the new ground of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AEYSHA N SULTANA whose telephone number is (469)295-9239. The examiner can normally be reached 8:00PM-5PM CST, Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Arleen Vazquez, can be reached on (571) 272-2619. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

AEYSHA . SULTANA
Examiner
Art Unit 2862



/EMAN A ALKAFAWI/
Primary Examiner, Art Unit 2865
8/3/2022