DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-44 are presented for examination.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on October 16, 2020 and May 5, 2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings are objected to because (a) in Figure 13, reference character 1364, “headphones” is on two different lines; (b) Figure 18A has reference characters oriented both horizontally and vertically, see 37 CFR § 1.84(p)(3); and (c) in Figure 11A, reference characters 1160 and 1164, text intermingles with the lines of the drawing, see id.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.  Nonetheless, Examiner has read the specification to the extent possible and objects to the specification for containing various grammatical informalities.  Examiner has attached a marked-up copy of the specification indicating where errors have occurred.  To the extent that the markings are not self-explanatory and are not corrected, Examiner will enumerate the remaining objections in a subsequent Office Action.
The abstract of the disclosure is objected to because “has been updated” should be “have been updated”.  Correction is required.  See MPEP § 608.01(b).
The use of the terms BLUETOOTH (paragraphs 108, 167-68, 196, 220-21, 262), MICROSOFT (paragraphs 85, 93, 207), GOOGLE (paragraphs 82-83 and 93), AMAZON (paragraph 93), and WIFI (paragraphs 196, 262), which are trade names or marks used in commerce, has been noted in this application. The terms should be accompanied by the generic terminology; furthermore, the terms should be capitalized wherever they appear or, where appropriate, include a proper symbol indicating use in commerce such as ™, SM, or ® following the terms.
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.

Claim Objections
Examiner objects to claims 1-44.
Claims 1, 3, 15, 18, 23, 25, 30, and 37 are objected to because of the following informalities:  “portions … has been updated” should be “portions … have been updated”.  
Claims 10, 12-13, 27, 32, 37-38, and 40 are objected to because of the following informalities: “metadata indicates [comprises, is, stores]” should be “metadata indicate [comprise, are, store]”.
Claim 5 is objected to because of the following informalities: “elapsed the one” should be “elapsed since the one”; “portions … was” should be “portions … were”.
Claims 8, 23, 28, 30, and 37 are objected to because of the following informalities: “portions […] is” should be “portions […] are”.
Claims 9 and 23 are objected to because of the following informalities: “neural network” should be “neural networks”.
Claim 16 is objected to because of the following informalities: “information to be” should be “information is to be”.
Claim 17 is objected to because of the following informalities: “step of the” should be “step of training of the”.
Claim 19 is objected to because of the following informalities: “wherein” should be “further comprising”.
Claims 7 and 37 are objected to because of the following informalities: “based, at least in part on,” should be “based, at least in part, on”.
Claim 40 is objected to because of the following informalities: “the metadata” should be “wherein the metadata”.
All claims dependent on a claim objected to hereunder are also objected to for being dependent on an objected-to base claim.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7 and 15-44 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Claim 1
Step 1: The claim is directed to a processor comprising one or more ALUs; therefore, it is directed to the statutory category of machines.
Step 2A Prong 1:  The claim recites “updat[ing] one or more portions of weight information corresponding to one or more neural networks based, at least in part, on metadata associated with the one or more portions of weight information to indicate how recently the one or more portions of weight information [have] been updated, wherein the one or more portions [are] less than all of the weight information corresponding to the one or more neural networks.”  Though the weight information “correspond[s] to one or more neural networks,” the weight information itself can be stored mentally or written down, and the updating of portions of the weight information based on metadata that indicate how recently the weight information has been updated could simply entail a user looking at data that indicate how recently the weight information has been updated and updating the weight information based thereon, e.g., by updating the oldest weights first.
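By way of illustration only, the “oldest weights first” approach mentioned above might be sketched as follows.  The function name, the learning rate, the gradient values, and the list-based representation of the metadata are assumptions made for this sketch; none of them is drawn from the claims or the specification.

```python
def update_oldest_first(weights, last_updated, grads, step, k, lr=0.1):
    """Update only the k least recently updated weights (hypothetical helper).

    weights      -- list of weight values
    last_updated -- metadata: the step at which each weight was last updated
    grads        -- assumed per-weight update directions
    step         -- the current step, recorded as the new metadata value
    k            -- how many weights to update; must be fewer than all of them
    """
    if not 0 < k < len(weights):
        raise ValueError("k must select fewer than all of the weights")
    # Order the indices by recency metadata, oldest first, and keep the first k.
    oldest = sorted(range(len(weights)), key=lambda i: last_updated[i])[:k]
    for i in oldest:
        weights[i] -= lr * grads[i]  # plain gradient step (an assumption)
        last_updated[i] = step       # refresh the recency metadata
    return sorted(oldest)
```

In this sketch, the portion updated in any one call is always less than all of the weight information, because k is required to be smaller than the number of weights.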
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The only additional element of the claim is a “processor, comprising one or more arithmetic logic units (ALUs)”.  This is recited at a high level of generality and amounts to an instruction to apply the judicial exception of updating the weights using generic computing equipment.  See MPEP § 2106.05(f).
Step 2B:  The claim does not contain significantly more than the judicial exception.  As noted above, the recitation that the judicial exception is performed on a processor containing ALUs is nothing more than an instruction to apply the judicial exception using generic computing equipment.  See id.

Claims 2-7
NB: Due to the large number of claims, the dependent claims will be evaluated as a group for brevity.
Step 1:  A machine, as above.
Step 2A Prong 1:  Claim 2, reciting that the updating occurs as a result of determining that the weight information is to be used in training, recites the mental process of determining that the weights will be used for training and updating the weights in response thereto.  Regarding claims 3-4, even if one stipulates that the updating must be performed based on momentum information and learning rate and momentum coefficient hyperparameters, a human could take these factors into account in performing the updating.  Regarding claim 5, a human could update the weights based on a counter that determines how much training has occurred since the last weight update; this is purely an extension of the “oldest weights updated first” approach discussed above.  Regarding claim 6, updating weight information associated with an embedding vector could be performed mentally.  Regarding claim 7, calculating an accumulated update of the weight information based on the metadata and momentum information is mentally performable.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The only additional element in any of these claims is the aforementioned processor comprising an ALU, which does not integrate the judicial exception into a practical application for the reasons noted above.
Step 2B:  The claim does not contain significantly more than the judicial exception.  The only additional element in any of these claims is the aforementioned processor comprising an ALU, which does not amount to significantly more than the judicial exception for the reasons noted above.

Claim 15
Step 1:  The claim recites a method; therefore, it is directed to the statutory category of processes.
Step 2A Prong 1:  The claim recites “generating weight information associated with one or more neural networks; and updating only portions of the weight information based, at least in part, on how recently the portions of the weight information [have] been updated, wherein the portions are less than all of the weight information.”  Though the weight information is “associated with one or more neural networks,” the generation of the weights themselves need not involve a neural network and can be performed mentally.  Similarly, updating portions of the weight information based on how recently they have been updated is mentally performable.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  There are no further limitations of the claim to analyze that would bear on whether the judicial exception is integrated into a practical application.
Step 2B:  The claim does not contain significantly more than the judicial exception.  There are no further limitations of the claim to analyze that would bear on whether the claim contains significantly more than the judicial exception.

Claims 16-22
Step 1: A process, as above.
Step 2A Prong 1:  Claim 16 does not recite training the network as such; it merely recites that the weight information is to be used in training.  The underlying process of updating the weights is, as noted above, mentally performable.  Regarding claim 17, randomly or pseudo-randomly selecting portions of the weight information to be used in training is mentally performable, since the selection is positively recited and the training itself is not.  Regarding claim 19, depending upon the complexity of the weight information, the computation of a gradient based on ground truth data and output data could be performed mentally or with pen and paper.  Regarding claims 20-21, updating two different partially overlapping sets of weight information in two phases is mentally performable; here again, while the updating is “part of … training,” the training of the network as such is not claimed.  Similarly, claim 22, which recites computing an accumulated update of two or more steps of training, can be performed mentally.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The only additional element in any of the claims analyzed herein is in claim 18, which recites “storing metadata to indicate how recently the portions of the weight information has been updated.”  This limitation recites the insignificant extra-solution activity of mere data gathering.  See MPEP § 2106.05(g).
Step 2B:  The claim does not contain significantly more than the judicial exception.  The only additional element in any of the claims analyzed herein is in claim 18, which recites “storing metadata to indicate how recently the portions of the weight information has been updated.”  This limitation recites the insignificant extra-solution activity of mere data gathering.  See MPEP § 2106.05(g).  Additionally or alternatively, the limitation recites the well-understood, routine, and conventional activity of storing or retrieving information in memory.  See MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015).

Claim 23
Step 1: The claim is directed to a processor comprising one or more ALUs; therefore, it is directed to the statutory category of machines.
Step 2A Prong 1:  The claim recites, inter alia, “infer[ring] information based, at least in part, on one or more neural network[s] trained to update one or more portions of weight information corresponding to the one or more neural networks based, at least in part, on metadata associated with the one or more portions of weight information to indicate how recently the one or more portions of weight information [have] been updated, wherein the one or more portions is less than all of the weight information corresponding to the one or more neural networks.”  Note that neither the use of the neural networks nor their training, as such, is claimed.  Rather, the claim recites inferring information based on a trained neural network, such information being inferred based on metadata that indicate how recently the network’s weights have been updated.  A human could infer this information by, for instance, determining that a certain set of weights has not been updated for a certain period of time, then based on that information and the output/error of a training example of the network inferring that the weights need to be updated.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The only remaining limitation of the claim is that the inference of the information is performed using a “processor, comprising one or more arithmetic logic units (ALUs)”.  As noted above, this limitation amounts to a mere instruction to apply the judicial exception using generic computer equipment and cannot integrate the judicial exception into a practical application.  See MPEP § 2106.05(f).
Step 2B:  The claim does not contain significantly more than the judicial exception.  The only remaining limitation of the claim is that the inference of the information is performed using a “processor, comprising one or more arithmetic logic units (ALUs)”.  As noted above, this limitation amounts to a mere instruction to apply the judicial exception using generic computer equipment and cannot amount to significantly more than the judicial exception.  See MPEP § 2106.05(f).

Claims 24-29
Step 1: A machine, as above.
Step 2A Prong 1:  The analysis of the additional limitations of claims 24-29 mirrors that of claims 2-7, respectively.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The analysis here mirrors that of claims 2-7, respectively.
Step 2B:  The claim does not contain significantly more than the judicial exception.  The analysis here mirrors that of claims 2-7, respectively.

Claim 30
Step 1: The claim is directed to a system comprising one or more processors; therefore, it is directed to the statutory category of machines.
Step 2A Prong 1:  The claim recites, inter alia, “infer[ring] information using one or more neural networks trained by at least updating one or more portions of weight information based, at least in part, on metadata indicating how recently the one or more portions of the weight information has been updated, wherein the one or more portions is less than all of the weight information”.  As noted above, making inferences using a neural network is different from executing the network and could encompass, for instance, making a mental inference about the network itself by observing how recently some of its weights have been updated.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  Aside from the above limitations, the claim recites that the information is inferred using “one or more processors” and further recites “one or more memories to store the one or more neural networks.”  Both of these limitations amount to mere instructions to apply the judicial exception using a generic computer and cannot integrate the judicial exception into a practical application.  See MPEP § 2106.05(f).
Step 2B:  The claim does not contain significantly more than the judicial exception.  Aside from the above limitations, the claim recites that the information is inferred using “one or more processors” and further recites “one or more memories to store the one or more neural networks.”  Both of these limitations amount to mere instructions to apply the judicial exception using a generic computer and cannot amount to significantly more than the judicial exception.  See MPEP § 2106.05(f).

Claims 31-36
Step 1: A machine, as above.
Step 2A Prong 1:  Claim 31 does not positively recite training the neural networks; it merely states that the neural networks “are trained” by forward propagation.  The claim, when read in combination with claim 30, is directed to the inference of information using neural networks that happen to have been trained by forward propagation, which is mentally performable, not to the training of the networks themselves.  Regarding claim 32, a human could infer information based on neural networks based on metadata that indicate how to update embedding vectors used in training by observing the metadata and determining how the network is trained using the metadata.  Regarding claim 33, while the claim is directed to the inference of information using data on how recently neural network weight information has been updated, and not to the updating of the weight information itself, updating weight information based on momentum information is nonetheless mentally performable.  Regarding claim 34, while the updating of the metadata is not positively recited, updating metadata is nonetheless mentally performable.  Regarding claim 35, calculating an accumulated update based on momentum information and weight update metadata is mentally performable.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  The only additional element in this claim set is in claim 36, which recites that the system “further compris[es] an autonomous vehicle”.  No detail is indicated regarding what the role of the autonomous vehicle is in the system or how the system uses the autonomous vehicle.  Even assuming arguendo that this limitation is to be construed as indicating that the system is used in the navigation of an autonomous vehicle, this limitation would do no more than confine the judicial exception to the field of use of autonomous vehicles.  See MPEP § 2106.05(h).
Step 2B:  The claim does not contain significantly more than the judicial exception.  The only additional element in this claim set is in claim 36, which recites that the system “further compris[es] an autonomous vehicle”.  No detail is indicated regarding what the role of the autonomous vehicle is in the system or how the system uses the autonomous vehicle.  Even assuming arguendo that this limitation is to be construed as indicating that the system is used in the navigation of an autonomous vehicle, this limitation would do no more than confine the judicial exception to the field of use of autonomous vehicles.  See MPEP § 2106.05(h).

Claim 37
Step 1: The claim recites a method; therefore, it is directed to the statutory category of processes.
Step 2A Prong 1:  The claim recites “inferring information using one or more neural networks trained based, at least in part on, metadata to update one or more portions of weight information of the one or more neural networks, wherein the metadata indicates how recently the one or more portions of the weight information has been updated, further wherein the one or more portions is less than all of the weight information.”  As noted above, this limitation could encompass a human inferring information about the networks by observing data indicating how recently certain weights of the network were updated.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  There are no additional elements of this claim that would bear on whether the judicial exception is integrated into a practical application.
Step 2B:  The claim does not contain significantly more than the judicial exception.  There are no additional elements of this claim that would bear on whether the claim contains significantly more than the judicial exception.

Claims 38-44
Step 1: A process, as above.
Step 2A Prong 1:  Regarding claim 39, random or pseudo-random selection of weight information is mentally performable; note that the training itself is not positively recited, but rather only the selection of the portions of weight information.  Regarding claim 40, inferring information using neural networks based on metadata comprising a counter updated after training is mentally performable and could encompass merely observing the metadata that the neural networks produce and making an inference based thereon.  Regarding claims 41-43, while the claims are directed to the inference of information and not to the updating of the weight information, updating one portion of weight information in one step and another, overlapping portion in another step is mentally performable, even when done for the purpose of skipping updates during training.  Regarding claim 44, using metadata and momentum information to determine an accumulated update to weights may be performed mentally by simply applying a function that maps the metadata and momentum to the weight updates.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  Regarding claim 38, storing metadata that indicate how many steps of training have been skipped is the insignificant extra-solution activity of mere data gathering.  See MPEP § 2106.05(g).
Step 2B:  The claim does not contain significantly more than the judicial exception.  Regarding claim 38, storing metadata that indicate how many steps of training have been skipped is the insignificant extra-solution activity of mere data gathering.  See MPEP § 2106.05(g).  Additionally or alternatively, it is the well-understood, routine, and conventional activity of storing and retrieving information in memory.  See MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 8-9, 12-16, 18, 20-22, 30-31, 34, 36-38, and 40-43 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ma et al. (US 20180075339) (“Ma”).
Regarding claim 8, Ma discloses “[a] system, comprising: 
one or more memories (invention can be embodied as a processor suitable for executing instructions and a memory coupled to the processor – Ma, paragraph 23) to store metadata to indicate how recently one or more portions of weight information to be back-propagated to one or more neural networks have been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]; see also paragraph 38 (disclosing that training of an ANN generally takes place by backpropagating the outputs of the network on labeled training datasets, thereby generating a set of weights that can be used as predictors)), wherein the one or more portions is less than all of the weight information to be back-propagated to the one or more neural networks (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128).”

Regarding claim 9, Ma discloses that “the one or more memories include instructions that, if executed, cause the system to: 
load input data comprising the one or more portions of the weight information (operating method of a memory-centric neural network includes connecting weight matrixes to axons and neurons [i.e., loading them onto the axons and neurons] – Ma, paragraph 9); 
update the one or more portions of the weight information based at least in part on the metadata (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also indicate the time since the last weight update, since weight updates occur via firing events]); 
forward propagate the updated one or more portions of the weight information through the one or more neural networks to generate one or more outputs (all types of ANN need to be trained before performing inference and classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, and backpropagation for training or learning using the labeled training datasets – Ma, paragraph 38; neural network [containing the weights] can be trained on a labeled dataset [to generate an output], and if error occurs, the error data feedback for retraining can be iterated many times until the errors converge to a minimum – id. at paragraph 40); 
back-propagate the one or more outputs to update the one or more neural network[s] (all types of ANN need to be trained before performing inference and classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, and backpropagation for training or learning [updating] using the labeled training datasets – Ma, paragraph 38); and 
update a different portion of the weight information from the one or more portions (the numbers of axons or neurons firing at a given timestep are relatively sparse, so only the rows of the weight matrices having axons or neurons firing may need to be updated at each timestep [since the axons and neurons firing differ at each timestep, it follows that in general different rows will be updated at each timestep] – Ma, paragraph 128).”  

Regarding claim 12, Ma discloses that “the metadata [are] updated after an epoch of training of the one or more neural networks (when one of the axons or neurons fires, the corresponding timestamp registers can be written with a value B and decremented until the value B reaches 0 – Ma, paragraphs 91-92; LTP/LTD curves are then used to determine synaptic weight updates based on a comparison between the two timestamp registers, which only occurs when an axon or neuron fires and neither timestamp is zero – id. at paragraphs 93-94 [i.e., the timestamp metadata are updated to the value B after a firing event occurring subsequent to a weight update/training epoch]).” 

Regarding claim 13, Ma discloses that “the metadata indicate[] how many epochs¹ of training have been skipped (when one of the axons or neurons fires, corresponding timestamp registers Tpre or Tpost can be written with a value B and decremented at each timestep until the value B reaches 0 – Ma, paragraphs 91-92; a compare operation between these two registers can be triggered only when Tpre = B and/or Tpost = B and when neither Tpost nor Tpre = 0; the comparison triggers synaptic weight updates – id. at paragraphs 93-94 [so if the current value in the register is x, the number of timesteps since firing, that is, the number of timesteps since the last weight update [number of steps skipped], is B – x]).”
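
For illustration only, the B – x relationship noted in the bracketed remark above can be sketched as a countdown register.  The class and method names are hypothetical and are not taken from Ma; the sketch merely models a register that is written with B on a firing event and decremented once per timestep until it reaches zero.

```python
class TimestampRegister:
    """Hypothetical model of a countdown timestamp register."""

    def __init__(self, b):
        self.b = b      # value written on a firing event
        self.value = 0  # zero means no recent firing event is recorded

    def fire(self):
        # A firing event writes B into the register.
        self.value = self.b

    def tick(self):
        # Each timestep decrements the register until it reaches zero.
        if self.value > 0:
            self.value -= 1

    def steps_since_firing(self):
        # With current value x, the timesteps elapsed since firing are B - x.
        # Once the register has reached zero the elapsed count is unknown.
        return None if self.value == 0 else self.b - self.value
```

Under these assumptions, a register with B = 8 that fires and then sees three timesteps holds x = 5, so the number of timesteps since firing is 8 – 5 = 3.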

Regarding claim 14, Ma discloses that the system “further compris[es] a vehicle (once trained, weights and parameters of a neural network can be transferred to application devices for deployment, such as self-driving cars or autonomous drones – Ma, paragraph 40).”

Regarding claim 15, Ma discloses “[a] method, comprising: 
generating weight information associated with one or more neural networks (all types of artificial neural network need to be trained before performing inference or classification functions; supervised learning can generate the best predictors (set of weights) – Ma, paragraph 38; see also paragraph 9 (disclosing that the weight matrices are connected to axons and neurons of the neural network system)); and 
updating only portions of the weight information based, at least in part, on how recently the portions of the weight information ha[ve] been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also indicate the time since the last weight update, since weight updates occur via firing events]), wherein the portions are less than all of the weight information (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128).”  

Regarding claim 16, Ma discloses that “the portions of the weight information [are] to be used in a step of training of the one or more neural networks (all types of artificial neural network need to be trained before performing inference or classification functions; supervised learning can generate the best predictors (set of weights) [i.e., the weight information is used in training] – Ma, paragraph 38).”

Regarding claim 18, Ma discloses “storing metadata to indicate how recently the portions of the weight information [have] been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]).”

Regarding claim 20, Ma discloses that “the portions of the weight information are updated as part of a first step of training and a different portion of the weight information is updated as part of a second step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140 [first step of training = step in which one or more of the lookup values is nonzero and the row [portion of weight information] is updated; second step of training = step in which all the lookup values are zero and the system skips to the next row and updates that row]).”
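
By way of illustration only, the row-by-row procedure cited from Ma, paragraph 140, may be sketched as follows. This is a minimal sketch and not code from Ma: the function name, the dictionary-based lookup table, the additive weight change, and the example table values are all hypothetical.

```python
def update_rows(weights, t_pre, t_post, lut):
    """Update only the rows with at least one non-zero lookup value.

    weights: list of rows, one row per axon (hypothetical layout)
    t_pre:   one timestamp per axon
    t_post:  one timestamp per neuron
    lut:     hypothetical table mapping (Tpost - Tpre) to a weight change
    """
    updated = []
    for row, pre in enumerate(t_pre):
        # No comparison is made when either timestamp is zero.
        deltas = [
            0 if pre == 0 or post == 0 else lut.get(post - pre, 0)
            for post in t_post
        ]
        if any(d != 0 for d in deltas):
            # At least one lookup value is non-zero: update this row.
            weights[row] = [w + d for w, d in zip(weights[row], deltas)]
            updated.append(row)
        # Otherwise all lookup values are zero: skip to the next row.
    return updated
```

For example, with `lut = {1: 2, 0: -1}`, `t_pre = [3, 0]`, and `t_post = [4, 9]`, only row 0 is updated and row 1 is skipped.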

Regarding claim 21, Ma discloses that “the different portion partially overlaps with the portions of the weight information (in a worst case scenario, if every row of the weight matrix needs to be updated in every timestep, an STDP row update read-modify-write finite state machine may take 153.6 microseconds which is approximately 15% of the 1 ms timestep; however, the numbers of axons or neurons firing are relatively sparse, only the rows having axons or neurons firing need to be updated at each timestep [since the extreme cases of full weight updates at every timestep and highly sparse weight updates at every timestep are both contemplated, it follows that the median case, in which some weights are updated in two consecutive timesteps and others are not, is also contemplated] – Ma, paragraphs 127-28).”  

Regarding claim 22, Ma discloses “computing, based at least in part on the metadata, an accumulated update of two or more steps of training to update the portions of the weight information (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]; paragraph 105 indicates that the five-step weight update procedure is repeated until all axon timestamps are compared and all weight updates are completed [so the training procedure contains five steps per axon timestamp]).”

Regarding claim 30, Ma discloses “[a] system, comprising: 
one or more processors (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23) to infer information using one or more neural networks trained by at least updating one or more portions of weight information (neural network hardware accelerator architecture may be capable of improving the performance and efficiency of a neural network accelerator – Ma, paragraph 8; long-term potentiation and long-term depression curves that determine synaptic weight updates in the neural network can be represented by a piecewise linear table – id. at paragraph 94; see also Fig. 3 and paragraph 40 (disclosing that the trained weights and parameters of the network can be transferred to application devices for deployment and inference), paragraph 45 (disclosing that the spiking neural network can be trained with an equivalent model off-chip, with the synaptic weights then transferred to the SNN to perform inference and classification functions)) based, at least in part, on metadata indicating how recently the one or more portions of the weight information has been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]), wherein the one or more portions is less than all of the weight information (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128); and 
one or more memories to store the one or more neural networks (memory-centric neural network system includes semiconductor memory devices coupled to the processing unit and containing instructions executed by the processing unit – Ma, paragraph 9).”

Regarding claim 31, Ma discloses that “the one or more neural networks are trained by at least further forward propagating the updated one or more portions of the weight information to determine one or more outputs (all types of ANN need to be trained before performing inference or classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications [i.e., the determination of outputs] and backpropagation mode for training or learning using labeled training datasets – Ma, paragraph 38; new training data may be processed [forward propagated] in accordance with training data sets; if error occurs, the error data feedback for retraining [i.e., weight updating] can be iterated many times until the errors converge to a minimum and below a certain threshold of changes – id. at paragraph 40 [i.e., at each iteration of training, the updated weight information is multiplied with the input information and the result is forward propagated through the network until an output is obtained]; see also paragraphs 104-05 (describing the weight updating process)).”  

Regarding claim 34, Ma discloses that “the metadata [are] updated after an epoch of training of the one or more neural networks (when one of the axons or neurons fires, the corresponding timestamp registers can be written with a value B and decremented until the value B reaches 0 – Ma, paragraphs 91-92; LTP/LTD curves are then used to determine synaptic weight updates based on a comparison between the two timestamp registers, which only occurs when an axon or neuron fires and neither timestamp is zero – id. at paragraphs 93-94 [i.e., the timestamp metadata are updated to the value B after a firing event occurring subsequent to a weight update/training epoch]).”  

Regarding claim 36, Ma discloses that the system “further compris[es] an autonomous vehicle (once trained, weights and parameters of a neural network can be transferred to application devices for deployment, such as self-driving cars or autonomous drones – Ma, paragraph 40).”  

Regarding claim 37, Ma discloses “[a] method, comprising:
inferring information using one or more neural networks trained based, at least in part on, metadata to update one or more portions of weight information of the one or more neural networks (neural network hardware accelerator architecture may be capable of improving the performance and efficiency of a neural network accelerator – Ma, paragraph 8; long-term potentiation and long-term depression curves that determine synaptic weight updates in the neural network can be represented by a piecewise linear table – id. at paragraph 94; see also Fig. 3 and paragraph 40 (disclosing that the trained weights and parameters of the network can be transferred to application devices for deployment and inference), paragraphs 90-94 (disclosing a procedure by which timestamp registers [containing timestamp metadata] are used to perform weight updates)), wherein the metadata indicate[] how recently the one or more portions of the weight information has been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are associated with the registers [containing timestamp metadata]]), … [and] wherein the one or more portions is less than all of the weight information (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128).”

Regarding claim 38, Ma discloses that “the metadata store[] how many steps of training have been skipped when the weight information is updated (when one of the axons or neurons fires, corresponding timestamp registers Tpre or Tpost can be written with a value B and decremented at each timestep until the value B reaches 0 – Ma, paragraphs 91-92; a compare operation between these two registers can be triggered only when Tpre = B and/or Tpost = B and when neither Tpost nor Tpre = 0; the comparison triggers synaptic weight updates – id. at paragraphs 93-94 [so if the current value in the register is x, the number of timesteps since firing, that is, the number of timesteps since the last weight update [number of steps skipped], is B – x]).”
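
By way of illustration only, the B – x arithmetic noted above may be sketched as follows. This is a minimal sketch and not code from Ma: the function names and the example value B = 8 are hypothetical, and the sketch treats the count as defined only while the register is non-zero.

```python
def simulate_register(b, elapsed):
    """Value held after `elapsed` timesteps following a write of b."""
    x = b
    for _ in range(elapsed):
        x = max(x - 1, 0)  # decrement by 1 per timestep, stopping at zero
    return x


def timesteps_since_firing(b, x):
    """Number of timesteps since the register was written with b."""
    if not 0 < x <= b:
        # Assumption of this sketch: a register at zero no longer
        # identifies how many timesteps have elapsed.
        raise ValueError("register must hold a value from 1 to B")
    return b - x
```

For example, with B = 8, a register written three timesteps ago holds x = 5, and B – x = 3.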

Regarding claim 40, Ma discloses that “the metadata [are] a counter that is updated after a step of training of the one or more neural networks (axon and neuron timestamp registers are each written with a value B that can be decremented [updated] by a value such as 1 in each timestep until the value B reaches 0 [so the timestamps are counters] – Ma, paragraphs 91-92; see also paragraph 94 (disclosing that a comparison operation between the axon timestamp and the neuron timestamp determines synaptic weight updates – i.e., each timestep is a step of training)).”  

Regarding claim 41, Ma discloses that “the one or more portions of the weight information update[] the one or more portions of the weight information to skip an update of at least one step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140).”  

Regarding claim 42, Ma discloses that “the one or more portions of the weight information are updated as part of a first step of training and a different portion of the weight information is updated as part of a second step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140 [first step of training = step in which one or more of the lookup values is nonzero and the row [portion of weight information] is updated; second step of training = step in which all the lookup values are zero and the system skips to the next row and updates that row]).”  

Regarding claim 43, Ma discloses that “the different portion partially overlaps with the one or more portions of the weight information (in a worst case scenario, if every row of the weight matrix needs to be updated in every timestep, an STDP row update read-modify-write finite state machine may take 153.6 microseconds which is approximately 15% of the 1 ms timestep; however, the numbers of axons or neurons firing are relatively sparse, only the rows having axons or neurons firing need to be updated at each timestep [since the extreme cases of full weight updates at every timestep and highly sparse weight updates at every timestep are both contemplated, it follows that the median case, in which some weights are updated in two consecutive timesteps and others are not, is also contemplated] – Ma, paragraphs 127-28).”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Chole et al. (US 20200104669) (“Chole”).
Regarding claim 1, Ma discloses “[a] processor (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23) to update one or more portions of weight information corresponding to one or more neural networks (neural network hardware accelerator architecture may be capable of improving the performance and efficiency of a neural network accelerator – Ma, paragraph 8; long-term potentiation and long-term depression curves that determine synaptic weight updates in the neural network can be represented by a piecewise linear table – id. at paragraph 94) based, at least in part, on metadata associated with the one or more portions of weight information to indicate how recently the one or more portions of weight information has been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]), wherein the one or more portions is less than all of the weight information corresponding to the one or more neural networks (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128).”  
Ma appears not to disclose explicitly the further limitations of the claim.  However, Chole discloses “[a] processor, comprising one or more arithmetic logic units (ALUs) to update one or more portions of weight information corresponding to one or more neural networks (in a method for digitally performing matrix operations for an artificial neural network, the weight updates may be performed by, inter alia, independently multiplying a row of input data with a row of output error delta data using a plurality of parallel arithmetic logic units and accumulating multiplication results in a row of matrix data and storing said row of matrix data back to a memory circuit – Chole, claim 15; see also paragraph 32 (disclosing that the relevant operations may be carried out using specialized processors))….”
Chole and the instant application both relate to neural network computation using ALUs and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to use ALUs to update the weight information, as disclosed by Chole, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the operations necessary to update the weights to be performed in parallel, thereby speeding up the calculations of the network.  See Chole, claims 11 and 15.

Regarding claim 2, Ma discloses that “the one or more [processors] are to update the one or more portions of weight information as a result of determining that the one or more portions of the weight information are to be used in a current step of training of the one or more neural networks (all types of ANN need to be trained before performing inference or classification functions – Ma, paragraph 38; networks can be trained on labeled training datasets and, if error occurs, the error data feedback for retraining may be iterated many times until the errors converge to a minimum; the weights and parameters can then be transferred to actual application devices for deployment [i.e., each portion of weight information in each step of training is updated in response to the system determining that the weights need to be updated for training purposes] – id. at paragraph 40).”  
Ma appears not to disclose explicitly the further limitations of the claim.  However, Chole discloses that “the one or more ALUs are to update the one or more portions of weight information (in a method for digitally performing matrix operations for an artificial neural network, the weight updates may be performed by, inter alia, independently multiplying a row of input data with a row of output error delta data using a plurality of parallel arithmetic logic units and accumulating multiplication results in a row of matrix data and storing said row of matrix data back to a memory circuit – Chole, claim 15; see also paragraph 32 (disclosing that the relevant operations may be carried out using specialized processors))….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to use ALUs to update the weight information, as disclosed by Chole, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the operations necessary to update the weights to be performed in parallel, thereby speeding up the calculations of the network.  See Chole, claims 11 and 15.

Regarding claim 23, Ma discloses “[a] processor (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23) … to infer information based, at least in part, on one or more neural network[s] trained to update one or more portions of weight information corresponding to the one or more neural networks (neural network hardware accelerator architecture may be capable of improving the performance and efficiency of a neural network accelerator – Ma, paragraph 8; long-term potentiation and long-term depression curves that determine synaptic weight updates in the neural network can be represented by a piecewise linear table – id. at paragraph 94; see also Fig. 3 and paragraph 40 (disclosing that the trained weights and parameters of the network can be transferred to application devices for deployment and inference)) based, at least in part, on metadata associated with the one or more portions of weight information to indicate how recently the one or more portions of weight information has been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]), wherein the one or more portions is less than all of the weight information corresponding to the one or more neural networks (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once] – Ma, paragraph 128).”  
Ma appears not to disclose explicitly the further limitations of the claim.  However, Chole discloses “[a] processor, comprising one or more arithmetic logic units (ALUs) to infer information based, at least in part, on one or more neural network[s] (in a method for digitally performing matrix operations for an artificial neural network, the weight updates [information inferred] may be performed by, inter alia, independently multiplying a row of input data with a row of output error delta data using a plurality of parallel arithmetic logic units and accumulating multiplication results in a row of matrix data and storing said row of matrix data back to a memory circuit – Chole, claim 15; see also paragraph 32 (disclosing that the relevant operations may be carried out using specialized processors))….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to use an ALU to infer information based on the network, as disclosed by Chole, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the operations necessary to update the weights to be performed in parallel, thereby speeding up the calculations of the network.  See Chole, claims 11 and 15.

Regarding claim 24, the rejection of claim 23 is incorporated.  Ma further discloses that “the one or more ALUs are to update the one or more portions of weight information as a result of determining that the one or more portions of the weight information are to be used in a current step of training of the one or more neural networks (all types of ANN need to be trained before performing inference or classification functions – Ma, paragraph 38; networks can be trained on labeled training datasets and, if error occurs, the error data feedback for retraining may be iterated many times until the errors converge to a minimum; the weights and parameters can then be transferred to actual application devices for deployment [i.e., each portion of weight information in each step of training is updated in response to the system determining that the weights need to be updated for training purposes] – id. at paragraph 40).”  
Ma appears not to disclose explicitly the further limitations of the claim.  However, Chole discloses that “the one or more ALUs are to update the one or more portions of weight information (in a method for digitally performing matrix operations for an artificial neural network, the weight updates may be performed by, inter alia, independently multiplying a row of input data with a row of output error delta data using a plurality of parallel arithmetic logic units and accumulating multiplication results in a row of matrix data and storing said row of matrix data back to a memory circuit – Chole, claim 15; see also paragraph 32 (disclosing that the relevant operations may be carried out using specialized processors)) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to update the weight information using ALUs, as disclosed by Chole, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the operations necessary to update the weights to be performed in parallel, thereby speeding up the calculations of the network.  See Chole, claims 11 and 15.

Claims 3-4, 7, 25-26, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Chole and further in view of Kaskari et al. (US 20180232632) (“Kaskari”).
Regarding claim 3, Ma, as modified by Chole, discloses that “the one or more portions of weight information are updated based at least in part on: 
the metadata to indicate how recently the one or more portions of the weight information [have] been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]) ….”
Neither Ma nor Chole appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “the one or more portions of weight information are updated based at least in part on: …
momentum information to indicate how to update the one or more portions of the weight information (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47); 
a learning rate (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between an error signal received at corresponding weights or biases using backpropagation through time and a learning rate – Kaskari, paragraph 47); and 
a momentum coefficient (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum [momentum coefficient] and a quantity Δweight(i – 1) – Kaskari, paragraph 47).”
Kaskari and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to update the weight information using a momentum coefficient, further momentum information, and a learning rate, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 4, Ma, as modified by Chole and Kaskari, discloses that “the learning rate and momentum coefficients are hyperparameters (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight(i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, a learning rate and a momentum, where the momentum may be set to m = 0.9 and the learning rate may be set to μ = 10^-3 [i.e., they are not parameters set by training and thus are hyperparameters] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to update the weight information based on learning rate and momentum hyperparameters, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.
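For illustration only, the bounded update rule summarized from Kaskari, paragraph 47, can be sketched as follows. This action describes Xweight(i) only as depending, inter alia, on a momentum term and a learning-rate term, so the simple sum used below, the clipping applied when Xweight(i) falls outside the bounds, the bound values, and every function and parameter name are assumptions made for this sketch and are not taken from the reference.

```python
def bounded_momentum_update(weight_prev, delta_prev, error_signal,
                            momentum=0.9, learning_rate=1e-3,
                            lower=-0.1, upper=0.1):
    """Apply weight(i) = weight(i - 1) + update for one epoch.

    Assumed form of the candidate update (hypothetical, for illustration):
        x = momentum * delta_prev + learning_rate * error_signal
    Assumed out-of-bounds behavior: x is clipped to [lower, upper].
    The momentum and learning-rate defaults are fixed before training,
    which is what makes them hyperparameters rather than learned values.
    """
    x = momentum * delta_prev + learning_rate * error_signal
    update = min(max(x, lower), upper)  # in bounds: update == x
    return weight_prev + update, update


if __name__ == "__main__":
    w, dw = 1.0, 0.05
    for epoch in range(3):
        w, dw = bounded_momentum_update(w, dw, error_signal=10.0)
        print(epoch, round(w, 6), round(dw, 6))
```

The previous update is fed back in as delta_prev on the next call, which is how the momentum term carries information from earlier epochs into the current one.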

Regarding claim 7, Ma, as modified by Chole, discloses that “an accumulated update is calculated based[] at least in part on[] … the metadata to update the one or more portions of the weight information (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]), wherein the one or more portions is less than all of the weight information (based on spiking neural network theory, the number of firing events of axons or neurons is relatively sparse, so only the rows having an axon or a neuron firing may need to be updated at each timestep [i.e., not all weight information is updated at once; note that the resulting weight matrix is the result of calculations from the current timestep and previous timesteps, i.e., is accumulated] – Ma, paragraph 128).” 
Neither Ma nor Chole appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “an accumulated update is calculated based[] at least in part on[] the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47)….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to base the weight update on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.
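As a reading aid, the timestamp-register behavior summarized from Ma, paragraphs 90-94, can be sketched as below. The class and function names, the order in which firing and decrementing happen within a timestep, and the "other" label for timing differences not summarized in this action are assumptions made for the sketch.

```python
class TimestampRegister:
    """Rolling timestamp: written with B on a firing event, then counts down to zero."""

    def __init__(self, b):
        self.b = b
        self.value = 0  # zero means no recent firing event is recorded

    def fire(self):
        self.value = self.b

    def tick(self):
        # Decrement once per timestep until the register reaches zero.
        if self.value > 0:
            self.value -= 1


def compare_on_fire(t_pre, t_post):
    """Compare the registers; no comparison occurs if either register is zero."""
    if t_pre.value == 0 or t_post.value == 0:
        return None
    diff = t_post.value - t_pre.value
    if diff == 1:
        return "Strong LTP"
    if diff == 0:
        return "weak LTD"
    return "other"  # placeholder: remaining curve points are not summarized here


if __name__ == "__main__":
    pre, post = TimestampRegister(4), TimestampRegister(4)
    pre.fire()   # axon fires
    pre.tick()   # one timestep elapses
    post.fire()  # neuron fires one timestep later
    print(compare_on_fire(pre, post))
```

Because each register is rewritten only on a firing event, its current value also encodes how many timesteps have passed since that event, which is the property the rejection relies on.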

Regarding claim 25, Ma, as modified by Chole, discloses that “the one or more portions of weight information are updated based at least in part on: 
the metadata indicating how recently the one or more portions of the weight information [have] been updated (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]) ….”
Neither Ma nor Chole appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “the one or more portions of weight information are updated based at least in part on:  
momentum information to indicate how to update the one or more portions of the weight information (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47); 
a learning rate (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between an error signal received at corresponding weights or biases using backpropagation through time and a learning rate – Kaskari, paragraph 47); and 
a momentum coefficient (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum [momentum coefficient] and a quantity Δweight(i – 1) – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to update the weights based on learning rate and momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 26, Ma, as modified by Chole and Kaskari, discloses that “the learning rate and momentum coefficients are hyperparameters (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight(i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, a learning rate and a momentum, where the momentum may be set to m = 0.9 and the learning rate may be set to μ = 10^-3 [i.e., they are not parameters set by training and thus are hyperparameters] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to update the weights based on learning rate and momentum hyperparameters, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 29, Ma, as modified by Chole, discloses that “an accumulated update is calculated based, at least in part[,] on[] … the metadata to update the one or more portions of the weight information (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]).”  
Neither Ma nor Chole appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “an accumulated update is calculated based, at least in part[,] on[] the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47)….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Claims 5 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Chole and further in view of Krishnamurthy et al. (US 20190042910) (“Krishnamurthy”).
Regarding claim 5, Ma, as modified by Chole and Krishnamurthy, discloses that “the metadata comprise[] a counter that indicates how many steps of training have elapsed [since] the one or more portions of weight information [were] last updated (long-term potentiation on a synapse may be conducted using a replay spike; when a presynaptic spike is replayed after a fixed number of time steps T, the relative spike timing between a presynaptic spike and a replay spike is determined based solely on a postsynaptic spike history counter (e.g., the number of time-steps since the postsynaptic spike occurred); T is the maximum spike time difference beyond which the synaptic weight update is zero [so the counter indicates when a spike was last emitted, and the timing of the spikes is used for weight update, so a counter indicating the amount of time since the last spike is also a counter for how long ago the weight was updated because weight updates do not occur absent a spike] – Krishnamurthy, paragraph 39).”  
Krishnamurthy and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to include a counter indicating how many steps of training have elapsed since weight information was last updated, as disclosed by Krishnamurthy, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system does not learn from events that occurred too far back in time to be significant.  See Krishnamurthy, paragraph 39.
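The counter relied upon from Krishnamurthy, paragraph 39, can be pictured with the following minimal sketch. The class name, the fixed update amount, and the choice to treat a difference exactly equal to T as still inside the window are assumptions for illustration and are not drawn from the reference.

```python
class SpikeHistoryCounter:
    """Counts time-steps since the last postsynaptic spike (hypothetical sketch)."""

    def __init__(self, max_steps):
        self.max_steps = max_steps      # T: beyond this difference the update is zero
        self.steps_since_spike = None   # None until a first spike is recorded

    def spike(self):
        self.steps_since_spike = 0

    def tick(self):
        if self.steps_since_spike is not None:
            self.steps_since_spike += 1

    def within_window(self):
        return (self.steps_since_spike is not None
                and self.steps_since_spike <= self.max_steps)


def weight_update(counter, amount=0.01):
    """Return an assumed fixed update inside the window and zero outside it."""
    return amount if counter.within_window() else 0.0


if __name__ == "__main__":
    c = SpikeHistoryCounter(max_steps=3)
    c.spike()
    for _ in range(5):
        c.tick()
        print(c.steps_since_spike, weight_update(c))
```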

Regarding claim 27, Ma, as modified by Chole and Krishnamurthy, discloses that “the metadata comprise[] a counter to indicate how many steps of training have elapsed [since] the one or more portions of weight information [were] last updated (long-term potentiation on a synapse may be conducted using a replay spike; when a presynaptic spike is replayed after a fixed number of time steps T, the relative spike timing between a presynaptic spike and a replay spike is determined based solely on a postsynaptic spike history counter (e.g., the number of time-steps since the postsynaptic spike occurred); T is the maximum spike time difference beyond which the synaptic weight update is zero [so the counter indicates when a spike was last emitted, and the timing of the spikes is used for weight update, so a counter indicating the amount of time since the last spike is also a counter for how long ago the weight was updated because weight updates do not occur absent a spike] – Krishnamurthy, paragraph 39).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to include a counter indicating how many steps of training have elapsed since the last weight update, as disclosed by Krishnamurthy, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system does not learn from events that occurred too far back in time to be significant.  See Krishnamurthy, paragraph 39.

Claims 6 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Chole and further in view of Vu et al. (US 20200372360) (“Vu”).
Regarding claim 6, Ma, as modified by Chole and Vu, discloses that “the one or more portions of weight information [are] associated with an embedding vector2 (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  
Vu and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to associate the weight information with an embedding vector, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the network to manipulate multiple weights at once using linear algebraic expressions, thereby increasing efficiency.  See Vu, paragraph 35.

Regarding claim 28, Ma, as modified by Chole and Vu, discloses that “the one or more portions of weight information [are] associated with an embedding vector (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Chole to associate the weight information with an embedding vector, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the network to manipulate multiple weights at once using linear algebraic expressions, thereby increasing efficiency.  See Vu, paragraph 35.

Claims 10 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Vu.
Regarding claim 10, Ma, as modified by Vu, discloses that “the metadata indicate[] how to update a plurality of embedding vectors used to train the one or more neural networks (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values [metadata] is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 36).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to use data to indicate how to update the embedding vectors of the networks, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the updating of the weights of the network takes place according to a predefined procedure, thereby enhancing the predictability of the network operations.  See Vu, paragraph 36.

Regarding claim 32, Ma, as modified by Vu, discloses that “the metadata indicate[] how to update a plurality of embedding vectors used to train the one or more neural networks (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values [metadata] is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to include data indicating how to update the weight vectors for the networks, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the updating of the weights of the network takes place according to a predefined procedure, thereby enhancing the predictability of the network operations.  See Vu, paragraph 36.

Claims 11, 33, 35, and 44 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Kaskari.
Regarding claim 11, Ma, as modified by Kaskari, discloses that “the one or more memories are to store momentum information to indicate how to update the one or more portions of the weight information (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to store momentum information to indicate how to update the weight information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 33, Ma, as modified by Kaskari, discloses that “the one or more portions of the weight information are updated further based at least in part on momentum information to indicate how to update the one or more portions of the weight information (weights and biases connected to output layer of neural network are updated according to the equation weight(i) = weight(i – 1) + update and the update, when within bounds, is defined by an expression that includes a momentum term [i.e., the equation for weight update indicates how the update is done using momentum information] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to update the weight information using momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 35, the rejection of claim 33 is incorporated.  Ma further discloses that “an accumulated update is calculated based, at least in part[,] on[] … the metadata to update the one or more portions of the weight information (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]).”
Kaskari further discloses that “an accumulated update is calculated based, at least in part[,] on[] the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47)….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 44, the rejection of claim 37 is incorporated.  Ma further discloses that “the metadata … are used to determine an accumulated update to update the portions of the weight information (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]).”
Kaskari discloses that the “momentum information to indicate how to update the one or more portions of the weight information … [is] used to determine an accumulated update to update the portions of the weight information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47)….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Claims 17 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Le Gallo-Bordeau et al. (US 20200293855) (“Le Gallo-Bordeau”).
Regarding claim 17, Ma, as modified by Le Gallo-Bordeau, discloses that “a random or pseudo-random process is used to select the portions of the weight information to be used in the step of the one or more neural networks (neural network weight update is calculated for a weight in each of a plurality of arrays storing a weight set; in some embodiments the weight updates can be computed for only a subset of weights, e.g., a randomly-selected subset – Le Gallo-Bordeau, paragraph 33).”  
Le Gallo-Bordeau and the instant application both relate to selective weight updating in neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to select the weight information to be used randomly or pseudo-randomly, as disclosed by Le Gallo-Bordeau, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the processing requirements of the system by requiring only that certain weights, as opposed to the entire weight set, be updated.  See Le Gallo-Bordeau, paragraph 33.
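The randomly selected subset of weights described in Le Gallo-Bordeau, paragraph 33, can be illustrated with the short sketch below. The function names, the fraction parameter, and the use of a seeded pseudo-random generator are assumptions chosen for the example and do not come from the reference.

```python
import random


def select_random_subset(num_weights, fraction, seed=None):
    """Pick a pseudo-random subset of weight indices (at least one index)."""
    rng = random.Random(seed)
    k = max(1, int(num_weights * fraction))
    return sorted(rng.sample(range(num_weights), k))


def update_subset(weights, updates, indices):
    """Apply updates only at the selected indices; other weights are untouched."""
    new_weights = list(weights)
    for i in indices:
        new_weights[i] += updates[i]
    return new_weights


if __name__ == "__main__":
    weights = [0.0] * 10
    updates = [0.1] * 10
    chosen = select_random_subset(len(weights), fraction=0.3, seed=0)
    print(chosen, update_subset(weights, updates, chosen))
```

Only the selected entries incur update work in a given step, which is the processing saving cited as the motivation to combine.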

Regarding claim 39, Ma, as modified by Le Gallo-Bordeau, discloses that “the one or more portions of the weight information are randomly or pseudo-randomly selected to be used to train the one or more neural networks in a step of training (neural network weight update is calculated for a weight in each of a plurality of arrays storing a weight set; in some embodiments the weight updates can be computed for only a subset of weights, e.g., a randomly-selected subset – Le Gallo-Bordeau, paragraph 33).”   It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to select the weights to be updated randomly, as disclosed by Le Gallo-Bordeau, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the processing requirements of the system by requiring only that certain weights, as opposed to the entire weight set, be updated.  See Le Gallo-Bordeau, paragraph 33.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Yang et al. (US 20200356803) (“Yang”).
Regarding claim 19, Ma, as modified by Yang, discloses “generating the weight information by at least computing a gradient based at least in part on ground truth data and output data of the one or more neural networks (supervised learning algorithm may use forward propagation to generate a factor and overall scores, determine differences between the generated factor and overall scores [output data] to a ground truth factor and overall scores to estimate a loss function, use the differences to estimate a gradient of the loss function, and backpropagate the differences to weights and biases of the system according to the estimate of the gradient of the loss function – Yang, paragraph 35).”  
Yang and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to generate the weight information by computing a gradient based on ground truth data and output data, as disclosed by Yang, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would provide a point of comparison to which the output of the network can be compared to determine how much to update the weights.  See Yang, paragraph 35.
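A generic example of computing a gradient from ground truth data and network output, in the manner summarized from Yang, paragraph 35, is sketched below. The one-weight linear model, the mean-squared-error loss, and all names are assumptions chosen for brevity and are not Yang's disclosed system.

```python
def mse_gradient_step(w, b, xs, ys, learning_rate=0.1):
    """One supervised step: forward pass, loss against ground truth, gradient, update."""
    n = len(xs)
    preds = [w * x + b for x in xs]             # forward propagation (output data)
    diffs = [p - y for p, y in zip(preds, ys)]  # output minus ground truth
    loss = sum(d * d for d in diffs) / n        # mean squared error
    grad_w = 2 * sum(d * x for d, x in zip(diffs, xs)) / n
    grad_b = 2 * sum(diffs) / n
    # Move the weight and bias against the gradient of the loss.
    return w - learning_rate * grad_w, b - learning_rate * grad_b, loss


if __name__ == "__main__":
    w, b = 0.0, 0.0
    for step in range(3):
        w, b, loss = mse_gradient_step(w, b, xs=[1.0, 2.0], ys=[2.0, 4.0])
        print(step, round(w, 4), round(b, 4), round(loss, 4))
```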

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7:50a-5:50p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RYAN C VAUGHN/
Examiner, Art Unit 2125

1 Applicant appears to be using terminology in a way that differs from its accepted meaning in the art.  Examiner understands the term “epoch” to refer to a cycle through the full training dataset.  See DeepAI, What is an Epoch?, https://deepai.org/machine-learning-glossary-and-terms/epoch.  However, Applicant appears to be using “epoch” more broadly to mean any training step or batch.  See, e.g., specification paragraph 49 (“…metadata [are] used to track how many steps (batches) of training have been skipped.”).  For purposes of examination, the term “epoch” will be construed to mean any step of training.
2 Applicant appears to be using the term “embedding vector” in a way that differs from its accepted meaning in the art.  In common parlance, the term “embedding vector” refers to a numerical vector representation of a set of input data.  See Tripathi, What Are Vector Embeddings?, https://www.pinecone.io/learn/vector-embeddings/.  However, Applicant appears to be using the term to refer to a weight vector.  See specification paragraph 66 (“In at least one embodiment, set of embedding vectors 402 comprises weights (e.g., 256 weights) that are used to control and adjust behavior of neural network 406.”).  For purposes of examination, the term “embedding vector” will be deemed synonymous with a weight vector.