Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

1.	The Examiner acknowledges the applicant’s amendment filed March 9, 2022.  At this point claims 1-20 are pending in the instant application and ready for examination by the Examiner.

Claim Rejections - 35 USC § 112
2. 	The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 9-16 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 9 and 16 recite the following.
‘....define parameter criticalities for parameter weights based on the neuron locations relative to the initial input and the final output  wherein the parameter weights weigh an importance of connections between neurons in the neural network in determining the final output, and wherein the parameter criticalities define levels of impact that the parameter weights have on a final output of the neural network,...’ (Claim 9)
‘....program instructions to define parameter criticalities for parameter weights based on the neuron locations relative to the initial input and the final wherein the parameter weights weigh an importance of connections between neurons in the neural network, and wherein the parameter criticalities define levels of impact that the parameter weights have on a final output of the neural network,....’ (Claim 16)
This claim element concerns weights, connections and importance, a subject that is mentioned only in paragraph 0036 of the specification. The specification lacks a written description explaining how weights are used as an indicator of the importance of connections. In addition, the specification does not explain the meaning of ‘importance of connections.’
Dependent claims 10-15 and 20 are also rejected under 112(a) for failing to cure the deficiencies of their respective independent claims.

Claim Rejections - 35 USC § 103
3. 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim(s) 1, 5, 7, 9, 12 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung in view of Hyde and further in view of Anderson. (U.S. Patent Publication 20160379109, referred to as Chung; U.S. Patent Publication 20160028544, referred to as Hyde; U.S. Patent Publication 20020143720, referred to as Anderson)

Claim 1
Chung discloses a computer-implemented method comprising: defining, by one or more processors, layers in a neural network based on neuron locations relative to an initial input and a final output of the neural network wherein neurons in the neural network are hardware processors (Chung, figs 49-50, 0060; Each hardware acceleration component, on the other hand, may correspond to hardware logic for implementing functions, such as a field-programmable gate array (FPGA) device, a massively parallel processor array (MPPA) device, a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a multiprocessor System-on-Chip (MPSoC), and so on.), wherein a first layer in the neural network, is closer to the initial input than a second layer, and wherein the second layer is closer to the final output than the first layer (Chung, figs 37, 0246; A feature gathering component 3708 collects the feature values from the feature state machines and makes them available to downstream acceleration components. Feature gathering component 3708 may be implemented with one or computer processors with memory store instructions, or dedicated logic gate arrays implemented, for example, in an FPGA, ASIC, or other similar device.) defining, by one or more processors, parameter criticalities for parameter weights based on the neuron locations relative to the initial input and the final output, wherein the parameter weights weigh an impact that connections of connections between neurons in the neural network in determining the final output and wherein the parameter criticalities define levels of impact that the parameter weights have on a final output of the neural network. (Chung, 0306, figure 50; Per the specification, [0049] In block 705, one or more processors define parameter criticalities for parameter weights stored in a memory used by the neural network. These parameter criticalities define impacts of parameter weights on the original outgoing final outputs. 
For example, assume that the parameter shown in FIG. 2 as stored weight w.sub.1 has little impact on the output shown FIG. 2 (assuming that y is one of the final outputs from the DNN 500 shown in FIG. 5). Assume further that the parameter shown in FIG. 2 as stored weight w.sub.2 has a relatively greater impact on the output y. As such, the criticality of parameter w.sub.2 is greater than the parameter criticality of w.sub.1. [0050] Referring again to FIG. 7, in block 707, one or more processors associate the defined layers of the neural network with different memory banks based on the parameter criticalities for the parameter weights. For example, different layers in a DNN are assigned to different memory banks according to their respective significance. By way of further example (with reference now to FIGS. 5-6), a first defined layer (e.g., layer 501 in FIG. 5) is assigned to a first memory bank (e.g., bank 601 in FIG. 6) that is more error-prone than a second memory bank (e.g., bank 604) that is assigned to the second defined layer (e.g., layer 504). As shown in FIG. 6, the first memory bank uses less power than the second memory bank. As such, bank 601 uses less power than bank 604, thus saving power used by DNN 500.) (EC:This establishes different memory error parameters for different layers of a neural network.) ‘ Without wanting to be bound by any particular theory, it is believed that partitioning a DNN into higher memory bandwidth layers (e.g., linear layers) and lower memory bandwidth layers (e.g., convolutional layers), and allocating higher memory bandwidth layers to a host component and lower memory bandwidth layers to an acceleration component using the CNN acceleration techniques described above, may more efficiently implement the DNN compared to implementing the DNN fully on a host component, or fully on an acceleration component.’ Chung assigns the layers to memory priority (i.e. high or low bandwidth) based on location of the layers.  
Linear layers (closest to the output) are assigned to high bandwidth memory and convolutional layers (farthest from the output) are assigned to low bandwidth memory. Note that the weights of these layers will also be stored in the corresponding assigned memory.)
Chung does not disclose expressly wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer.
Hyde discloses wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer. (Hyde, 0068; Wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer of applicant maps to ‘Similarly, other aspects of computation additionally or alternatively to random number generation to achieve advantage such as association of computationally-intensive security operations with higher performing memory while allocating slower or more error prone to less sensitive or less important data.’ of Hyde.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung and Hyde before him before the effective filing date of the claimed invention, to modify Chung to incorporate the use of better memory as the computation of the neural network algorithm progresses, as taught by Hyde. Given the advantage of avoiding re-computation at the later stages of the algorithm due to errors, one having ordinary skill in the art would have been motivated to make this obvious modification.
Chung and Hyde do not disclose expressly associating, by one or more processors, defined layers in the neural network with portions of memory based on defined parameter criticalities,…. storing, by one or more processors, parameter weights used by neurons in the first layer in the first portion of memory; and storing, by one or more processors, parameter weights used by neurons in the second layer in the second portion of memory.
Anderson discloses associating, by one or more processors, defined layers in the neural network with portions of memory based on defined parameter criticalities,…. storing, by one or more processors, parameter weights used by neurons in the first layer in the first portion of memory; and storing, by one or more processors, parameter weights used by neurons in the second layer in the second portion of memory. (Anderson, 0022, 0024, fig 2; Associating, by one or more processors, defined layers in the neural network with portions of memory based on defined parameter criticalities …..storing, by one or more processors, parameter weights used by neurons in the first layer in the first portion of memory; and storing, by one or more processors, parameter weights used by neurons in the second layer in the second portion of memory of applicant maps to ‘The first data structure portion 110 stores a single respective set of values of the input data signals for each layer of the neural network 10. Because all of the input data signals for each particular layer of the neural network are provided to each of the neurons in that layer, the values of each of the input data signals need only be stored once in the first data structure portion 110.’ and ‘Further as shown in FIG. 2, the second data structure portion 120 of the new data structure 100 includes array locations 0-12 corresponding to each of the weight values used by each of the neurons 30, 40 and 60 of each of the layers 20, 50 of the neural network 10. Specifically, the second data structure portion 120 includes array locations 0-4 for storing the weight values 31-35 for neuron 30, array locations 5-9 for storing the weight values 41-45 for neuron 40, and array locations 10-12 for storing the weight values 61-63 for neuron 60. The second data structure portion 120 stores the weight values 31-35, 41-45 and 61-63 sequentially in successive array ( memory) locations. 
That is, the weight values for each given neuron in each layer are stored sequentially, the sets of array locations storing the sets of weight values for each of the neurons of each respective layer are ordered successively, and further, each of the groups of sets of array locations storing the weight values of neurons in different layers are ordered successively in order of the layers of the neural network 10.’ of Anderson.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde and Anderson before him before the effective filing date of the claimed invention, to modify Chung and Hyde to incorporate storing different stages of the neural network in different types of memory, as taught by Anderson. Given the advantage of reducing the additional energy cost of memory for data that is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.
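For illustration only, the sequential weight layout quoted from Anderson above can be sketched as follows. The weight counts (five each for neurons 30 and 40, three for neuron 60) and the resulting array locations 0-12 are taken from the quoted passage; the function name `layout_weights` and the tuple and dictionary representation are hypothetical and are not part of Anderson's disclosure.

```python
from typing import Dict, List, Tuple


def layout_weights(neurons: List[Tuple[int, int]]) -> Dict[int, range]:
    """Give each neuron a contiguous block of array locations.

    `neurons` is a list of (neuron id, number of weights), already
    ordered by layer, so blocks follow one another in layer order.
    """
    layout: Dict[int, range] = {}
    start = 0
    for neuron_id, n_weights in neurons:
        layout[neuron_id] = range(start, start + n_weights)
        start += n_weights
    return layout


# Neuron ids and weight counts as given in the quoted Anderson passage.
NEURONS = [(30, 5), (40, 5), (60, 3)]
LAYOUT = layout_weights(NEURONS)

for neuron_id, locations in LAYOUT.items():
    # Print the first and last array location of each neuron's block.
    print(neuron_id, locations.start, locations[-1])
```

Under these assumptions the sketch reproduces the location ranges recited in the quotation (0-4, 5-9 and 10-12). It shows only sequential placement in a single array; it does not by itself model separate memory portions with different error characteristics.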

Claim 5
Chung discloses wherein the neural network is a deep neural network that supports a deep learning model. (Chung, fig 50)

Claim 7
Chung and Hyde do not disclose expressly storing, by a memory controller, parameter weights in different memory banks according to the parameter criticalities of the parameter weights.
Anderson discloses storing, by a memory controller, parameter weights in different memory banks according to the parameter criticalities of the parameter weights. (Anderson, 0022, 0024, fig 2; Storing, by a memory controller, parameter weights in different memory banks according to the parameter criticalities of the parameter weights of applicant maps to ‘The first data structure portion 110 stores a single respective set of values of the input data signals for each layer of the neural network 10. Because all of the input data signals for each particular layer of the neural network are provided to each of the neurons in that layer, the values of each of the input data signals need only be stored once in the first data structure portion 110.’ and ‘Further as shown in FIG. 2, the second data structure portion 120 of the new data structure 100 includes array locations 0-12 corresponding to each of the weight values used by each of the neurons 30, 40 and 60 of each of the layers 20, 50 of the neural network 10. Specifically, the second data structure portion 120 includes array locations 0-4 for storing the weight values 31-35 for neuron 30, array locations 5-9 for storing the weight values 41-45 for neuron 40, and array locations 10-12 for storing the weight values 61-63 for neuron 60. The second data structure portion 120 stores the weight values 31-35, 41-45 and 61-63 sequentially in successive array ( memory) locations. That is, the weight values for each given neuron in each layer are stored sequentially, the sets of array locations storing the sets of weight values for each of the neurons of each respective layer are ordered successively, and further, each of the groups of sets of array locations storing the weight values of neurons in different layers are ordered successively in order of the layers of the neural network 10.’ of Anderson.) 
It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde and Anderson before him before the effective filing date of the claimed invention, to modify Chung and Hyde to incorporate storing different stages of the neural network in different types of memory, as taught by Anderson. Given the advantage of reducing the additional energy cost of memory for data that is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.
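As a reading aid for the claimed storage by parameter criticality, the example in paragraphs [0049]-[0050] of the specification, quoted in the rejection of claim 1 above, can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the implementation of the applicant or of any cited reference. The layer and bank numbers (501, 504, 601, 604) come from the quoted passage; the `Bank` class, the function name, and the numeric error rates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Bank:
    bank_id: int
    error_rate: float  # hypothetical figure, used only for ordering


def assign_layers_to_banks(layers_input_to_output: List[int],
                           banks: List[Bank]) -> Dict[int, int]:
    """Pair layers with banks so that layers nearer the input
    (treated here as less critical) get the more error-prone banks."""
    if len(layers_input_to_output) != len(banks):
        raise ValueError("this sketch assumes one bank per layer")
    # Most error-prone bank first, most reliable bank last.
    ordered = sorted(banks, key=lambda b: b.error_rate, reverse=True)
    return {layer: bank.bank_id
            for layer, bank in zip(layers_input_to_output, ordered)}


# Layer 501 is nearer the input, layer 504 nearer the output.
ASSIGNMENT = assign_layers_to_banks(
    [501, 504],
    [Bank(604, 1e-9), Bank(601, 1e-4)],  # error rates are made up
)
print(ASSIGNMENT)
```

With the assumed error rates, layer 501 is paired with bank 601 and layer 504 with bank 604, matching the pairing described in the quoted specification passage.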

Claim 9
Chung discloses a computer program product for optimizing a neural network, the computer program product comprising a non-transitory computer readable storage device having program instructions embodied therewith, the program instructions readable and executable by a computer to (Chung, 0060; For instance, a software-driven host component may correspond to a server computer that executes machine-readable instructions using one or more central processing units (CPUs).) define layers in a neural network based on neuron locations relative to an initial input and a final output of the neural network, wherein neurons in the neural network are hardware processors (Chung, figs 49-50, 0060; Each hardware acceleration component, on the other hand, may correspond to hardware logic for implementing functions, such as a field-programmable gate array (FPGA) device, a massively parallel processor array (MPPA) device, a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a multiprocessor System-on-Chip (MPSoC), and so on.) wherein a first layer in the neural network is closer to the initial input than a second layer in the neural network, and wherein the second layer is closer to the final output than the first layer (Chung, figs 37, 0246; A feature gathering component 3708 collects the feature values from the feature state machines and makes them available to downstream acceleration components. Feature gathering component 3708 may be implemented with one or computer processors with memory store instructions, or dedicated logic gate arrays implemented, for example, in an FPGA, ASIC, or other similar device.) 
define parameter criticalities for parameter weights based on the neuron locations relative to the initial input and the final output  wherein the parameter weights weigh an importance of connections between neurons in the neural network in determining the final output, and wherein the parameter criticalities define levels of impact that the parameter weights have on a final output of the neural network. (Chung, 0306, figure 50; Per the specification, [0049] In block 705, one or more processors define parameter criticalities for parameter weights stored in a memory used by the neural network. These parameter criticalities define impacts of parameter weights on the original outgoing final outputs. For example, assume that the parameter shown in FIG. 2 as stored weight w.sub.1 has little impact on the output shown FIG. 2 (assuming that y is one of the final outputs from the DNN 500 shown in FIG. 5). Assume further that the parameter shown in FIG. 2 as stored weight w.sub.2 has a relatively greater impact on the output y. As such, the criticality of parameter w.sub.2 is greater than the parameter criticality of w.sub.1. [0050] Referring again to FIG. 7, in block 707, one or more processors associate the defined layers of the neural network with different memory banks based on the parameter criticalities for the parameter weights. For example, different layers in a DNN are assigned to different memory banks according to their respective significance. By way of further example (with reference now to FIGS. 5-6), a first defined layer (e.g., layer 501 in FIG. 5) is assigned to a first memory bank (e.g., bank 601 in FIG. 6) that is more error-prone than a second memory bank (e.g., bank 604) that is assigned to the second defined layer (e.g., layer 504). As shown in FIG. 6, the first memory bank uses less power than the second memory bank. As such, bank 601 uses less power than bank 604, thus saving power used by DNN 500.) 
(EC: This establishes different memory error parameters for different layers of a neural network.) ‘Without wanting to be bound by any particular theory, it is believed that partitioning a DNN into higher memory bandwidth layers (e.g., linear layers) and lower memory bandwidth layers (e.g., convolutional layers), and allocating higher memory bandwidth layers to a host component and lower memory bandwidth layers to an acceleration component using the CNN acceleration techniques described above, may more efficiently implement the DNN compared to implementing the DNN fully on a host component, or fully on an acceleration component.’ Chung assigns the layers a memory priority (i.e., high or low bandwidth) based on the location of the layers. Linear layers (closest to the output) are assigned to high bandwidth memory and convolutional layers (farthest from the output) are assigned to low bandwidth memory. Note that the weights of these layers will also be stored in the corresponding assigned memory.)
Chung does not disclose expressly wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer.
Hyde discloses wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer. (Hyde, 0068; Wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer of applicant maps to ‘Similarly, other aspects of computation additionally or alternatively to random number generation to achieve advantage such as association of computationally-intensive security operations with higher performing memory while allocating slower or more error prone to less sensitive or less important data.’ of Hyde.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung and Hyde before him before the effective filing date of the claimed invention, to modify Chung to incorporate the use of better memory as the computation of the neural network algorithm progresses, as taught by Hyde. Given the advantage of avoiding re-computation at the later stages of the algorithm due to errors, one having ordinary skill in the art would have been motivated to make this obvious modification.
Chung and Hyde do not disclose expressly associate defined layers in the neural network with different portions of memory based on defined parameter criticalities,….store parameter weights used by neurons in the first layer in the first portion of memory; and store parameter weights used by neurons in the second layer in the second portion of memory.
Anderson discloses associate defined layers in the neural network with different portions of memory based on defined parameter criticalities,….store parameter weights used by neurons in the first layer in the first portion of memory; and store parameter weights used by neurons in the second layer in the second portion of memory. (Anderson, 0022, 0024, fig 2; Associate defined layers in the neural network with different portions of memory based on defined parameter criticalities ….. store parameter weights used by neurons in the first layer in the first portion of memory; and store parameter weights used by neurons in the second layer in the second portion of memory of applicant maps to ‘The first data structure portion 110 stores a single respective set of values of the input data signals for each layer of the neural network 10. Because all of the input data signals for each particular layer of the neural network are provided to each of the neurons in that layer, the values of each of the input data signals need only be stored once in the first data structure portion 110.’ and ‘Further as shown in FIG. 2, the second data structure portion 120 of the new data structure 100 includes array locations 0-12 corresponding to each of the weight values used by each of the neurons 30, 40 and 60 of each of the layers 20, 50 of the neural network 10. Specifically, the second data structure portion 120 includes array locations 0-4 for storing the weight values 31-35 for neuron 30, array locations 5-9 for storing the weight values 41-45 for neuron 40, and array locations 10-12 for storing the weight values 61-63 for neuron 60. The second data structure portion 120 stores the weight values 31-35, 41-45 and 61-63 sequentially in successive array ( memory) locations. 
That is, the weight values for each given neuron in each layer are stored sequentially, the sets of array locations storing the sets of weight values for each of the neurons of each respective layer are ordered successively, and further, each of the groups of sets of array locations storing the weight values of neurons in different layers are ordered successively in order of the layers of the neural network 10.’ of Anderson.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde and Anderson before him before the effective filing date of the claimed invention, to modify Chung and Hyde to incorporate storing different stages of the neural network in different types of memory, as taught by Anderson. Given the advantage of reducing the additional energy cost of memory for data that is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 12
Chung discloses wherein the neural network is a deep neural network that supports a deep learning model. (Chung, fig 50)

Claim 16
Chung discloses a system comprising: one or more processors; one or more computer readable memories operably coupled to the one or more processors; one or more computer readable storage mediums operably coupled to the one or more computer readable memories; and program instructions stored on at least one of the one or more computer readable storage mediums for execution by at least one of the one or more processors via at least one of the one or more computer readable memories, the program instructions comprising (Chung, 0060; For instance, a software-driven host component may correspond to a server computer that executes machine-readable instructions using one or more central processing units (CPUs).); program instructions to define layers in a neural network based on neuron locations relative to an initial input and a final output of the neural network, wherein neurons in the neural network are hardware processors (Chung, figs 49-50, 0060; Each hardware acceleration component, on the other hand, may correspond to hardware logic for implementing functions, such as a field-programmable gate array (FPGA) device, a massively parallel processor array (MPPA) device, a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a multiprocessor System-on-Chip (MPSoC), and so on.), wherein a first layer in the neural network is closer to the initial input than a second layer in the neural network, and wherein the second layer is closer to the final output than the first layer (Chung, figs 37, 0246; A feature gathering component 3708 collects the feature values from the feature state machines and makes them available to downstream acceleration components. 
Feature gathering component 3708 may be implemented with one or computer processors with memory store instructions, or dedicated logic gate arrays implemented, for example, in an FPGA, ASIC, or other similar device.); program instructions to define parameter criticalities for parameter weights based on the neuron locations relative to the initial input and the final wherein the parameter weights weigh an importance of connections between neurons in the neural network, and wherein the parameter criticalities define levels of impact that the parameter weights have on a final output of the neural network. (Chung, 0306, figure 50; Per the specification, [0049] In block 705, one or more processors define parameter criticalities for parameter weights stored in a memory used by the neural network. These parameter criticalities define impacts of parameter weights on the original outgoing final outputs. For example, assume that the parameter shown in FIG. 2 as stored weight w.sub.1 has little impact on the output shown FIG. 2 (assuming that y is one of the final outputs from the DNN 500 shown in FIG. 5). Assume further that the parameter shown in FIG. 2 as stored weight w.sub.2 has a relatively greater impact on the output y. As such, the criticality of parameter w.sub.2 is greater than the parameter criticality of w.sub.1. [0050] Referring again to FIG. 7, in block 707, one or more processors associate the defined layers of the neural network with different memory banks based on the parameter criticalities for the parameter weights. For example, different layers in a DNN are assigned to different memory banks according to their respective significance. By way of further example (with reference now to FIGS. 5-6), a first defined layer (e.g., layer 501 in FIG. 5) is assigned to a first memory bank (e.g., bank 601 in FIG. 6) that is more error-prone than a second memory bank (e.g., bank 604) that is assigned to the second defined layer (e.g., layer 504). As shown in FIG. 
6, the first memory bank uses less power than the second memory bank. As such, bank 601 uses less power than bank 604, thus saving power used by DNN 500.) (EC: This establishes different memory error parameters for different layers of a neural network.) ‘Without wanting to be bound by any particular theory, it is believed that partitioning a DNN into higher memory bandwidth layers (e.g., linear layers) and lower memory bandwidth layers (e.g., convolutional layers), and allocating higher memory bandwidth layers to a host component and lower memory bandwidth layers to an acceleration component using the CNN acceleration techniques described above, may more efficiently implement the DNN compared to implementing the DNN fully on a host component, or fully on an acceleration component.’ Chung assigns the layers a memory priority (i.e., high or low bandwidth) based on the location of the layers. Linear layers (closest to the output) are assigned to high bandwidth memory and convolutional layers (farthest from the output) are assigned to low bandwidth memory. Note that the weights of these layers will also be stored in the corresponding assigned memory.)
Chung does not disclose expressly wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer.
Hyde discloses wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer. (Hyde, 0068; Wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer of applicant maps to ‘Similarly, other aspects of computation additionally or alternatively to random number generation to achieve advantage such as association of computationally-intensive security operations with higher performing memory while allocating slower or more error prone to less sensitive or less important data.’ of Hyde.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung and Hyde before him before the effective filing date of the claimed invention, to modify Chung to incorporate the use of better memory as the computation of the neural network algorithm progresses, as taught by Hyde. Given the advantage of avoiding re-computation at the later stages of the algorithm due to errors, one having ordinary skill in the art would have been motivated to make this obvious modification.
Chung and Hyde do not disclose expressly program instructions to associate defined layers in the neural network with different portions of memory based on defined parameter criticalities,….program instructions to store parameter weights used by neurons in the first layer in the first portion of memory; and program instructions to store parameter weights used by neurons in the second layer in the second portion of memory.
Anderson discloses program instructions to associate defined layers in the neural network with different portions of memory based on defined parameter criticalities,….program instructions to store parameter weights used by neurons in the first layer in the first portion of memory; and program instructions to store parameter weights used by neurons in the second layer in the second portion of memory. (Anderson, 0022, 0024, fig 2; Program instructions to associate defined layers in the neural network with different portions of memory based on defined parameter criticalities ….. program instructions to store parameter weights used by neurons in the first layer in the first portion of memory; and program instructions to store parameter weights used by neurons in the second layer in the second portion of memory of applicant maps to ‘The first data structure portion 110 stores a single respective set of values of the input data signals for each layer of the neural network 10. Because all of the input data signals for each particular layer of the neural network are provided to each of the neurons in that layer, the values of each of the input data signals need only be stored once in the first data structure portion 110.’ and ‘Further as shown in FIG. 2, the second data structure portion 120 of the new data structure 100 includes array locations 0-12 corresponding to each of the weight values used by each of the neurons 30, 40 and 60 of each of the layers 20, 50 of the neural network 10. Specifically, the second data structure portion 120 includes array locations 0-4 for storing the weight values 31-35 for neuron 30, array locations 5-9 for storing the weight values 41-45 for neuron 40, and array locations 10-12 for storing the weight values 61-63 for neuron 60. The second data structure portion 120 stores the weight values 31-35, 41-45 and 61-63 sequentially in successive array ( memory) locations. 
That is, the weight values for each given neuron in each layer are stored sequentially, the sets of array locations storing the sets of weight values for each of the neurons of each respective layer are ordered successively, and further, each of the groups of sets of array locations storing the weight values of neurons in different layers are ordered successively in order of the layers of the neural network 10.’ of Anderson.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde and Anderson before him before the effective filing date of the claimed invention, to modify Chung and Hyde to incorporate by storing different stages of the neural network into different types of memory of Anderson. Given the advantage of reducing the additional cost of additional energy on memory when it is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.
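For illustration only, the sequential weight layout quoted above from Anderson (array locations 0-4 for neuron 30, 5-9 for neuron 40 and 10-12 for neuron 60) can be sketched as one flat array with per-neuron location ranges. The function name `build_weight_layout`, the placeholder weight values, and the grouping of neurons 30 and 40 into layer 20 and neuron 60 into layer 50 are assumptions made for this sketch; they are not taken from the code of Anderson or of the application.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: lays out per-neuron weights sequentially, layer by layer.
def build_weight_layout(
    layers: List[List[Tuple[int, List[float]]]]
) -> Tuple[List[float], Dict[int, range]]:
    flat: List[float] = []            # successive array (memory) locations
    locations: Dict[int, range] = {}  # neuron id -> array locations it occupies
    for layer in layers:              # layers in network order
        for neuron_id, weights in layer:
            start = len(flat)
            flat.extend(weights)      # weights of one neuron are contiguous
            locations[neuron_id] = range(start, len(flat))
    return flat, locations

# Placeholder weights (assumed values); only the counts follow the quoted passage.
layers = [
    [(30, [0.0] * 5), (40, [0.0] * 5)],  # assumed to be layer 20
    [(60, [0.0] * 3)],                   # assumed to be layer 50
]
flat_weights, locations = build_weight_layout(layers)
```

Under these assumptions, `locations[30]`, `locations[40]` and `locations[60]` cover array locations 0-4, 5-9 and 10-12 respectively, matching the ordering described in the quoted passage.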

Claim(s) 2-3, 6, 10-11 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde and Anderson as applied to claim 1, 5, 7, 9, 12 and 16 above, and further in view of Mehrotra. (‘Elements of artificial neural networks’, referred to as Mehrotra)

Claim 2
Chung, Hyde and Anderson do not disclose expressly identifying, by one or more processors, error-prone processing elements and interconnects used in the first layer of the neural network; and further defining, by one or more processors, the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network.
Mehrotra discloses identifying, by one or more processors, error-prone processing elements and interconnects used in the first layer of the neural network (Mehrotra, fig 1.15, p20; Identifying, by one or more processors, error-prone processing elements and interconnects used in the first layer of the neural network of applicant maps to the two nodes in layer 1 of Mehrotra. EC: Per the specification, a processing element is disclosed in figure 2, item 202.); and further defining, by one or more processors, the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network. (Mehrotra, fig 1.16(b) p21, p18; Further defining, by one or more processors, the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network of applicant maps to the following: as the algorithm progresses from left to right within the figure, ‘refinement’ is obtained. The processing element interconnections are defined by ‘Connections, with arbitrary weights, may exist from any node in layer i to any node in layer j for j > i; intra-layer connections may exist.’ of Mehrotra.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 3
Chung, Hyde and Anderson do not disclose expressly identifying, by one or more processors, error-free processing elements and interconnects used in the second layer of the neural network; and further defining, by one or more processors, the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network.
Mehrotra discloses identifying, by one or more processors, error-free processing elements and interconnects used in the second layer of the neural network (Mehrotra, fig 1.15, p20; Identifying, by one or more processors, error-free processing elements and interconnects used in the second layer of the neural network of applicant maps to the two nodes in layer 2 of Mehrotra. EC: Per the specification, a processing element is disclosed in figure 2, item 202.); and further defining, by one or more processors, the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network. (Mehrotra, fig 1.16(b) p21, p18; Further defining, by one or more processors, the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network of applicant maps to the following: as the algorithm progresses from left to right within the figure, ‘refinement’ is obtained. In this case, the ‘second layer’ would be further ‘right’ than the ‘first layer.’ The processing element interconnections are defined by ‘Connections, with arbitrary weights, may exist from any node in layer i to any node in layer j for j > i; intra-layer connections may exist.’ of Mehrotra.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 6
Chung, Hyde and Anderson do not disclose expressly wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs.
Mehrotra discloses wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs. (Mehrotra, p22; Wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs of applicant maps to ‘In artificial neural networks, learning refers to the method of modifying the weights of connections between the nodes of a specified network.’ of Mehrotra.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
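As a purely illustrative reading of the limitation at issue, an amount of change for each parameter weight across training epochs could be tallied as the cumulative absolute change of each weight between successive epochs. This is a minimal sketch under that assumption: the function name `weight_change_per_parameter` and the sample weight snapshots are hypothetical, and neither Mehrotra nor the application is asserted to compute the quantity this way.

```python
from typing import List

# Hypothetical helper: sums |w_t - w_(t-1)| for each weight over all epochs.
def weight_change_per_parameter(snapshots: List[List[float]]) -> List[float]:
    totals = [0.0] * len(snapshots[0])
    # Pair each epoch's weights with the next epoch's weights.
    for prev, curr in zip(snapshots, snapshots[1:]):
        for i, (before, after) in enumerate(zip(prev, curr)):
            totals[i] += abs(after - before)
    return totals

# Assumed example data: three weights recorded after each of three epochs.
snapshots = [
    [0.5, 0.25, 1.0],
    [0.75, 0.25, 0.5],
    [1.0, 0.25, 0.25],
]
changes = weight_change_per_parameter(snapshots)
```

In this assumed data the second weight never changes, so its tally is zero, while the third weight accumulates the largest change.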

Claim 10
Chung, Hyde and Anderson do not disclose expressly identify error-prone processing elements and interconnects used in the first layer of the neural network; and further define the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network.
Mehrotra discloses identify error-prone processing elements and interconnects used in the first layer of the neural network (Mehrotra, fig 1.15, p20; Identify error-prone processing elements and interconnects used in the first layer of the neural network of applicant maps to the two nodes in layer 1 of Mehrotra. EC: Per the specification, a processing element is disclosed in figure 2, item 202.); and further define the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network. (Mehrotra, fig 1.16(b) p21, p18; Further define the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network of applicant maps to the following: as the algorithm progresses from left to right within the figure, ‘refinement’ is obtained. The processing element interconnections are defined by ‘Connections, with arbitrary weights, may exist from any node in layer i to any node in layer j for j > i; intra-layer connections may exist.’ of Mehrotra.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 11
Chung, Hyde and Anderson do not disclose expressly identify error-free processing elements and interconnects used in the second layer of the neural network; and further define the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network.
Mehrotra discloses identify error-free processing elements and interconnects used in the second layer of the neural network (Mehrotra, fig 1.15, p20; Identify error-free processing elements and interconnects used in the second layer of the neural network of applicant maps to the two nodes in layer 2 of Mehrotra. EC: Per the specification, a processing element is disclosed in figure 2, item 202.); and further define the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network. (Mehrotra, fig 1.16(b) p21, p18; Further define the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network of applicant maps to the following: as the algorithm progresses from left to right within the figure, ‘refinement’ is obtained. In this case, the ‘second layer’ would be further ‘right’ than the ‘first layer.’ The processing element interconnections are defined by ‘Connections, with arbitrary weights, may exist from any node in layer i to any node in layer j for j > i; intra-layer connections may exist.’ of Mehrotra.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 19
Chung, Hyde and Anderson do not disclose expressly wherein a parameter weight for a neuron in the first layer has less impact on a value of the final output than a neuron in the second layer.
Mehrotra discloses wherein a parameter weight for a neuron in the first layer has less impact on a value of the final output than a neuron in the second layer. (Mehrotra, p38; The capabilities of a network are limited by its size. Despite this, the use of large networks increases training time and reduces generalizability. Size of a network can be measured in terms of the number of nodes, connections, and layers in a network. Complexity of node functions, possibly estimated as the number of bits needed to represent the functions, also contributes to network complexity measures. EC: Mehrotra is saying that in deeper networks (in terms of nodes) there is less generalizability, or increased precision; in terms of the claim, increased impact.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the following teachings of Mehrotra: physical computer hardware, a multiple-layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, and training of the neural network. Given the advantages of employing the invention in a real-world setting, having different layers of abstract classification, passing information from a broader concept to a narrower concept, adjusting the answer space of the classification, determining the most critical region of change of the classification answer space, and improving the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde and Anderson as applied to claim 1, 5, 7, 9, 12 and 16 above, and further in view of Duch. (‘Energy vs. Reliability Trade-offs Exploration in Biomedical Ultra-Low Power Devices’, referred to as Duch)

Claim 4
Chung, Hyde and Anderson do not disclose expressly wherein the first portion of memory uses less power than the second portion of memory.
Duch discloses wherein the first portion of memory uses less power than the second portion of memory. (Duch, p840, fig 3; Wherein the first portion of memory uses less power than the second portion of memory of applicant maps to ‘Error prone memory running below nominal supply voltage, Error free memory running at high supply voltage’ of Duch.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Duch before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the lower-energy memory of Duch for less significant data. Given the advantage of reducing the energy cost of memory that holds less important data, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde and Anderson as applied to claim 1, 5, 7, 9, 12 and 16 above, and further in view of Horesh. (U. S. Patent Publication 20150161987, referred to as Horesh)

Claim 20
Chung, Hyde and Anderson do not disclose expressly wherein the program instructions are provided as a service in a cloud environment.
Horesh discloses wherein the program instructions are provided as a service in a cloud environment. (Horesh, 0120; Wherein the program instructions are provided as a service in a cloud environment of applicant maps to ‘Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 1012 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor -based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.’ of Horesh.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Horesh before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate being able to implement the huge memory requirements of a deep neural network of Horesh. Given the advantage of reducing the additional cost of additional energy on memory when it is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde and Anderson as applied to claim 1, 5, 7, 9, 12 and 16 above, and further in view of Alkon. (U. S. Patent 5588091, referred to as Alkon)

Claim 8
Chung, Hyde and Anderson do not disclose expressly wherein the parameter weights are established by: modifying, by one or more processors, the initial input by flipping bits in the initial input; generating, by the neural network, a new final output based on the modified initial input; and adjusting, by one or more processors, the parameter weights until the new final output matches the final output.
Alkon discloses wherein the parameter weights are established by: modifying, by one or more processors, the initial input by flipping bits in the initial input; generating, by the neural network, a new final output based on the modified initial input (Alkon, c24:1-14; A testing set was generated by adding multiplicative noise, that is randomly changing bits in the input patterns. Flipping 30 percent of the bits produces patterns with a signal-to-noise ratio of 1.75:1 (note that flipping 50 percent of the bits results in a completely random pattern).); and adjusting, by one or more processors, the parameter weights until the new final output matches the final output. (Alkon, c25:5-19; The network was trained by presenting each pattern in the training set once (increasing the number of pattern presentations did not change performance).) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Alkon before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the ‘flipping bits of information’ in input data of Alkon. Flipping bits of the input equates to introducing noise into the input, and training data that contains noise results in a robust neural network. Given this advantage, one having ordinary skill in the art would have been motivated to make this obvious modification.
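For illustration only, the bit flipping described in the cited passage of Alkon (randomly changing a percentage of the bits in an input pattern) can be sketched as follows. The function name `flip_bits`, the fixed random seed and the 20-bit example pattern are assumptions made for this sketch and are not taken from Alkon or from the application.

```python
import random
from typing import List

# Hypothetical helper: flips a given fraction of the bits in a binary pattern.
def flip_bits(pattern: List[int], fraction: float, seed: int = 0) -> List[int]:
    rng = random.Random(seed)                 # seeded for repeatability
    n_flip = round(fraction * len(pattern))   # number of bits to change
    positions = rng.sample(range(len(pattern)), n_flip)  # distinct positions
    noisy = list(pattern)                     # leave the input untouched
    for pos in positions:
        noisy[pos] ^= 1                       # 0 -> 1, 1 -> 0
    return noisy

# Assumed example: a 20-bit pattern with 30 percent of its bits flipped.
clean = [0, 1] * 10
noisy = flip_bits(clean, 0.30)
n_changed = sum(a != b for a, b in zip(clean, noisy))
```

With a 20-bit pattern and a fraction of 0.30, exactly six positions differ between `clean` and `noisy`; which six depends on the seed.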

Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde and Anderson as applied to claim 1, 5, 7, 9, 12 and 16 above, and further in view of Tsai. (U. S. Patent Publication 20110066580, referred to as Tsai)


Claim 18
Chung and Hyde do not disclose expressly.... in response to determining that the second quantity of neurons is greater than the first quantity of neurons, storing, by one or more processors, parameter weights for neurons in the first layer in the first portion of memory and storing, by one or more processors, parameter weights for neurons in the second layer in the second portion of memory.
Anderson discloses.... in response to determining that the second quantity of neurons is greater than the first quantity of neurons, storing, by one or more processors, parameter weights for neurons in the first layer in the first portion of memory and storing, by one or more processors, parameter weights for neurons in the second layer in the second portion of memory. (Anderson, 0007; By storing the input signals for each given layer of neurons only once, the first data structure portion reduces the amount of memory required for implementing the neural network. Additionally, by sequentially storing the input data signals for each given layer, sequentially storing the sets of input data signals in accordance with the layers of the neural network, and sequentially storing the weight values corresponding to the different input data signals being provided to each of the different neurons, both the storage and retrieval of information to and from memory can be performed at a rapid pace during processing of the neural network.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde and Anderson before him before the effective filing date of the claimed invention, to modify Chung and Hyde to incorporate by storing different stages of the neural network into different types of memory of Anderson. Given the advantage of reducing the additional cost of additional energy on memory when it is not as important, one having ordinary skill in the art would have been motivated to make this obvious modification.
Chung, Hyde and Anderson do not disclose expressly identifying, by one or more processors, a first quantity of neurons in the first layer in a neural network; identifying, by one or more processors, a second quantity of neurons in the second layer in the neural network; determining, by one or more processors, that the second quantity of neurons is greater than the first quantity of neurons.
Tsai discloses identifying, by one or more processors, a first quantity of neurons in the first layer in a neural network; identifying, by one or more processors, a second quantity of neurons in the second layer in the neural network; determining, by one or more processors, that the second quantity of neurons is greater than the first quantity of neurons. (Tsai, 0054; Specifically, the SOM method adapted in this step is the same as that used in the first layer neuron training step S3; both terminate the training procedures based on the same condition--namely, the averaged distortion rate. Therefore, the proposed method is more suitable for training the groups 2 with different number of the second-level neurons 16. This avoids wasting too much time on training the groups 2 with less number of the second-level neurons 16. EC: Here Tsai is disclosing the number of neurons in each layer and a specific action taken if the numbers of neurons per layer differ in a specific way.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Tsai before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the ability of Tsai to keep track of the number of neurons per layer of a neural network. Given the advantage of determining whether additional memory is required and, if it is not, of better utilizing memory, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde, Mehrotra and Anderson as applied to claims 2-3, 6-8, 10-11 and 13-14 above, and further in view of Matzlavi. (U. S. Patent Publication 20150134424, referred to as Matzlavi)

Claim 17
Chung, Hyde, Mehrotra and Anderson do not disclose expressly wherein an increase in a level of change to a particular parameter weight during training results in a proportional increase in a parameter criticality for the particular parameter weight.
Matzlavi discloses wherein an increase in a level of change to a particular parameter weight during training results in a proportional increase in a parameter criticality for the particular parameter weight. (Matzlavi, 0031-0032, 0035; ‘Examples of organizational parameters include criticality, regulations, and organizational approach, which are described as follows:’ and ‘The level of criticality may be a significant parameter in a hybridization decision.’ With ‘FIG. 4 shows an example table of historical services and associated quantitative and organizational parameters and goals. The table of historical services, quantitative and organizational parameters, and goals may be used to train a decision model.’ EC: Matzlavi connects parameters with weights and training models is associated with modifying weights.) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Mehrotra, Anderson and Matzlavi before him before the effective filing date of the claimed invention, to modify Chung, Hyde, Mehrotra and Anderson to incorporate associating a

Claim(s) 13-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chung, Hyde, Mehrotra and Anderson as applied to claims 2-3, 6, 10-11 and 19 above, and further in view of OShea. (An Introduction to Convolutional Neural Networks, referred to as OShea)

Claim 13
Chung, Hyde, Anderson and Mehrotra do not disclose expressly wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the input layer has more neurons than the hidden layer, wherein the hidden layer has more neurons that the output layer, and wherein the computer-implemented method further comprises: determining, by one or more processors, that a quantity of parameter weights used by the input layer exceeds a quantity of parameter weights used by the hidden layer; and in response to determining that the quantity of parameter weights used by the input layer exceeds the quantity of parameter weights used by the hidden layer, determining, by one or more processors, that each of the parameters weights used by the neurons in the input layer have less of an impact on the output layer than each of the parameter weights used by the neurons in the hidden layer such that each of the parameter weights used by the neurons in the input layer and each of the parameter weights used by the neurons in the hidden layer differently weight an impact on the final output of the neural network based on the hidden layer having fewer neurons and parameter weights than the input layer.
OShea discloses wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the input layer has more neurons than the hidden layer, wherein the hidden layer has more neurons that the output layer (OShea, fig 1), and wherein the computer-implemented method further comprises: determining, by one or more processors, that a quantity of parameter weights used by the input layer exceeds a quantity of parameter weights used by the hidden layer; and in response to determining that the quantity of parameter weights used by the input layer exceeds the quantity of parameter weights used by the hidden layer (OShea, fig 1 EC: Each connection between nodes has an associated weight. In this example, there are 8 connections (and weights) between input and hidden layers and 2 connections (and weights) between hidden and output layers), determining, by one or more processors, that each of the parameters weights used by the neurons in the input layer have less of an impact on the output layer than each of the parameter weights used by the neurons in the hidden layer such that each of the parameter weights used by the neurons in the input layer and each of the parameter weights used by the neurons in the hidden layer differently weight an impact on the final output of the neural network based on the hidden layer having fewer neurons and parameter weights than the input layer. (OShea, fig 1: EC: Since there are 8 weights vs 2 weights, the loss of one connection would leave 7 weights (a 12.5% loss) and 1 weight (a 50% loss), respectively. Thus the weights ‘differently weight an impact.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and OShea before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the basic elements of convolutional neural networks of OShea. Given the advantage of the invention, that flipping bits of the input equates to introducing noise into the input and that training data that has noise results in a robust neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
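For illustration only, the comparison of 8 weights against 2 weights in the OShea fig 1 discussion can be checked with a short sketch. It assumes that every weight in a layer is counted equally, so that losing one of n weights removes a fraction 1/n of them. The function name fraction_lost is hypothetical and is not taken from OShea or from the claims.

```python
def fraction_lost(total_weights: int, removed: int = 1) -> float:
    """Fraction of a layer's weights removed when `removed` connections are lost."""
    if total_weights <= 0:
        raise ValueError("total_weights must be positive")
    return removed / total_weights


# Counts taken from the EC comment on OShea fig 1.
input_to_hidden = fraction_lost(8)   # 8 weights between input and hidden layers
hidden_to_output = fraction_lost(2)  # 2 weights between hidden and output layers

print(f"{input_to_hidden:.1%} vs {hidden_to_output:.1%}")  # 12.5% vs 50.0%
```

Under that assumption, a single lost connection removes a smaller share of the weights in the larger layer than in the smaller one.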

Claim 14
Chung, Hyde and Anderson do not expressly disclose wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the input layer has more neurons than the hidden layer, wherein the hidden layer has more neurons that the output layer and wherein the computer-implemented method further comprises: determining, by one or more processors, that there is an overlapping functionality between neurons in the input layer of the neural network based on the input layer having more neurons than the hidden layer and the output layer.
Mehrotra discloses wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the input layer has more neurons than the hidden layer, wherein the hidden layer has more neurons that the output layer (Mehrotra, p106, fig 3.14) and wherein the computer-implemented method further comprises: determining, by one or more processors, that there is an overlapping functionality between neurons in the input layer of the neural network based on the input layer having more neurons than the hidden layer and the output layer. (Mehrotra, p106, fig 3.14, p11, fig 1.5) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and Mehrotra before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate physical computer hardware, multiple layer neural network topology design, neurons which connect nodes between layers, weights associated with the connecting nodes, each layer having an associated critical level with a secondary associated rate of change of critical level, training the neural network of Mehrotra. Given the advantage of the invention to be employed in a real world employing of the invention, having different layers of abstract classification, having a passage of information between a broader concept to a narrow concept, to adjust the answer space of the classification, determining the most critical region of change of the classification answer space, being able to improve the accuracy of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Chung, Hyde, Anderson and Mehrotra do not disclose expressly in response to determining that there is the overlapping functionality between neurons in the input layer of the neural network based on to the input layer having more neurons than the hidden layer and the output layer, determining, by one or more processors, that the parameter weights used by neurons in the input layer have less of an impact on the final output of the neural network than parameter weights used by neurons in the hidden layer.
OShea discloses in response to determining that there is the overlapping functionality between neurons in the input layer of the neural network based on to the input layer having more neurons than the hidden layer and the output layer (OShea, p6; When the data hits a convolutional layer, the layer convolves each filter across the spatial dimensionality of the input to produce a 2D activation map. These activation maps can be visualized, as seen in Figure 3.), determining, by one or more processors, that the parameter weights used by neurons in the input layer have less of an impact on the final output of the neural network than parameter weights used by neurons in the hidden layer. (OShea, fig 1: EC: Since there are 8 weights vs 2 weights, the loss of one connection would leave 7 weights (a 12.5% loss) and 1 weight (a 50% loss), respectively. Thus the weights ‘differently weight an impact.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and OShea before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the basic elements of convolutional neural networks of OShea. Given the advantage of the invention, that flipping bits of the input equates to introducing noise into the input and that training data that has noise results in a robust neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 15
Chung, Hyde, Anderson and Mehrotra do not disclose expressly wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the neural network is trained to identify a handwritten figure that is made up of a first set of multiple sections of the handwritten figure and a second set of multiple sections of the handwritten figure, and wherein the computer-implemented method further comprises: inputting the first set of multiple sections of the handwritten figure into a first set of neurons in the input layer; inputting the second set of multiple sections of the handwritten figure into a second set of neurons in the input layer; identifying, by the neural network, the handwritten figure using only the first set of multiple sections of the handwritten figure; in response to the neural network identifying the handwritten figure using only the first set of multiple sections of the handwritten figure, determining, by one or more processors, that parameter weights associated with the second set of neurons in the input layer are inconsequential to the final output of the neural network.
OShea discloses wherein the neural network comprises an input layer, a hidden layer, and an output layer (OShea, fig 1), wherein the neural network is trained to identify a handwritten figure (OShea, fig 2: EC It is either a ‘0’ or a ‘9.’) that is made up of a first set of multiple sections of the handwritten figure and a second set of multiple sections of the handwritten figure (OShea, fig 2; The input is a 6x6 matrix. First set of multiple sections is taking the 6x6 matrix and using a 3x3 matrix sliding horizontally and vertically to produce 9 3x3 matrices.), and wherein the computer-implemented method further comprises: inputting the first set of multiple sections of the handwritten figure into a first set of neurons in the input layer (OShea, fig 2: EC It is either a ‘0’ or a ‘9.’); inputting the second set of multiple sections of the handwritten figure into a second set of neurons in the input layer (OShea, fig 2; ‘fully-connected’); identifying, by the neural network, the handwritten figure using only the first set of multiple sections of the handwritten figure (OShea, fig 2; Output of either a ‘0’ or a ‘9.’); in response to the neural network identifying the handwritten figure using only the first set of multiple sections of the handwritten figure, determining, by one or more processors, that parameter weights associated with the second set of neurons in the input layer are inconsequential to the final output of the neural network. (OShea, fig 4 EC: The pooled vector matrix is a 3x3 but only has a computed value in four locations. The remaining locations and associated neurons can be viewed as ‘inconsequential.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Chung, Hyde, Anderson and OShea before him before the effective filing date of the claimed invention, to modify Chung, Hyde and Anderson to incorporate the basic elements of convolutional neural networks of OShea. Given the advantage of the invention, that flipping bits of the input equates to introducing noise into the input and that training data that has noise results in a robust neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.

Response to Arguments
4.	Applicant’s arguments filed on 2/2/2022, 2/15/2022 and 3/9/2022 for claims 1-20 have been fully considered but are not persuasive.

5.	Applicant’s argument:
Rejections under 35 U.S.C. § 101

In paragraph 3 of the present Office Action, Claims 9-15 are rejected for potentially claiming non-statutory subject matter when claiming a “computer readable storage device”. Although Applicant believes that a “device” is a tangible non-transitory device (e.g., a floppy disk, a hard drive, etc. — see paragraph [0026] of the present specification), in an effort to promote the prosecution of the present docket, independent/base Claim 9 is now amended to claim “a non-transitory computer readable storage device”.

As such, Applicant now requests that the rejection of Claims 9-15 under 35 U.S.C. § 101 be removed.

Examiner’s answer:
The rejection is removed. 

6.	Applicant’s argument:
Claim rejections under 35 U.S.C. § 112

In paragraph 4 of the present Office Action, Claims 1-20 are rejected under 35 U.S.C. § 112(a) as failing to comply with the written description requirement.

More specifically, Claims 1, 9, and 16 are rejected under 35 U.S.C. § 112(a) for claiming “wherein the parameter weights weight an importance of connections between neurons in the neural network”. That is, the present Office Action states that the specification lacks a “written description that explains how weights are used as an indicator of importance of connections”, or even what is meant by “importance of connections”.

Applicant believes that these concepts/terms are supported by the specification. 

For example, consider FIG. 3 and paragraph [0036] of the present specification. 

As described in paragraph [0036] of the present specification, assume that there are 784 neurons in the input layer 303, 128 neurons in the hidden layer 305, and 10 neurons in the output layer 307 of a deep neural network (DNN) 301. Assume that each connection between two neurons includes a weight, which is a parameter for multiplying the value of an input (see paragraph [0028] of the present specification).

There are about 100,352 (784x128) connections between neurons in the input layer 303 and the hidden layer 305. Since each connection has a weight, then there are 100,352 weights for connections between neurons in the input layer 303 and the hidden layer 305.

However, there are only about 1,280 (128x10) connections between the middle layer neurons and the output layer neurons, and thus there are only 1,280 weights for the connections between these neurons. Thus, due to factors such as redundancy, numerousness, etc., each of the weights associated with neurons associated with the input layer 303 are individually less important than each of the weights associated with hidden layer 305, which are individually less important than each of the weights associated with the output layer 307. That is, neurons and their respective weights in different layers of the DNN 301 contribute differently to the outcome of the network. Specifically, those neurons in layers closer to the input have a lower impact on the model’s output as compared to neurons in the layers farther from the input.

Therefore, since each of numerous connections (between neurons in the input layer 303 and the middle layer 305) are less important than the fewer connections (between neurons in the middle layer 305 and the output layer 307), then the parameter weights (i.e., the weights of the connections between two neurons) “weight an importance of connections between neurons in the neural network in determining the final output”.
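For illustration only, the connection counts recited in the argument above can be reproduced with a short sketch. It assumes fully connected layers with one weight per connection and no bias terms, which is what the products 784x128 and 128x10 imply. The names layer_sizes and dense_connections are hypothetical and do not come from the specification.

```python
# Layer sizes as stated in the applicant's argument (paragraph [0036]).
layer_sizes = {"input": 784, "hidden": 128, "output": 10}


def dense_connections(n_from: int, n_to: int) -> int:
    """Number of weights between two fully connected layers (biases ignored)."""
    return n_from * n_to


input_hidden = dense_connections(layer_sizes["input"], layer_sizes["hidden"])
hidden_output = dense_connections(layer_sizes["hidden"], layer_sizes["output"])

print(input_hidden, hidden_output)   # 100352 1280
print(input_hidden / hidden_output)  # 78.4
```

The sketch only confirms the counts. It does not by itself establish that any individual weight is more or less important than another.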

This “importance” is further described in currently amended Claim 13 (supported by FIG. 3 and paragraph [0036] of the present specification) by claiming that the importance of the connections is determined by how many parameter weights are used in different sections of the neural network:

Examiner’s answer:
The examiner disagrees with the applicant’s assumption of ‘redundancy.’  The concept of ‘importance’ based on an assumption is groundless. There are no specific formulas or algorithms which generate a determination of what is important and what is not important. 

7.	Applicant’s argument:
In another embodiment of the present invention, and as claimed in currently amended dependent Claim 14 (supported by FIG. 5 and paragraph [0040] of the present specification), the importance of the connections is determined by overlapping functionality in neurons in the input layer of the neural network:

Examiner’s answer:
The examiner disagrees with the assumption of ‘overlapping.’ This can be viewed as a refining process in which all neurons have a part of the result. 

8.	Applicant’s argument:
Since the term “important” seems to be creating an impediment to the prosecution of the present docket, the present amendment amends Claim 1 to change “importance of connections” to “impact that connections between neurons in the neural network have in determining the final output”.

Examiner’s answer:
The rejection in regards to claim 1 of this topic has been withdrawn. 

9.	Applicant’s argument:
Currently amended dependent Claim 13 (supported by FIG. 3 and paragraph [0036]) describe the different impact that the number of neurons/weights in different layers of the neural network have on the output. More specifically, the input layer has more neurons/weights than the hidden layer of the neural network. Therefore, “each of the parameters weights used by the neurons in the input layer have less of an impact on the output layer than each of the parameter weights used by the neurons in the hidden layer such that each of the parameter weights used by the neurons in the input layer and each of the parameter weights used by the neurons in the hidden layer differently weight an impact on the final output of the neural network based on the hidden layer having fewer neurons and parameter weights than the input layer”.

Currently amended dependent Claim 14 (supported by FIG. 3 and paragraph [0040]) describes the numerous neurons in the input layer of the neural network leading to overlapping functionality between neurons in the input layer, thus leading to a determination that parameter weights used by neurons in the input layer have less of an impact on the output layer than parameter weights used by neurons in the hidden layer.

Again, it is the difference in impact that each single weight/neuron in the input layer has on the output of the neural network, relative to the impact of each single weight/neuron found in the hidden layer, that is being addressed, not the combined neurons/weights in the input layer or hidden layer.

Examiner’s answer:
Claims 13 and 14 are now rejected by OShea. 

10.	Applicant’s argument:
Nothing in Chung teaches or suggests storing parameter weights of neurons that are close to an initial input in error-prone memory, while storing parameter weights of neurons that are closer to the final output in memory that is less error-prone.

Nothing in Hyde teaches or suggests storing parameter weights of neurons that are close to an initial input in error-prone memory, while storing parameter weights of neurons that are closer to the final output in memory that is less error-prone.

Examiner’s answer:
‘Storing parameter weights’ is not mentioned within the claims. The examiner is unsure what claim the applicant is arguing. Hyde at 0068 addresses ‘…associating, by one or more processors, defined layers in the neural network with portions of memory based on defined parameter criticalities, wherein the first layer is assigned a first portion of memory that is more error-prone than a second portion of memory that is assigned to the second layer…’ There is no mention of storing parameter weights. 

11.	Applicant’s argument:
Nothing in Anderson teaches or suggests storing parameter weights of neurons that are close to an initial input in error-prone memory, while storing parameter weights of neurons that are closer to the final output in memory that is less error-prone.

Examiner’s answer:
‘Because all of the input data signals for each particular layer of the neural network are provided to each of the neurons in that layer, the values of each of the input data signals need only be stored once in the first data structure portion 110.’ and ‘Further as shown in FIG. 2, the second data structure portion 120 of the new data structure 100 includes array locations 0-12 corresponding to each of the weight values used by each of the neurons 30, 40 and 60 of each of the layers 20, 50 of the neural network 10. Specifically, the second data structure portion 120 includes array locations 0-4 for storing the weight values 31-35 for neuron 30, array locations 5-9 for storing the weight values 41-45 for neuron 40, and array locations 10-12 for storing the weight values 61-63 for neuron 60. The second data structure portion 120 stores the weight values 31-35, 41-45 and 61-63 sequentially in successive array (memory) locations. That is, the weight values for each given neuron in each layer are stored sequentially, the sets of array locations storing the sets of weight values for each of the neurons of each respective layer are ordered successively, and further, each of the groups of sets of array locations storing the weight values of neurons in different layers are ordered successively in order of the layers of the neural network 10.’
Because the first data structure is used only once, the reliability of the memory is not a factor. The second data structure holds a plurality of weights and reliability is required. 
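For illustration only, the sequential layout described in the quoted Anderson passage can be sketched as follows. The reference numerals 31-35, 41-45 and 61-63 are used here merely as stand-ins for weight values, and the names weights_by_neuron and flatten are hypothetical. The sketch is not Anderson's implementation.

```python
# Weight-value reference numerals per neuron, as quoted from Anderson (FIG. 2).
weights_by_neuron = {
    30: [31, 32, 33, 34, 35],
    40: [41, 42, 43, 44, 45],
    60: [61, 62, 63],
}


def flatten(weights):
    """Place each neuron's weights in successive array locations.

    Returns the flat array and, per neuron, its (first, last) array location.
    """
    flat, spans = [], {}
    for neuron, values in weights.items():
        start = len(flat)
        flat.extend(values)
        spans[neuron] = (start, len(flat) - 1)
    return flat, spans


flat, spans = flatten(weights_by_neuron)
print(spans)  # {30: (0, 4), 40: (5, 9), 60: (10, 12)}
```

The resulting spans match the array locations 0-4, 5-9 and 10-12 recited in the quoted passage.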

12.	Applicant’s argument:
That is, if a processing element and its interconnections in the first layer (closest to the input) of a neural network are error prone, then parameter weights used by that processing element are deemed to be less critical to how the neural network functions. This information is then used to further define the parameter criticalities for the parameters weights (“identifying...error-prone processing elements and interconnects used in the first layer of the neural network; and further defining...the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network”).

The present Office Action cites pages 18 and 20-21 of Mehrotra as teaching this feature.

Pages 18 and 20 of Mehrotra merely describe input nodes and output nodes in a neural network.

Page 21 of Mehrotra merely describes various types of interconnections among neurons in a neural network.

No combination of the cited prior art, and specifically the cited passages from Mehrotra, teach or suggest using a condition of a processing element and its interconnections in the first layer (closest to the input) being error prone to further define parameter criticalities for parameters weights (“identifying...error-prone processing elements and interconnects used in the first layer of the neural network; and further defining...the parameter criticalities for parameter weights stored in the first portion of memory based on use of the error-prone processing elements and interconnects used in the first layer of the neural network, wherein parameter weights that are less critical to a correct model behavior of the neural network are processed, stored and transmitted using error-prone computing resources and parameter weights in the neural network”).

The rejection of original dependent computer program product Claim 10 is respectfully traversed based on the arguments presented in the traversal of dependent computer-implemented method Claim 2.

Examiner’s answer:
Mehrotra is used to disclose processors, connections and associated weights. Different memory requirements are addressed by Anderson. 

13.	Applicant’s argument:
With regard to exemplary original dependent computer-implemented method Claim 3, a combination of the cited prior art does not teach or suggest further identifying where to store parameter criticalities based on the parameter criticalities being used with parameter weights used in neurons having error-free processing elements and interconnects (“identifying...error-free processing elements and interconnects used in the second layer of the neural network; and further defining...the parameter criticalities for parameter weights stored in the second portion of memory based on use of the error-free processing elements and interconnects used in the second layer of the neural network, wherein parameter weights that are more critical to a correct model behavior of the neural network are processed, stored and transmitted using error-free computing resources and parameter weights in the neural network”).

The present Office Action again cites pages 18 and 20-21 of Mehrotra. As stated above, pages 18 and 20 of Mehrotra describe input nodes and output nodes in a neural network, and page 21 of Mehrotra describes various types of interconnections among neurons in a neural network.

The rejection of original dependent computer program product Claim 11 is respectfully traversed based on the arguments presented in the traversal of dependent computer-implemented method Claim 3.

Examiner’s answer:
Mehrotra is used to disclose processors, connections and associated weights. Different memory requirements are addressed by Anderson. 

14.	Applicant’s argument:
With regard to exemplary original dependent computer-implemented method Claim 6, a combination of the cited art does not teach or suggest determining how critical a weight is based on how much it is changed during training (“wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs”).

Examiner’s answer:
Claim 6 pertains to a decision engine in regards to training. Applicant’s argument above is moot. 

15.	Applicant’s argument:
For example, and as described in paragraph [0060] of the present specification as originally filed, “assume that training data is run through DNN 500. Assume now that bits are intentionally flipped in certain portions of the training data, followed by a change to the parameter weights. That is, once the training data bits are flipped, the parameter weights are changed until a same output results. The greater the change to a parameter weight; the greater the parameter criticality of that parameter weight.” For example, “if a first parameter weight is doubled and a second parameter weight is tripled in order to arrive at the same output value from the DNN 500, then the second parameter weight is deemed to be more critical since it must be more greatly modified in order to achieve the same result.”

Examiner’s answer:
Changing (flipping) bit values is disclosed in claim 8, not claim 6. This is the introduction of noise into training data. The examiner does not agree with the statement, ‘The greater the change to a parameter weight; the greater the parameter criticality of that parameter weight.’

16.	Applicant’s argument:
The present Office Action cites page 22 of Mehrotra as teaching this feature. Page 22 of Mehrotra teaches that a neural network can be trained to output a certain value by adjusting weights used by the neurons in the neural network. This does not teach or suggest determining how critical a weight is based on how much it is changed during training (“wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs”).

Examiner’s answer:
As stated above, claim 6 pertains to a decision engine in regards to training. Applicant’s argument above is moot. 

17.	Applicant’s argument:
That is, in exemplary dependent Claim 6, how critical a parameter weight is depends on how much that parameter weight changed during training (“wherein the parameter criticalities are determined based on an amount of change for each parameter weight across training iterations epochs”). 

For example, and as described in paragraph [0060] of the present specification, “The greater the change to a parameter weight; the greater the parameter criticality of that parameter weight. That is, if a first parameter weight is doubled and a second parameter weight is tripled in order to arrive at the same output value from the DNN 500, then the second parameter weight is deemed to be more critical since it must be more greatly modified in order to achieve the same result.” The specific scenario is now found in dependent Claim 17 (“wherein an increase in a level of change to a particular parameter weight during training results in a proportional increase in a parameter criticality for the particular parameter weight”).
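For illustration only, one possible reading of the doubled-versus-tripled example can be sketched as follows. The measure used here, the relative change in a weight, is an assumption made for this sketch. Neither the quoted passage nor Claim 17 states a formula, and the name relative_change and the starting weight values are hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change of a weight; an assumed stand-in for 'level of change'."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero starting weight")
    return abs(after - before) / abs(before)


# Hypothetical starting values; only the doubling and tripling come from the text.
first = relative_change(1.0, 2.0)   # first parameter weight is doubled
second = relative_change(0.5, 1.5)  # second parameter weight is tripled

print(first, second)   # 1.0 2.0
print(second > first)  # True
```

Under this assumed measure the tripled weight shows the larger change, which is the ordering the quoted passage describes.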

Examiner’s answer:
The examiner disagrees. Since initial weights are randomly set, the rate of change has an inherent randomness associated with it. The examiner disagrees with this logic. 

18.	Applicant’s argument:
15. (currently amended) The computer-implemented method of claim 1, wherein the neural network comprises an input layer, a hidden layer, and an output layer, wherein the neural network is trained to identify a handwritten figure that is made up of a first set of multiple sections of the handwritten figure and a second set of multiple sections of the handwritten figure, and wherein the computer-implemented method further comprises:
inputting the first set of multiple sections of the handwritten figure into a first set of neurons in the input layer;
inputting the second set of multiple sections of the handwritten figure into a second set of neurons in the input layer;
identifying, by the neural network, the handwritten figure using only the first set of multiple sections of the handwritten figure;
in response to the neural network identifying the handwritten figure using only the first set of multiple sections of the handwritten figure, determining, by one or more processors, that parameter weights associated with the second set of neurons in the input layer are inconsequential to the final output of the neural network. 

That is, if only some part(s) of the handwritten figure is needed for the neural network to identify what it is, then other part(s) of the handwritten figure are inconsequential to the final output of the neural network. No combination of the cited prior art teaches or suggests this feature.

With regard to dependent Claim 17, a combination of the cited prior art does not teach or suggest “wherein an increase in a level of change to a particular parameter weight during training results in a proportional increase in a parameter criticality for the particular parameter weight.”

Examiner’s answer:
The applicant has now introduced a convolutional neural network. OShea addresses these claims. 

Conclusion – Final
19.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

20.	Claims 1-20 are rejected.

Correspondence Information
21.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Michael Huntley can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129