Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2021-06-30 has been entered. Claims 1-20 remain pending in the application. Applicant’s amendments to the Specification, Drawings, and Claims have overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Office Action mailed on 2021-04-05.
Response to Arguments
Applicant's arguments in response to rejections under 35 U.S.C. 103 have been fully considered but they are not persuasive.
Applicant argues that Kumar does not teach “propagating the received code through a neural network of at least one decoder” because Kumar discloses inputting a “feature map of a bit” to a neural network.  Examiner respectfully disagrees.  Examiner draws attention to Kumar Fig. 3:

[Figure: Kumar, Fig. 3 (image not reproduced)]

Examiner points out that the word “of” can carry a possessive meaning. Here, the Neural Network 314 clearly lies inside the LDPC Decoder 310 and is thus possessed by the Decoder; element 314 is therefore a “neural network of at least one decoder”. The above figure also shows that the code, LDPC Codeword 320, is “propagated” through a neural network of a decoder: because the Decoding Procedure 312 is “iterative”, all bits of the code are input to the neural network. This is true regardless of any intervening steps between the code and the neural network, and regardless of the state the code is in by the time it reaches the neural network. Examiner notes that “the” in “propagating the received encoded linear code through a neural network” establishes antecedent basis with the previous limitation, and that the word “encoded” here does not necessarily mean that a decoding procedure 312 lying in between the raw encoded linear code and the neural network 314, as in
	Applicant also argues that Kumar does not teach “outputting a recovered version of the encoded linear code according to a final output of the neural network”, because Kumar’s neural network outputs an indication of whether a bit should be flipped.  Examiner respectfully disagrees.  Examiner points out that Kumar teaches “outputting a recovered version of the encoded linear code” as shown in Fig. 3 “Decoded bits 330”.  Part of the process of determining the “Decoded bits 330” is a result of the bit-flipping indication of the neural network 314.  Therefore, the “outputting a recovered version of the encoded linear code” is “according to a final output of the neural network”, as all outputs of the neural network are necessary to output the decoded bits, including the final one.
	Applicant argues that Kumar “practically teaches away” from message passing algorithms by reciting that “message passing algorithms exist in the art” and that “in addition to the message passing algorithm, a bit flipping algorithm is used”. Examiner points out that neither acknowledging the prior existence of a technique nor actually using a technique can be interpreted as “teaching away” from that technique. Applicant appears to mean that Kumar teaches away from using a neural network to directly perform a message passing algorithm, and that this distinguishes Kumar from Claim 1. Examiner points out that the only relevant question is whether Kumar reads on the language as recited in the claims.
Applicant argues, regarding Karami, that “it is impossible to train the neural network decoder of Karami in advance using training samples”.  Examiner points out that Karami was not relied upon to teach the training of the neural network, and this limitation was taught by 
Applicant argues that Kumar and Karami are very different algorithms, and their combination would be impossible.  Examiner points out that the test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference; nor is it that the claimed invention must be expressly suggested in any one or all of the references.  Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Patent No. US 10,491,243 B2 to Kumar et al., (hereinafter, “Kumar”) in view of “Novel LDPC Decoder via MLP Neural Networks” to Karami et al., (hereinafter, “Karami”).
As per claim 1, Kumar teaches a computer implemented method of decoding a linear code transmitted over a transmission channel subject to noise, (Kumar, Abstract: “In an example, the error correction system implements low-density parity-check (LDPC) decoding that uses bit flipping.” Kumar Col. 2, line 57; “FIG. 9 is representative of a computer system capable of embodying the present disclosure”. Kumar Col. 5, lines 16-21; “In some embodiments… the data is transmitted and received over a wired and/or wireless channel. In this case, the errors in the received codeword may be introduced during transmission of the codeword.” Kumar Col. 5, line 25; “The received data may include some noise or errors.”)
comprising: using at least one processor for: receiving, over a transmission channel, an encoded linear code corresponding to a parity check matrix (Kumar Col. 13, line 1; “In an example, the system includes one or more processors …” Kumar Col. 5, lines 16-19; “In some embodiments… the data is transmitted and received over a wired and/or wireless channel.” Kumar Col. 4, lines 42-46; “…the embodiments similarly apply to other usage of LDPC codes including, for example, data transmission.  LDPC (low density parity codes) codes are linear block codes defined by a sparse parity-check matrix H, which consists of zeros and ones.”);
propagating the received encoded linear code through a neural network of at least one decoder (Kumar Col. 14, lines 4-9; “In an example, the neural network includes an output layer that has an output node. The output node generates the indication based on the propagation of information about the features of the input feature map from the input layer and through one or more hidden layers. The indication is stored in memory of the system and/or provided as input to the LDPC decoder.”
Kumar, Fig. 3, discloses:

[Figure: Kumar, Fig. 3 (image not reproduced)]

Here, Kumar discloses that the received encoded linear code 320 is decoded, represented as a feature map, and then propagated through a neural network 314 of a decoder 310.)
the neural network having an input layer, an output layer and a plurality of hidden layers comprising a plurality of nodes (Kumar Col. 10, lines 16-18; “One or more hidden layers of the neural network exist between the input layer and the output layer.” Kumar Col. 10, line 59; “…each of the hidden layers also includes a set of nodes that are referred to herein as hidden nodes.”)
Kumar fails to explicitly teach the plurality of nodes corresponding to transmitted messages of a message passing algorithm over a plurality of edges of a bipartite graph representation of the encoded linear code and a plurality of edges connecting the plurality of nodes.
However, Karami teaches the plurality of nodes corresponding to transmitted messages of a message passing algorithm over a plurality of edges of a bipartite graph representation of the encoded linear code and a plurality of edges connecting the plurality of nodes (Karami Pg. 18; “The neural decoder is based on the Tanner graph and can be considered as a type of message passing algorithm…” Examiner note: A Tanner graph is a type of bipartite graph. Karami Pg. 6, Fig. 2 shows the plurality of edges connecting the plurality of nodes.)
Kumar and Karami are analogous because they are both directed to neural network decoders. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Karami’s method of using a bipartite graph representation of the encoded code into Kumar’s method of LDPC decoding because one would be motivated to improve the efficiency of the decoder: “In this context, the need for comparing the calculated probabilities for each situation and using memory for these probabilities can be ignored. The proposed decoder operates with less complexity than of SPA.” (Karami, Pg. 18).
Kumar further teaches wherein each one of the plurality of edges having a source node and a destination node is assigned with a weight previously calculated during a training session of the neural network in which the neural network is trained using a plurality of training samples to reduce a loss function (Kumar Col. 9, lines 4-8; “An interconnection represents a piece of information learned about the two interconnected nodes. The interconnection has a numeric weight that can be tuned (e.g., based on a training dataset), rendering the neural network adaptive to inputs and capable of learning.” Examiner note: The interconnection is an edge. Kumar Col. 4, lines 27-30 also discloses a plurality of training samples: “Here, the training of the first neural network uses a first training set of feature maps, where the features relate to information used in the bit flipping (e.g., number of unsatisfied check nodes).” Kumar Col. 11, lines 21-30 also discloses training to reduce a loss function: “The neural network 400 also uses a loss function l (or, referred to also as a cost function c) to find an optimal solution. The optimal solution represents the situation where no solution has a loss less than the loss of the optimal solution. In an example, the loss function l includes a mean-squared error function that minimizes the average squared error between an output ƒ (x) and a target value y over all the example pairs (x, y). A backpropagation algorithm that uses gradient descent to minimize the loss function is used to train the neural network 400.”)
the propagation follows a propagation path through the neural network dictated by respective weights of the plurality of edges (Kumar Col. 14, lines 5-8; “The output node generates the indication based on the propagation of information about the features of the input feature map from the input layer and through one or more hidden layers.” Kumar Col. 9, lines 4-8; “The interconnection has a numeric weight”); and
outputting a recovered version of the encoded linear code according to a final output of the neural network (Kumar Col. 1, lines 58-62; “The error correction system accesses an output of the neural network. The output indicates that the bit should be flipped based on the feature map. The error correction system flips the bit in the decoding iteration based on the output of the neural network.”
Kumar, Fig. 3, discloses:

[Figure: Kumar, Fig. 3 (image not reproduced)]

Here, Kumar discloses “outputting a recovered version of the encoded linear code” as shown in Fig. 3 “Decoded bits 330”.  Part of the process of determining the “Decoded bits 330” is a result of the bit-flipping indication of the neural network 314.  Therefore, the “outputting a recovered version of the encoded linear code” is “according to a final output of the neural network”, as all outputs of the neural network are necessary to output the decoded bits, including the final one).
As per claim 2, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. Kumar further teaches wherein the bipartite graph is a member of a group consisting of: a Tanner graph and a factor graph (Kumar Col. 6, line 29; “Various types of bipartite graphs are possible including, for example, a Tanner graph.”).
As per claim 3, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. Kumar further teaches wherein the parity check matrix is a member of a group consisting of: algebraic linear code, polar code, Low Density Parity Check (LDPC) code and High Density Parity Check (HDPC) code (Kumar Col. 4, line 45; “LDPC (low density parity codes) codes are linear block codes defined by a sparse parity-check matrix H, which consists of zeros and ones.”).
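For illustration only, the relationship between a sparse parity-check matrix H of zeros and ones and its Tanner (bipartite) graph, as referenced in the citations for claims 2 and 3, can be sketched as follows. The matrix values and the names `tanner_edges` and `syndrome` are hypothetical and are not taken from Kumar, Karami, or the instant application.

```python
def tanner_edges(H):
    """Return one (check_node, variable_node) edge per nonzero entry of H.

    Check nodes are the rows of H and variable nodes are the columns,
    so every edge joins the two sides of a bipartite graph.
    """
    return [(r, c) for r, row in enumerate(H) for c, bit in enumerate(row) if bit]


def syndrome(H, word):
    """Compute H * word (mod 2); an all-zero result means every check is satisfied."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


# Hypothetical 3 x 6 parity-check matrix (assumption, for illustration only).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

edges = tanner_edges(H)
codeword = [1, 1, 0, 0, 1, 1]   # satisfies all three checks of H
received = [0, 1, 0, 0, 1, 1]   # same word with the first bit flipped
```

In this sketch, a single flipped bit leaves exactly those check nodes unsatisfied that share an edge with the flipped variable node.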
As per claim 13, Kumar teaches a system for decoding a linear code transmitted over a transmission channel subject to noise, (Kumar, Abstract: “In an example, the error correction system implements low-density parity-check (LDPC) decoding that uses bit flipping.” Kumar Col. 2, line 57; “FIG. 9 is representative of a computer system capable of embodying the present disclosure”. Kumar Col. 5, lines 16-21; “In some embodiments… the data is transmitted and received over a wired and/or wireless channel. In this case, the errors in the received codeword may be introduced during transmission of the codeword.” Kumar Col. 5, line 25; “The received data may include some noise or errors.”)
comprising: at least one processor adapted to execute code, the code comprising: code instructions to receive, over a transmission channel, an encoded linear code corresponding to a parity check matrix (Kumar Col. 18, lines 59-64; “The methods and processes described herein may be partially or fully embodied as code and/or data stored in a computer-readable storage medium or device, so that when a computer system reads and executes the code and/or data, the computer system performs the associated methods and processes.” Kumar Col. 13, line 1; “In an example, the system includes one or more processors …” Kumar Col. 5, lines 16-19; “In some embodiments… the data is transmitted and received over a wired and/or wireless channel.” Kumar Col. 4, lines 42-46; “…the embodiments similarly apply to other usage of LDPC codes including, for example, data transmission.  LDPC (low density parity codes) codes are linear block codes defined by a sparse parity-check matrix H, which consists of zeros and ones.”);
code instructions to propagate the received encoded linear code through a neural network of at least one decoder (Kumar Col. 18, lines 59-64; “The methods and processes described herein may be partially or fully embodied as code and/or data stored in a computer-readable storage medium or device, so that when a computer system reads and executes the code and/or data, the computer system performs the associated methods and processes.” Kumar Col. 14, lines 4-9; “In an example, the neural network includes an output layer that has an output node. The output node generates the indication based on the propagation of information about the features of the input feature map from the input layer and through one or more hidden layers. The indication is stored in memory of the system and/or provided as input to the LDPC decoder.”
Kumar, Fig. 3, discloses:

[Figure: Kumar, Fig. 3 (image not reproduced)]

Here, Kumar discloses that the received encoded linear code 320 is decoded, represented as a feature map, and then propagated through a neural network 314 of a decoder 310.)
the neural network having an input layer, an output layer and a plurality of hidden layers comprising a plurality of nodes (Kumar Col. 10, lines 16-18; “One or more hidden layers of the neural network exist between the input layer and the output layer.” Kumar Col. 10, line 59; “…each of the hidden layers also includes a set of nodes that are referred to herein as hidden nodes.”)
Kumar fails to explicitly teach the plurality of nodes corresponding to transmitted messages of a message passing algorithm over a plurality of edges of a bipartite graph representation of the encoded linear code and a plurality of edges connecting the plurality of nodes.
However, Karami teaches the plurality of nodes corresponding to transmitted messages of a message passing algorithm over a plurality of edges of a bipartite graph representation of the encoded linear code and a plurality of edges connecting the plurality of nodes (Karami Pg. 18; “The neural decoder is based on the Tanner graph and can be considered as a type of message passing algorithm…” Examiner note: A Tanner graph is a type of bipartite graph. Karami Pg. 6, Fig. 2 shows the plurality of edges connecting the plurality of nodes.)
Kumar and Karami are analogous because they are both directed to neural network decoders. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Karami’s method of using a bipartite graph representation of the encoded code into Kumar’s method of LDPC decoding because a neural decoder based on a bipartite (Tanner) graph can be considered as a type of message passing algorithm, where the transferred messages are not probabilistic amounts: “In this context, the need for comparing the calculated probabilities for each situation and using memory for these probabilities can be ignored.” (Karami, Pg. 18).
Kumar further teaches wherein each one of the plurality of edges having a source node and a destination node is assigned with a weight previously calculated during a training session of the neural network in which the neural network is trained using a plurality of training samples to reduce a loss function (Kumar Col. 9, lines 4-8; “An interconnection represents a piece of information learned about the two interconnected nodes. The interconnection has a numeric weight that can be tuned (e.g., based on a training dataset), rendering the neural network adaptive to inputs and capable of learning.” Examiner note: The interconnection is an edge. Kumar Col. 4, lines 27-30 also discloses a plurality of training samples: “Here, the training of the first neural network uses a first training set of feature maps, where the features relate to information used in the bit flipping (e.g., number of unsatisfied check nodes).” Kumar Col. 11, lines 21-30 also discloses training to reduce a loss function: “The neural network 400 also uses a loss function l (or, referred to also as a cost function c) to find an optimal solution. The optimal solution represents the situation where no solution has a loss less than the loss of the optimal solution. In an example, the loss function l includes a mean-squared error function that minimizes the average squared error between an output ƒ (x) and a target value y over all the example pairs (x, y). A backpropagation algorithm that uses gradient descent to minimize the loss function is used to train the neural network 400.”)
the propagation follows a propagation path through the neural network dictated by respective weights of the plurality of edges (Kumar Col. 14, lines 5-8; “The output node generates the indication based on the propagation of information about the features of the input feature map from the input layer and through one or more hidden layers.” Kumar Col. 9, lines 4-8; “The interconnection has a numeric weight”); and
code instructions to output a recovered version of the encoded linear code according to a final output of the neural network (Kumar Col. 18, lines 59-64; “The methods and processes described herein may be partially or fully embodied as code and/or data stored in a computer-readable storage medium or device, so that when a computer system reads and executes the code and/or data, the computer system performs the associated methods and processes.” Kumar Col. 1, lines 58-62; “The error correction system accesses an output of the neural network. The output indicates that the bit should be flipped based on the feature map. The error correction system flips the bit in the decoding iteration based on the output of the neural network.”
Kumar, Fig. 3, discloses:

[Figure: Kumar, Fig. 3 (image not reproduced)]

Here, Kumar discloses “outputting a recovered version of the encoded linear code” as shown in Fig. 3 “Decoded bits 330”.  Part of the process of determining the “Decoded bits 330” is a result of the bit-flipping indication of the neural network 314.  Therefore, the “outputting a recovered version of the encoded linear code” is “according to a final output of the neural network”, as all outputs of the neural network are necessary to output the decoded bits, including the final one).
As per claim 14, the claim recites the bipartite graph is a member of a group consisting of: a Tanner graph and a factor graph. This is the same limitation as disclosed in claim 2 and is thus rejected with the same rationale applied against claim 2.
As per claim 15, the claim recites the parity check matrix is a member of a group consisting of: algebraic linear code, polar code, Low Density Parity Check (LDPC) code and High Density Parity Check (HDPC) code. This is the same limitation as disclosed in claim 3 and is thus rejected with the same rationale applied against claim 3.
Claims 4-6 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Karami as shown above, further in view of “Neural Network Error Correcting Decoders for Convolutional Codes” to Caid et al., (hereinafter, “Caid”).
As per claim 4, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. Kumar further teaches wherein the training session is conducted through a plurality of training iterations using a dataset comprising a plurality of training samples (Kumar Col. 2, line 62; “Generally, LDPC decoding uses an iterative decoding procedure.” Kumar Col. 11, lines 63-66; “…the training data includes a large number of training LDPC codewords...” Examiner note: the training code-words are the samples),
The combination of Kumar and Karami fails to explicitly teach each of the plurality of training samples maps at least one training codeword of the linear code that is subjected to a different noise pattern injected to the transmission channel. 
However, Caid teaches each of the plurality of training samples maps at least one training codeword of the linear code that is subjected to a different noise pattern injected to the transmission channel (Caid Sec. 2.1; “To perform the training of the neural ECC decoders, a training set of channel symbol patterns was required. This data set was created by the Training Set Generator…Valid source data patterns are provided to the ECC encoder. The encoded data is then corrupted by either “flipping bits” (0 -> 1 or 1 -> 0) or by adding Gaussian noise at a specified SNR. The resulting corrupted sequence along with the input data sequence (the network output which is the decoded bits) become the training set for the network.”)
Kumar, Karami, and Caid are analogous because they are all directed to neural network decoders. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Caid’s method of adding noise into the training codewords into Kumar’s system as modified by Karami because the resulting corrupted sequence, along with the input data sequence, becomes the training set for the network (Caid Sec. 2.1).
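For illustration only, the kind of training-set generation described in the Caid citation above (a codeword corrupted by bit flips and paired with its uncorrupted form) can be sketched as follows. The function names, the flip probability, the seed, and the example codeword are assumptions made for this sketch and do not appear in Caid.

```python
import random


def corrupt(codeword, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (0 -> 1 or 1 -> 0)."""
    return [bit ^ (rng.random() < flip_prob) for bit in codeword]


def make_training_set(codeword, n_samples, flip_prob, seed=0):
    """Pair n_samples independently corrupted copies of a codeword with the clean target.

    Each sample draws its own noise pattern from the seeded generator.
    """
    rng = random.Random(seed)
    return [(corrupt(codeword, flip_prob, rng), list(codeword))
            for _ in range(n_samples)]


# Hypothetical codeword and noise level (assumptions, for illustration only).
clean = [1, 1, 0, 0, 1, 1]
samples = make_training_set(clean, n_samples=8, flip_prob=0.1)
```

Each element of `samples` is a (corrupted input, clean target) pair; Gaussian noise at a specified SNR, the other corruption mentioned in the citation, is not modeled here.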
As per claim 5, the combination of Kumar, Karami, and Caid as shown above teaches the computer implemented method of claim 4. Caid further teaches wherein the at least one training codeword is the zero codeword (Caid Sec. 2.1; “To perform the training of the neural ECC decoders, a training set of channel symbol patterns was required. This data set was created by the Training Set Generator…Valid source data patterns are provided to the ECC encoder.” Examiner note: valid codewords are known as “zero codewords” in the art).
Kumar, Karami, and Caid are analogous because they are all directed to neural network decoders. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Caid’s method of training with Kumar’s system as modified by Karami because, to perform the training of the neural error correcting decoders, a training set of channel symbol patterns is required (Caid Sec. 2.1).
As per claim 6, the combination of Kumar, Karami, and Caid as shown above teaches the computer implemented method of claim 4. Kumar further teaches wherein the training is done using at least one of: stochastic gradient descent, batch gradient descent and mini-batch gradient descent (Kumar Col. 11, lines 28-30; “A backpropagation algorithm that uses gradient descent to minimize the loss function is used to train the neural network.” Examiner note: Stochastic gradient descent, batch gradient descent, and mini-batch gradient descent are three variants of gradient descent).
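For illustration only, the three gradient descent variants named in the Examiner note above differ in how many training samples contribute to each weight update. The sketch below minimizes a mean-squared error loss for a single hypothetical weight; the model, data, learning rate, and function names are assumptions and are not taken from Kumar.

```python
import random


def mse(w, xs, ys):
    """Mean-squared error between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Gradient descent on the MSE loss of a one-weight model y = w * x.

    batch_size == 1        -> stochastic gradient descent
    batch_size == len(xs)  -> batch gradient descent
    anything in between    -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch MSE with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


# Hypothetical noiseless data generated from y = 2x (assumption).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0]

w_sgd = train(xs, ys, batch_size=1)
w_mini = train(xs, ys, batch_size=2)
w_batch = train(xs, ys, batch_size=len(xs))
```

On this toy data all three variants approach the same weight; they differ only in the number of samples averaged per update.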
As per claim 16, the claim recites wherein the training session is conducted through a plurality of training iterations using a dataset comprising a plurality of samples, each of the plurality of samples maps at least one training codeword of the linear code that is subjected to a different noise pattern injected to the transmission channel. This is the same limitation as disclosed in claim 4 and is thus rejected with the same rationale applied against claim 4.
As per claim 17, the claim recites wherein the at least one training codeword is the zero codeword. This is the same limitation as disclosed in claim 5 and is thus rejected with the same rationale applied against claim 5.
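As per claim 18, the claim recites wherein the training is done using at least one of: stochastic gradient descent, batch gradient descent and mini-batch gradient descent. This is the same limitation as disclosed in claim 6 and is thus rejected with the same rationale applied against claim 6.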
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Karami as shown above, further in view of Patent No. 6,026,177 to Mong et al., (hereinafter, “Mong”).
As per claim 8, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. The combination of Kumar and Karami fails to explicitly teach wherein the neural network is a feed-forward neural network in which the weight is arbitrarily set for each of a plurality of corresponding edges in each layer of the neural network.
However, Mong teaches wherein the neural network is a feed-forward neural network in which the weight is arbitrarily set for each of a plurality of corresponding edges in each layer of the neural network (Mong Col. 1, line 57; “A multi-layer feed-forward window-based neural network model is used.” Mong Col. 11, lines 13-16; “A back propagation network is a feed forward network that employs a back propagation algorithm for training the network in the training phase. This training algorithm starts with an arbitrary set of weights throughout the full connected network.”).
Kumar, Karami, and Mong are analogous because they are all directed to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mong’s method of setting the weights of a neural network to an arbitrary value into Kumar’s system as modified by Karami, .
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Karami as shown above, further in view of “Using the Output Embedding to Improve Language Models” to Press et al., (hereinafter, “Press”).
As per claim 9, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. The combination of Kumar and Karami fails to explicitly teach wherein the neural network is a recurrent neural network (RNN) in which the weight is equal for corresponding edges in each layer of the neural network.
However, Press teaches wherein the neural network is a recurrent neural network (RNN) in which the weight is equal for corresponding edges in each layer of the neural network (Press, Pg. 5; “Weight tying is applied similarly in all models.” Examiner note: Table 6 includes various RNN models that were analyzed with weight tying. In the instant application specification Par. [0052], the applicant defines “tied weights” as edges in the neural network with equal weights).
Kumar, Karami, and Press are analogous because they are all directed to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Press’s method of tying the weights of a neural network into Kumar’s system as modified by Karami, because weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance (Press, abstract).
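For illustration only, the distinction drawn in claims 8 and 9 between per-layer weights and weights that are equal for corresponding edges in each layer can be sketched with a toy one-unit-per-layer network. This sketch is not taken from Press, Mong, or the instant application; the function names, the tanh activation, and the numeric values are assumptions.

```python
import math


def forward(x, layer_weights):
    """Propagate a scalar input through one tanh unit per layer."""
    h = x
    for w in layer_weights:
        h = math.tanh(w * h)
    return h


def tied(w, depth):
    """Recurrent-style network: the same weight is reused in every layer."""
    return [w] * depth


# Hypothetical weights (assumptions, for illustration only).
untied_weights = [0.8, -0.3, 1.1]      # feed-forward style: one weight per layer
tied_weights = tied(0.8, depth=3)      # tied: a single stored parameter

out_untied = forward(0.5, untied_weights)
out_tied = forward(0.5, tied_weights)
```

With tied weights only one distinct parameter has to be stored regardless of depth, which is the sense in which tying can shrink a model.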
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Karami as shown above, further in view of “Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations” to Hubara et al., (hereinafter, “Hubara”).
As per claim 10, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. The combination of Kumar and Karami fails to explicitly teach further comprising the weight is quantized.
However, Hubara teaches further comprising the weight is quantized (Hubara Pg. 1, Abstract; “We introduce a method to train Quantized Neural Networks (QNNs) — neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients.”).
Kumar, Karami, and Hubara are analogous because they are all directed to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Hubara’s method of quantizing the weights of a neural network into Kumar’s system as modified by Karami, because quantized neural networks drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations (Hubara, abstract).
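For illustration only, an extremely low precision (1-bit) weight of the kind mentioned in the Hubara citation can be sketched with a simple sign function. This is a simplified sketch, not Hubara's full training procedure; the function names and values are assumptions.

```python
def binarize(weights):
    """Quantize each real-valued weight to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def dot(weights, inputs):
    """Weighted sum of inputs; with +/-1 weights this reduces to additions and subtractions."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical full-precision weights and inputs (assumptions).
full_precision = [0.7, -0.2, 0.0, -1.3]
quantized = binarize(full_precision)
activation = dot(quantized, [1, 2, 3, 4])
```

Each quantized weight needs a single bit of storage in place of a floating-point value, which reflects the memory reduction cited from the Hubara abstract.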
As per claim 20, .
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Karami as shown above, further in view of Patent No. US 9,742,431 to Babin, (hereinafter, “Babin”).
As per claim 11, the combination of Kumar and Karami as shown above teaches the computer implemented method of claim 1. The combination of Kumar and Karami fails to explicitly teach further comprising generating an aggregated recovered version of the encoded linear code by aggregating the recovered version produced by a plurality of decoders such as the at least one decoder.
However, Babin teaches further comprising generating an aggregated recovered version of the encoded linear code by aggregating the recovered version produced by a plurality of decoders such as the at least one decoder (Babin Col. 10, lines 59-63; “…a processor coupled to each respective pair of binary output lines of each decoder circuit, the binary output lines of the quaternary decoder in aggregate provide a resulting set of binary digits to the processor.”).
Kumar, Karami, and Babin are analogous because they are all directed to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Babin’s method of generating an aggregated recovered version of the code into Kumar’s system as modified by Karami in order to provide a resulting set of binary digits (Babin Col. 10, lines 59-63).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kumar and Karami in view of Babin as shown above, further in view of “Apparatus and Method for Encoding and Decoding Data in Twisted Polar Code” to Trifonov et al., (hereinafter, “Trifonov”).
As per claim 12, the combination of Kumar, Karami, and Babin as shown above teaches the computer implemented method of claim 11. Kumar further teaches wherein the weight is calculated for each one of the plurality of decoders by training a respective neural network of each decoder (Kumar Col. 9, lines 4-8; “The interconnection has a numeric weight that can be tuned (e.g., based on a training dataset), rendering the neural network adaptive to inputs and capable of learning.”).
The combination of Kumar, Karami, and Babin fails to explicitly teach using a different set of permutation values of the linear code following each of a plurality of training iterations, wherein the set of permutation values is deterministically set and/or randomly selected from an automorphism group of the code.
However, Trifonov teaches using a different set of permutation values of the linear code following each of a plurality of training iterations, wherein the set of permutation values is deterministically set and/or randomly selected from an automorphism group of the code (Trifonov, Par [0012]; “The present invention increases the speed of encoding and/or decoding through the reduction in the number of iterations to be performed.” Trifonov Par. [0042]; “Basic idea of the present approach, which is reflected in respective methods for encoding and decoding, relies on the introduction of permutation of intermediate symbols…permutations are selected from a common affinity group, which is a group of automorphisms of Reed-Muller sub-codes for this code.”).
Kumar, Karami, Babin, and Trifonov are analogous because they are all directed to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Trifonov’s method .
Allowable Subject Matter
Claims 7 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 7 and 19 recite “during the training, an updated marginalization value is calculated for each even layer of the plurality of hidden layers, a multi-loss function used for the training is updated with the updated marginalization value”.  Individually, the two concepts of marginalization and a multi-loss function appear to be well known in the art.  A multi-loss function is disclosed by Xu et al. (“Multi-loss Regularized Deep Neural Network”) in the Abstract, with a motivation to avoid overfitting:  “A proper strategy to alleviate overfitting is critical to a deep neural network (DNN). In this paper, we introduce the cross-loss-function regularization for boosting the generalization capability of the DNN, which results in the multi-loss regularized DNN (ML-DNN) framework. For a particular learning task, e.g., image classification, only a single-loss function is used for all previous DNNs, and the intuition behind the multi-loss framework is that the extra loss functions with different theoretical motivations (e.g., pairwise loss and LambdaRank loss) may drag the algorithm away from overfitting to one particular single-loss function (e.g., softmax loss).”

Examiner found some pieces of art that do combine these two concepts.  Lian et al. (“Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation”) states on Page 163 Section C: “The optimization behavior for WBP can be improved by using a multi-loss function [1]”.  Wang et al. (“Deep Learning for Wireless Physical Layer: Opportunities and Challenges”) states on Page 100 Paragraph 2:  “In [27], the aforementioned fully connected DNN-based BP decoder is transformed into an RNN architecture, which is named as BP-RNN decoder, by unifying the weights in each iteration and feeding back the outputs of parity layers into the inputs of variable layers, as shown in figure 8. The number of time steps equals to that of iterations. This process significantly reduces the number of parameters and results in a performance that is comparable with that of the former decoder. The multiloss concept is also adopted in this architecture.”  However, in both of these cases, the art was published after the effective filing date of this application.  In fact, the references cited by Lian and Wang are the papers of the inventors on this application.  Lian cited “Learning to decode linear codes using deep learning” from 2016 by Nachmani et al., and Wang cited “RNN decoding of linear block codes” by Nachmani et al., both of which were included by Applicant in the IDS.  Agrawal et al. (US 20210142158 A1) may also have independently arrived at this idea, as they incorporate marginalization and a multi-loss function, but their effectively filed date is 9 days after the effective filing date of the instant application.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sharon et al. (US 2018/0358988 A1) discloses a message passing decoder, the messages represented by a bipartite graph, and the decoder comprising a machine learning model with trained parameters.
Berrou et al. (US 2013/0318017 A1) discloses decoding a message by means of a neural network, wherein the decoding relies on a bipartite graph.
Kumar et al. (“Non Iterative LDPC Decoding By Syndrome Generation Using Artificial Neural Network”) discloses using a neural network to decode LDPC messages which are represented by bipartite networks.
Xiao et al. (“The Design and Implementation of Neural Network Encoding and Decoding”) discloses using neural networks to decode messages which can be represented with bipartite graphs.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on Mon-Fri 9:00AM – 6:00PM.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANN J LO/
Supervisory Patent Examiner, Art Unit 2126