DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2021-12-07 has been entered.  The status of the claims is as follows:
Claims 1-20 remain pending in the application.
Claims 1, 4, 13, and 16 are amended.
Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention 

Claims 1-6, 8, 11, and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Blackmer et al. (“Non Iterative Decoding of Low Density Parity Check Codes Using Artificial Neural Networks”; hereinafter “Blackmer”) in view of Thomas et al. (“Accelerated Optimal Topology Search for Two-Hidden-Layer Feedforward Neural Networks”; hereinafter “Thomas”) and Charei et al. (“An Improved soft decision method in Viterbi decoder using artificial neural networks”; hereinafter “Charei”).
As per claim 1, Blackmer teaches a computer implemented method of decoding a linear code transmitted over a transmission channel subject to noise, comprising: 
using at least one processor for: (Blackmer, Page 3 top left, discloses:  “This results in a network which can be implemented in a FPGA in a massively parallel fashion taking up no extra clock cycles for a CPU to accomplish near real-time decoding”).
receiving, over a transmission channel, an encoded linear code corresponding to a parity check matrix (Blackmer, Page 4 Top left Step 3, discloses:  “Encode and transmit the LDPC vector through the channel as normal”.  Here, the LDPC is a Low Density Parity Check vector, and thus corresponds to a parity check matrix.  This is encoded, and then transmitted over the channel, and thus on the other end of the transmission channel, it is received.)
propagating the received encoded linear code through a neural network of at least one decoder, the neural network having an input layer, an output layer and a [plurality of] hidden layer[s] comprising a plurality of nodes corresponding to transmitted messages of a message (Blackmer, Page 4 Top left Step 4, discloses:  “Feed each row of the received vector through the correctly sized neural network”.  Here, the encoded linear code is propagated through a neural network for decoding, as the neural network is described as a “decoder” on Blackmer Page 4, Last sentence of Conclusion:  “This paper has presented a starting framework from which to build a better performing neural network decoder”.  Blackmer, Page 2 Figure 2, shown below, discloses an input layer, an output layer, and a hidden layer: 

[Figure: Blackmer, Page 2 Figure 2 (media_image1.png)]

Blackmer, Page 2 Figure 2 above, shows that the layers comprise a plurality of nodes.  As shown above, Blackmer Page 4 Top left Steps 3 and 4, discloses:  “Encode and transmit the LDPC vector through the channel as normal.  Feed each row of the received vector through the correctly sized neural network”.  Thus Blackmer teaches that the nodes correspond to transmitted messages of a message passing algorithm.  Figure 2 above also shows “a plurality of edges connecting the plurality of nodes”.
Blackmer, Page 3 right side Step 1, discloses: “Offline: Determine the number of ones in each of the i rows of the H matrix”.  Here, Blackmer discloses that the code is represented as an H matrix.  Blackmer, Page 1 Figure 1 demonstrates the H matrix:

[Figure: Blackmer, Page 1 Figure 1 (media_image2.png)]

One of ordinary skill in the art will appreciate that a Tanner graph is a bipartite graph that is another way of visualizing an H matrix.  Blackmer suggests this on Page 2 Left Column Lines 8-9:  “Using a tanner graph one can verify that the smallest cycle in this H matrix is of length 3.”  Further justification for this can be found in supplementary reference Karami et. al. (“Multi Layer Perceptron Neural Networks Decoder for LDPC Codes”) Page 1 Intro Para 2: “LDPC codes can be illustrated in two forms: the matrix form and the graphical form. The graphical display creates a very useful realization for decoding of this code. This display was introduced first by Tanner in [6] and is called Tanner graph. The Tanner graph is a two partite graph with two types of nodes called variable nodes and check nodes. The variable nodes can be connected only to check nodes and vice versa. The connections between variable and check nodes is performed in light of matrix H, in such a way that if the columns of matrix H be considered as variable nodes and the rows of that as check nodes, in any location there is a one in the matrix, the related variable and check nodes will be connected by an edge.”  Thus, the 1’s in Blackmer’s H matrix represent “a plurality of edges of a bipartite graph representation of the encoded linear code”, and Blackmer bases the neural network upon these edges by finding the ones, as shown on Page 3 right side Steps 1-2:  “Offline: Determine the number of ones Xi in each of the i rows of the H matrix.  2) Train individual networks for each unique Xi.”
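The correspondence between an H matrix and its Tanner graph described above can be sketched as follows. This is a minimal illustration and is not code from Blackmer or Karami: the function names and the example matrix are hypothetical, and the matrix is not the one shown in Blackmer's Figure 1. Rows are treated as check nodes and columns as variable nodes, with one edge per 1 in the matrix.

```python
def tanner_edges(H):
    """Return one (check_node, variable_node) edge for every 1 in H.

    Rows of H are check nodes and columns are variable nodes, following
    the Tanner graph description quoted from Karami.
    """
    return [(r, c) for r, row in enumerate(H) for c, bit in enumerate(row) if bit == 1]


def row_weights(H):
    """Number of ones X_i in each row i of H (Blackmer's offline Step 1)."""
    return [sum(row) for row in H]


# Hypothetical 3x6 parity check matrix, for illustration only.
H_EXAMPLE = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

EDGES = tanner_edges(H_EXAMPLE)
ROW_WEIGHTS = row_weights(H_EXAMPLE)
```

In this sketch the number of edges equals the total number of ones in the matrix, and the per-row counts are the values for which Blackmer trains individual networks.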
wherein each one of the plurality of edges having a source node and a destination node is assigned with a weight previously calculated during a training session of the neural network in which the neural network is trained using a plurality of training samples to reduce a loss function, the propagation follows a propagation path through the neural network dictated by respective weights of the plurality of edges (Blackmer, as shown above in Page 2 Figure 2, discloses a neural network with layers and edges between the layers, wherein each edge is between two nodes in adjoining layers, and thus have a source node and a destination node.  Blackmer, Page 3 right side Step 2, discloses:  “Train individual networks for each unique Xi”.  Here, Blackmer discloses a training session.   Blackmer, Page 2 right side last paragraph of Section 3, discloses more about training:  “This method allows the training to be done offline, iteratively approaching successively better and better network weight and bias configurations for network performance.”  Here, Blackmer discloses weights previously calculated (“offline”) during a training session, which is done “iteratively”. One of ordinary skill in the art will appreciate that iterative training of a neural network involves minimizing a loss function over a plurality of training samples.  Blackmer, Page 4 Steps 4-5, discloses:  “Feed each row of the received vector through the correctly sized neural network. The output Y will now be the size
[expression shown in media_image3.png] vector of likelihoods that the given input sequence belongs to each of the [expression shown in media_image4.png] possible sequences. Multiply the [expression shown in media_image3.png] likelihood output sequence by the [expression shown in media_image5.png] matrix of all possible valid sequences. This will generate the probabilistic [expression shown in media_image3.png]
values to fill back into the positions from the current row of the H matrix.”  Here, Blackmer discloses that propagation follows a propagation path through the neural network dictated by respective weights of the plurality of edges.)
and outputting a recovered version of the encoded linear code according to a final output of the neural network (Blackmer, Page 3 Figure 3 discloses decoded output according to the output of the neural network, see the rightmost item:

[Figure: Blackmer, Page 3 Figure 3 (media_image6.png)]

As shown above, Blackmer discloses “decoded output”.)
	However, Blackmer does not explicitly teach a plurality of hidden layers; wherein each of the plurality of training samples maps at least one training codeword of the linear code subjected to a respective noise pattern of a plurality of noise patterns representing noise induced during transmission over the transmission channel
	Thomas discloses a plurality of hidden layers (Thomas, Page 1 Abstract, discloses:  “Two-hidden-layer feedforward neural networks are investigated for the existence of an optimal hidden node ratio”).
	Blackmer and Thomas are analogous art because they are both in the field of endeavor of machine learning.
	Blackmer discloses the use of a Multilayer Perceptron, which is a feedforward neural network, for a non-iterative LDPC decoder (Blackmer, Page 2 Section 3: “Artificial Neural Networks (ANN) of the Multi Layer Perception (MLP) are a class of feed forward neural networks, meaning they have no recursive or feedback connections.”)  Blackmer’s feedforward neural network has one hidden layer, as shown in Page 2 Figure 2.  Thomas discloses a two hidden layer feedforward neural network (Thomas, Page 1 Intro Para 1:  “This paper addresses the question: ‘Does there exist an optimal ratio of nodes between the first and second hidden layers of a two-hidden-layer neural network (TLFN)?’”)  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Blackmer and Thomas, which would result in a two-hidden-layer feedforward neural network for a non-iterative LDPC decoder.  One of ordinary skill in the art would be motivated to do so to achieve more accurate results (Thomas, Page 2 Section 2:  “Given a sufficiently large number of hidden units, a single layer will suffice [1], however two hidden layers can often achieve better result than a single layer [6]. In the Authors’ own experience, node for node, a TLFN will give a better generalisation capability than a single-hidden-layer feedforward neural network (SLFN) in many cases.” And Thomas Page 11 Section 7 Number 2:  “TLFNs can often outperform SLFNs, as was proved in [6], and demonstrated in these experiments.”)
	However, the combination of Blackmer and Thomas thus far fails to teach wherein each of the plurality of training samples maps at least one training codeword of the linear code subjected to a respective noise pattern of a plurality of noise patterns representing noise (Blackmer suggests, but does not explicitly teach this limitation.  Blackmer, Page 2 Section 3 Para 2, discloses:  “These feedforward networks are ideally suited for pattern recognition [8], and have several major benefits which make them a great choice decoding signals. Their ability to properly classify inputs when presented with novel signal data means that even when corrupted with random noise, the neural networks can be trained to look past the signal errors and noise and find the underlying geometric relationship that defines the coded signal.”  Here Blackmer suggests that the feedforward neural network can be “trained to look past the signal errors and noise”, but does not explicitly describe this process.)
	Charei teaches wherein each of the plurality of training samples maps at least one training codeword of the [linear] code subjected to a respective noise pattern of a plurality of noise patterns representing noise induced during transmission over the transmission channel (While Charei teaches convolutional code, note that Blackmer above, as shown above, discloses linear code.  Charei, Page 3 Section V, discloses:  “To train the neural network, a random sequence of zeros and ones is produced. The Convolutional coding and then a BPSK modulation is performed on the sequence. The output of modulation block is the target of the neural network. By adding an AWGN noise with specific SNR to the sequence the input of the neural network for training process will be ready. An important point in training of neural network is the number of training data. For efficient network training, large number of training data is required to introduce the behavior of channel noise properly.”  Here, Charei discloses subjecting a codeword (“convolutional coding”) to a respective noise pattern (“adding an AWGN noise with specific SNR”) of a plurality of noise patterns representing noise induced during transmission over the transmission channel (“introduce the behavior of channel noise”) for a plurality of training samples (“training process will be ready”)).
	Charei and the combination of Blackmer and Thomas are analogous art because they are both in the field of endeavor of machine learning.
	The combination of Blackmer and Thomas teaches a feedforward neural network used for decoding linear codes over an Additive White Gaussian Noise channel (see Blackmer Page 3 Figure 3, far left shows “AWGN Channel”).  Blackmer, as shown above, suggests training the neural network to look past noise, but does not explicitly describe the process. Charei teaches using a neural network as part of a process to decode convolutional codes, also over an AWGN channel, and training that neural network by introducing noise patterns that are typical and representative of the transmission channel.  Thus, the combination of Charei with Blackmer and Thomas would result in a deep feedforward neural network for LDPC decoding that is trained to look past noise patterns that are characteristic of the transmission channel.  One of ordinary skill in the art would be motivated to do so in order to achieve higher accuracy of the decoder neural network (Charei, Page 1 Abstract:  “Using the neural networks soft decision in Viterbi decoder block would reduce the bit error rate (BER) in data transmission for AWGN channels”).
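The training data generation quoted from Charei (modulate a coded sequence, keep the modulated sequence as the target, add AWGN at a specific SNR to form the network input) can be sketched as follows. This is a minimal illustration under stated assumptions and is not code from Charei or Blackmer: the function names are hypothetical, the encoding step is omitted (the codeword is supplied directly), and the noise scaling convention (unit symbol energy, variance 1/(2·SNR)) is an assumption.

```python
import math
import random


def bpsk(bits):
    """Map bits to BPSK symbols: 0 -> +1.0, 1 -> -1.0."""
    return [1.0 - 2.0 * b for b in bits]


def awgn(symbols, snr_db, rng):
    """Add white Gaussian noise to the symbols.

    Assumed convention: unit symbol energy, noise variance 1 / (2 * SNR).
    """
    snr_linear = 10 ** (snr_db / 10)
    sigma = math.sqrt(1.0 / (2.0 * snr_linear))
    return [s + rng.gauss(0.0, sigma) for s in symbols]


def make_training_samples(codeword, snr_db, count, seed=0):
    """Return `count` (noisy_input, clean_target) pairs for one codeword.

    Each pair uses an independent noise pattern, so a single codeword
    yields a plurality of training samples.
    """
    rng = random.Random(seed)
    target = bpsk(codeword)
    return [(awgn(target, snr_db, rng), target) for _ in range(count)]


# Example: five noisy versions of a hypothetical length-4 all-zero codeword at 3 dB.
SAMPLES = make_training_samples([0, 0, 0, 0], snr_db=3.0, count=5)
```

Each sample pairs a distinct noise pattern with the same clean target, which is the mapping between a training codeword and a respective noise pattern discussed above.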

As per claim 2, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  Blackmer teaches wherein the bipartite graph is a member of a group consisting of: a Tanner graph and a factor graph (Blackmer, Page 2 Left Column Lines 8-9, discloses a tanner graph:  “Using a tanner graph one can verify that the smallest cycle in this H matrix is of length 3.”)
 
As per claim 3, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  Blackmer teaches wherein the parity check matrix is a member of a group consisting of: algebraic linear code, polar code, Low Density Parity Check (LDPC) code and High Density Parity Check (HDPC) code (Blackmer, Page 1 Intro Last Sentence, discloses LDPC: “Using the inherent pattern recognition and generalization abilities of a properly trained neural network can enable constant time very high speed, non iterative LDPC decoding, with error performance levels on short codes approaching or even surpassing more traditional iterative belief propagation decoding methods”)

As per claim 4, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  Blackmer teaches wherein the training session is conducted through a plurality of training iterations using a dataset comprising the plurality of training samples (Blackmer, Page 3 right side Step 2, discloses:  “Train individual networks for each unique Xi”.  Here, Blackmer discloses a training session.   Blackmer, Page 2 right side last paragraph of Section 3, discloses more about training:  “This method allows the training to be done offline, iteratively approaching successively better and better network weight and bias configurations for network performance.”  Here, Blackmer discloses weights previously calculated (“offline”) during a training session, which is done “iteratively”. One of ordinary skill in the art will appreciate that iterative training of a neural network involves minimizing a loss function over a plurality of training samples.)

As per claim 5, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 4.  Charei teaches wherein the at least one training codeword is the zero codeword (Charei, Page 3 Section V, discloses:  “To train the neural network, a random sequence of zeros and ones is produced.”  One of ordinary skill in the art will appreciate that this random sequence of zeros and ones may be the zero codeword (all zeroes)).  Note that Blackmer also suggests this in a separate embodiment in Page 3 Section 5:  “Therefore this paper presents two different approaches to using neural networks for decoding. The first approach is for illustrative purposes and has been investigated before [13] for linear block codes. This technique is to train the network with all possible valid codewords”.  “All possible codewords” will include the zero codeword.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Charei with the combination of Blackmer and Thomas, for at least the reasons recited in Claim 1.

As per claim 6, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 4.  Blackmer teaches wherein the training is done using at least one of: stochastic gradient descent, batch gradient descent and mini-batch gradient descent (Blackmer, Page 2 Section 3 Para 2, discloses:  “The method of training the network, rather than having a static design provides another benefit for the purposes of codeword recognition. Being shown perfect versions of the signal, then having a gradient descent algorithm update each interconnecting weight in an attempt to iteratively find global function minima in the output space.”  Here, Blackmer discloses “gradient descent” for the training. One of ordinary skill in the art will appreciate that stochastic, batch, and mini-batch are the three types of gradient descent algorithm.)
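The relationship among the three gradient descent variants named above can be sketched with a single training loop in which only the batch size changes. This is a minimal illustration, not Blackmer's training procedure: the one-weight model, the squared-error loss, the data, and all names are hypothetical.

```python
import random


def train(data, batch_size, epochs=200, lr=0.05, seed=0):
    """Fit y = w * x by gradient descent on the mean squared error.

    batch_size == 1          -> stochastic gradient descent
    batch_size == len(data)  -> batch gradient descent
    anything in between      -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    data = list(data)
    w = 0.0  # arbitrary starting weight, updated iteratively
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean of (w*x - y)^2 over the batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


# Hypothetical noiseless data generated with the true weight 2.0.
DATA = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

W_STOCHASTIC = train(DATA, batch_size=1)
W_MINI_BATCH = train(DATA, batch_size=2)
W_BATCH = train(DATA, batch_size=len(DATA))
```

All three calls iteratively reduce the same loss function over a plurality of training samples; they differ only in how many samples contribute to each weight update.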

As per claim 8, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  Blackmer teaches wherein the neural network is a feed-forward neural network in which the weight is arbitrarily set for each of a plurality of corresponding edges in each layer of the neural network.  (Blackmer, Page 2 Section 3, discloses feed-forward neural network:  “Artificial Neural Networks (ANN) of the Multi Layer Perception (MLP) are a class of feed forward neural networks, meaning they have no recursive or feedback connections.”  Blackmer, Page 2 Section 3 Para 2, discloses:  “The method of training the network, rather than having a static design provides another benefit for the purposes of codeword recognition. Being shown perfect versions of the signal, then having a gradient descent algorithm update each interconnecting weight in an attempt to iteratively find global function minima in the output space.”  Here, Blackmer discloses “gradient descent” for the training. One of ordinary skill in the art will appreciate that in order for gradient descent to update the weights after each iteration, the weights must be set to some arbitrary value before the first iteration, as a basis from which the gradient descent performs the update to the weights.)

As per claim 11, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  Blackmer teaches further comprising generating an aggregated recovered version of the encoded linear code by aggregating the recovered version produced by a plurality of decoders such as the at least one decoder. (Blackmer, Page 3 Figure 3, discloses:

[Figure: Blackmer, Page 3 Figure 3 (media_image7.png)]

As shown above, Blackmer discloses aggregating the recovered version by a plurality of decoders (“neural network decoders”) into an aggregated recovered version (“decoded output”)).

Claim 13 is a system claim corresponding to method Claim 1.  Claim 13 is rejected for the same reasons as Claim 1.

Claim 14 is a system claim corresponding to method Claim 2.  Claim 14 is rejected for the same reasons as Claim 2.

Claim 15 is a system claim corresponding to method Claim 3.  Claim 15 is rejected for the same reasons as Claim 3.

Claim 16 is a system claim corresponding to method Claim 4.  Claim 16 is rejected for the same reasons as Claim 4.

Claim 17 is a system claim corresponding to method Claim 5.  Claim 17 is rejected for the same reasons as Claim 5.

Claim 18 is a system claim corresponding to method Claim 6.  Claim 18 is rejected for the same reasons as Claim 6.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Blackmer, Thomas, and Charei, further in view of Ha et al. (“Hypernetworks”; hereinafter “Ha”).
As per claim 9, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  However, the combination of Blackmer, Thomas, and Charei does not teach wherein the neural network is a recurrent neural 
However, Ha teaches wherein the neural network is a recurrent neural network (RNN) in which the weight is equal for corresponding edges in each layer of the neural network. (Ha, Pg. 3 discloses: “Recurrent Networks can be viewed as a really deep feed forward network with the identical weights at each layer (this is called weight tying)”).
Ha and the combination of Blackmer, Thomas, and Charei are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the recurrent neural network with weight tying of Ha with the neural network decoder of the combination of Blackmer, Thomas, and Charei.  One of ordinary skill in the art would be motivated to do so in order to gain efficiency by reducing the number of weights that need to be learned during training (Ha, Pg. 3: “Recurrent Networks can be viewed as a really deep feed forward network with the identical weights at each layer (this is called weight tying)”).
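The weight tying quoted from Ha can be sketched by unrolling a recurrent step into a feed-forward stack that reuses one weight matrix at every layer. This is a minimal illustration and is not code from Ha: the function names, the ReLU activation, and the example weights are assumptions chosen for brevity.

```python
def layer(x, weights):
    """One fully connected layer with ReLU: y_j = max(0, sum_i W[j][i] * x[i])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def feed_forward(x, per_layer_weights):
    """Propagate x through a stack of layers, each with its own weight matrix."""
    for weights in per_layer_weights:
        x = layer(x, weights)
    return x


def recurrent(x, shared_weights, steps):
    """A recurrent network unrolled for `steps` steps.

    Equivalent to a feed-forward network whose layers all share
    the same weight matrix (weight tying).
    """
    return feed_forward(x, [shared_weights] * steps)


# Hypothetical 2x2 weight matrix shared across three unrolled layers.
W_SHARED = [[0.5, 0.0], [0.0, 2.0]]
OUTPUT = recurrent([1.0, 1.0], W_SHARED, steps=3)  # [0.125, 8.0]
```

With tying, the three-layer stack in this sketch has 4 learnable weights instead of 12, which is the reduction in weights relied upon in the motivation above.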

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Blackmer, Thomas, and Charei further in view of Hubara et al. (“Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations”; hereinafter “Hubara”).
As per claim 10, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 1.  However, the combination of Blackmer, Thomas, and Charei fails to teach further comprising the weight is quantized.
Hubara teaches further comprising the weight is quantized (Hubara Pg. 1 Abstract, discloses: “We introduce a method to train Quantized Neural Networks (QNNs) — neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients.”)
Hubara and the combination of Blackmer, Thomas, and Charei are analogous art because they are both in the field of endeavor of machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Hubara’s method of quantizing the weights of a neural network into the neural network decoder of the combination of Blackmer, Thomas, and Charei.  One of ordinary skill in the art would be motivated to do so in order to gain efficiency by reducing resource usage (Hubara, Pg. 1 Abstract:  “During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations”).
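The notion of a quantized weight at the 1-bit extreme mentioned in the Hubara abstract can be sketched as a sign-based quantizer. This is a minimal illustration only: the function name is hypothetical, and the rule shown (non-negative weights map to +1, negative weights to -1) is an assumed deterministic scheme, not necessarily the exact procedure Hubara uses during training.

```python
def binarize(weights):
    """Quantize each real-valued weight to 1 bit: +1.0 if w >= 0, else -1.0."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


# Hypothetical full-precision weights of one neuron.
FULL_PRECISION = [0.3, -0.7, 0.0, 1.9]
QUANTIZED = binarize(FULL_PRECISION)  # [1.0, -1.0, 1.0, 1.0]
```

Each quantized weight can be stored in a single bit, which is the source of the memory reduction cited in the motivation above.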

Claim 20 is a system claim corresponding to method Claim 10.  Claim 20 is rejected for the same reasons as Claim 10.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Blackmer, Thomas, and Charei further in view of Chen et al. (“Enhancing Iterative Decoding of Cyclic LDPC Codes Using Their Automorphism Groups”;  hereinafter “Chen”).
As per claim 12, the combination of Blackmer, Thomas, and Charei teaches the computer implemented method of claim 11.  Blackmer further teaches wherein the weight is calculated for each one of the plurality of decoders by training a respective neural network of each decoder (Blackmer, Page 3 Figure 3, discloses:

[Figure: Blackmer, Page 3 Figure 3 (media_image7.png)]

As shown above, Blackmer discloses aggregating the recovered version by a plurality of decoders (“neural network decoders”) into an aggregated recovered version (“decoded output”).  Each decoder is a neural network.  Blackmer, Page 2 Section 3 Para 2, discloses training the neural network:  “The method of training the network, rather than having a static design provides another benefit for the purposes of codeword recognition. Being shown perfect versions of the signal, then having a gradient descent algorithm update each interconnecting weight in an attempt to iteratively find global function minima in the output space.”
	However, Blackmer does not explicitly teach using a different set of permutation values of the linear code following each of a plurality of training iterations; wherein the set of permutation values is deterministically set and/or randomly selected from an automorphism group of the code.
Charei teaches using a different set of permutation values of the [linear] code following each of a plurality of training iterations (Recall above Blackmer teaches linear code.  Charei, Page 3 Section V, discloses:  “To train the neural network, a random sequence of zeros and ones is produced. The Convolutional coding and then a BPSK modulation is performed on the sequence. The output of modulation block is the target of the neural network. By adding an AWGN noise with specific SNR to the sequence the input of the neural network for training process will be ready. An important point in training of neural network is the number of training data. For efficient network training, large number of training data is required to introduce the behavior of channel noise properly.”  Here, Charei discloses subjecting a codeword (“convolutional coding”) to a respective noise pattern (“adding an AWGN noise with specific SNR”) of a plurality of noise patterns representing noise induced during transmission over the transmission channel (“introduce the behavior of channel noise”) for a plurality of training samples (“training process will be ready”), and thus following a plurality of training iterations.  Adding the noise to the signal could result in a permutation of the original signal.)

However, the combination of Blackmer, Thomas, and Charei thus far fails to teach wherein the set of permutation values is deterministically set and/or randomly selected from an automorphism group of the code.
Chen teaches wherein the set of permutation values is deterministically set and/or randomly selected from an automorphism group of the code. (Chen, Pg 2128 Intro Para 5, discloses: “In this paper, we are interested in applying automorphism group aided iterative decoding techniques to LDPC codes”).
Chen and the combination of Blackmer, Thomas, and Charei are analogous art because they are both in the field of endeavor of decoding linear codes.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen’s teachings of automorphism groups of LDPC codes with the neural network decoder of the combination of Blackmer, Thomas, and Charei.  One of ordinary skill in the art would be motivated to do so in order to improve decoding performance (Chen, Page 2128 Abstract:  “Simulation results show that for our constructed .

Allowable Subject Matter
Claims 7 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 7 and 19 recite “during the training, an updated marginalization value is calculated for each even layer of the plurality of hidden layers, a multi-loss function used for the training is updated with the updated marginalization value”.  Individually, the two concepts of marginalization and multi-loss function appear to be well-known in the art.  Multi-loss function is disclosed by Xu et. al. (“Multi-loss Regularized Deep Neural Network”) in the Abstract, with a motivation to avoid overfitting:  “A proper strategy to alleviate overfitting is critical to a deep neural network (DNN). In this paper, we introduce the cross-loss-function regularization for boosting the generalization capability of the DNN, which results in the multi-loss regularized DNN (ML-DNN) framework. For a particular learning task, e.g., image classification, only a single-loss function is used for all previous DNNs, and the intuition behind the multi-loss framework is that the extra loss functions with different theoretical motivations (e.g., pairwise loss and LambdaRank loss) may drag the algorithm away from overfitting to one particular single-loss function (e.g., softmax loss).”   
As for marginalization, Jensen et. al. (“Message Passing Algorithm and Linear Programming Decoding for LDPC and Linear Block Codes”) discloses on Page 32 Section 4.3:  
Examiner found some pieces of art that do combine these two concepts.  Lian et al. (“Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation”) states on Page 163 Section C: “The optimization behavior for WBP can be improved by using a multi-loss function [1]”.  Wang et al. (“Deep Learning for Wireless Physical Layer: Opportunities and Challenges”) states on Page 100 Paragraph 2:  “In [27], the aforementioned fully connected DNN-based BP decoder is transformed into an RNN architecture, which is named as BP-RNN decoder, by unifying the weights in each iteration and feeding back the outputs of parity layers into the inputs of variable layers, as shown in figure 8. The number of time steps equals to that of iterations. This process significantly reduces the number of parameters and results in a performance that is comparable with that of the former decoder. The multiloss concept is also adopted in this architecture.”  However, in both these cases, the art was published after the effective filing date of this application.  In fact, the references cited by Lian and Wang are the papers of the inventors on this application.  Lian cited “Learning to decode linear codes using deep learning” from 2016 by Nachmani et al., and Wang cited “RNN decoding of linear block codes” by Nachmani et al., both of which were included by Applicant in the IDS.  Agrawal et al. (US 20210142158 A1) may have also independently arrived at this idea, as they incorporate marginalization and a multi-loss function, but their effective filing date is 9 days after the effective filing date of the instant application.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kumar et al. (“Non Iterative LDPC Decoding By Syndrome Generation Using Artificial Neural Network”) discloses a non-iterative LDPC decoder using a neural network.
Karami et al. (“Multi Layer Perceptron Neural Networks Decoder for LDPC Codes”) discloses an LDPC decoder using a neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for 



/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126