Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chorowski et al. (“End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results”), hereinafter Chorowski, in view of Weiss et al. (US 20160350655 A1), hereinafter Weiss.
Regarding claim 1, Chorowski teaches a computer-implemented method comprising:
processing a plurality of characters using a first recurrent neural network, the first recurrent neural network sequentially producing a first plurality of internal arrays of values; (Pg. 3 Fig. 1, “Input is 1024 features per frame. Each recurrent layer has 512 hidden units, thus the annotation is 1024-dimensional”; thus, the reference teaches this limitation in training the neural network. The reference’s recurrent layers, each having 512 hidden units, constitute a recurrent neural network corresponding to the claim’s first recurrent neural network. Moreover, Fig. 1 of the reference shows the recurrent layers sequentially producing annotations that are used to predict the next phoneme, which corresponds to the limitation’s sequentially producing a plurality of internal arrays of values.)
multiplying the stored plurality of arrays of values by a plurality of attention weights to produce a plurality of selection values; (Pg. 5 Para. 1, “The context is a weighted sum of annotations:
$c_o = \sum_{i=1}^{I} \alpha_{o,i} h_i$
where $\alpha_{o,i}$ is a normalized weight for each annotation $h_i$. This effectively means that the decoder selects each annotation $h_i$ with a certainty $\alpha_{o,i}$”; thus, the reference teaches multiplying arrays of values by a plurality of weights, in which $\alpha_{o,i}$ is the weight of the annotation $h_i$, and each annotation $h_i$ corresponds to an array of values for the input characters.)
generating an attention array of values from the stored plurality of arrays of values based on the selection values; (Pg. 2 Section 1.2, “The model we consider in this paper is closely related to the RNN Transducer, however, with an attention mechanism that decides which input frames be used to generate the next output element”; thus, the reference states that the attention mechanism decides which input frames are used to generate the next output element, which corresponds to the limitation’s generating an attention array of values from the plurality of arrays of values based on the selection values.)
processing the attention array of values using a second recurrent neural network, the second recurrent neural network producing values corresponding to characters of the plurality of characters forming a recognized character sequence. (Pg. 3 Figure 1, “Figure 1: Proposed model architecture. The system contains three parts: an encoder that computes annotations of input frames (learned features that may depend on the whole sequence), an attention mechanism that decides where to look in the input sequence to provide a context for emitting the next output, and a generative output RNN which iteratively predicts the next phoneme conditioned on its state and the context. For visual simplicity we have shown only one context computation”; hence, the reference states that an attention mechanism decides where to look in the input sequence to provide a context for emitting the next output, and that a generative output RNN iteratively predicts the next phoneme conditioned on its state and that context, which corresponds to the limitation of the claim.)
Regarding the further limitation of claim 1, Chorowski teaches a computer-implemented method of evaluating internal arrays of values from a recurrent neural network. Chorowski does not explicitly teach storing the first plurality of internal arrays of values to form a stored plurality of arrays of values.
However, Weiss teaches storing the first plurality of internal arrays of values to form a stored plurality of arrays of values (Para. 0029, “In some embodiments, a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements, for example, through a system bus. The memory elements may include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memories which may provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution”; thus, the reference’s disclosure of storing and executing program code in memory corresponds to the limitation’s storing of internal arrays of values in memory [Specification Para. 0014]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of character sequence recognition of Chorowski to include storing the arrays of values to form a stored plurality of arrays of values, as taught by Weiss, in order to reduce the number of times code must be retrieved from bulk storage during execution [Para. 0029 Lines 5-6].
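For illustration only, the claim 1 sequence as mapped above (a first recurrent network producing internal arrays, storage of those arrays, multiplication by attention weights to obtain selection values, a weighted combination into an attention array, and a second recurrent network consuming it) may be sketched in Python as follows. The sketch is not drawn from Chorowski or Weiss; all function names, dimensions, and the random initialization are hypothetical. Following the claim wording, it uses one fixed vector of attention weights, whereas Chorowski's scores $e_{o,i}$ also depend on the decoder's previous state, and the softmax normalization of the selection values is an added assumption.

```python
import math
import random


def rnn_step(x, h, W_in, W_rec):
    """One Elman-style recurrent step: h_new = tanh(W_in @ x + W_rec @ h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_in[j], x))
            + sum(u * hk for u, hk in zip(W_rec[j], h))
        )
        for j in range(len(h))
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(inputs, hidden=4, vocab=3, steps=2, seed=0):
    rng = random.Random(seed)
    dim = len(inputs[0])
    enc_in, enc_rec = random_matrix(hidden, dim, rng), random_matrix(hidden, hidden, rng)
    dec_in, dec_rec = random_matrix(hidden, hidden, rng), random_matrix(hidden, hidden, rng)
    attn = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # attention weights
    out_w = random_matrix(vocab, hidden, rng)  # output-layer weights

    # First RNN: one internal array per input character, each one stored.
    h, stored = [0.0] * hidden, []
    for x in inputs:
        h = rnn_step(x, h, enc_in, enc_rec)
        stored.append(h)

    # Selection values: each stored array dotted with the attention weights,
    # then normalized so that they sum to 1.
    alpha = softmax([sum(a * v for a, v in zip(attn, arr)) for arr in stored])

    # Attention array: stored arrays weighted by their selection values and summed.
    context = [sum(al * arr[k] for al, arr in zip(alpha, stored)) for k in range(hidden)]

    # Second RNN: the attention array is its input at every step; each step yields
    # an array of likelihoods over a hypothetical `vocab`-sized character set.
    s, outputs = [0.0] * hidden, []
    for _ in range(steps):
        s = rnn_step(context, s, dec_in, dec_rec)
        outputs.append(softmax([sum(w * sk for w, sk in zip(row, s)) for row in out_w]))
    return alpha, context, outputs


alpha, context, outputs = recognize([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

Holding `context` fixed across the decoder steps mirrors the claim 11 wording (attention array maintained as an input for a plurality of cycles); a per-step attention computation, as in Chorowski, would instead recompute `alpha` at each output.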
Regarding claim 2, Chorowski-Weiss teaches the method of claim 1, further comprising: encoding different characters of the plurality of characters as a plurality of zeros (0) and a single one (1); (Pg. 3 Figure 1, “Encoder RNN: computes an annotation for each input frame”; thus, the reference encodes each input frame as an annotation of learned features that may depend on the whole sequence, which corresponds to the claim’s limitation.)
Further regarding claim 2, Chorowski-Weiss teaches decoding the values corresponding to characters from the output of the second recurrent neural network to produce the recognized character sequence. (Pg. 3 Figure 1, “Decoder RNN: Recurrently predicts the next phoneme, input annotations are accessed through a context computed separately for each output”; thus, the decoder in the reference decodes its output to predict the next phoneme, which corresponds to the claim’s limitation.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
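The claim 2 encoding (a plurality of zeros and a single one per character) and the corresponding decoding may be illustrated with the following Python sketch. The character set is hypothetical, and decoding by selecting the position of the largest entry is an assumed decoding rule; the same rule applies whether the output array is an exact one-hot array or an array of likelihood values.

```python
def one_hot(ch, alphabet):
    """Encode `ch` as all zeros except a single 1 at its position in `alphabet`."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(ch)] = 1
    return vec


def decode(arrays, alphabet):
    """Map each output array to the character at its largest entry (greedy decoding)."""
    return "".join(alphabet[max(range(len(a)), key=a.__getitem__)] for a in arrays)


ALPHABET = "0123456789.$"  # hypothetical character set
encoded = [one_hot(c, ALPHABET) for c in "$12.50"]
print(decode(encoded, ALPHABET))  # $12.50
```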
Regarding claim 3, Chorowski-Weiss teaches the method of claim 1, wherein the selection values comprise one selection value greater than a threshold for selecting one of the stored plurality of arrays of values. (Pg. 8 Section 3.2.1, “The rescaled gradient $r_g$ was then passed to the AdaDelta update procedure. The accumulators $E[\log n_g]$ and $E[\log^2 n_g]$ were initialized to 0 at the beginning of the run. The algorithm uses a smaller decay constant to compute the moving average of the mean in the early stage to underestimate the standard deviation. This has the effect of having a threshold close to the mean value at the beginning and increasing the threshold only when the running averages are correctly tracking the norm of the gradient”; thus, the reference’s statement that this has “the effect of having a threshold close to the mean value at the beginning and increasing the threshold only when the running averages are correctly tracking the norm of the gradient” indicates that there are values greater than the threshold, which is why the threshold is increased as the running averages approach those values.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
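The claim 3 limitation (one selection value exceeding a threshold selects one stored array) may be illustrated with the Python sketch below. The threshold of 0.5, the function name, and the data are hypothetical, and the sketch does not model the adaptive gradient-clipping threshold of Chorowski Section 3.2.1.

```python
def select_above_threshold(selection, stored, threshold=0.5):
    """Return the single stored array whose selection value exceeds `threshold`."""
    above = [i for i, s in enumerate(selection) if s > threshold]
    if len(above) != 1:
        raise ValueError("expected exactly one selection value above the threshold")
    return stored[above[0]]


stored = [[0.2, 0.9], [0.7, -0.3], [-0.5, 0.4]]  # hypothetical stored arrays
chosen = select_above_threshold([0.05, 0.9, 0.05], stored)  # picks stored[1]
```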
Regarding claim 4, Chorowski-Weiss teaches the method of claim 1, wherein the attention array of values is approximately equal to one of the stored plurality of arrays of values. (Pg. 3 Section 2, “The model consists of an encoder that maps the raw input sequence $x = (x_i; i = 1, \ldots, I)$ to a sequence of features $h = (h_i; i = 1, \ldots, I)$ (also called annotations), a decoder that generates the output sequence $y = (y_o; o = 1, \ldots, O)$ and an attention mechanism that matches parts of the input sequence with elements of the output sequence (Bahdanau et al., 2014). A graphical illustration of the model is presented in Fig. 1”; thus, in the reference, an input element $x_i$ (e.g., at position $i = 1$) corresponds to an array of values in the limitation, and the attention mechanism matches that input with an element $y_o$ of the output sequence (e.g., at position $o = 1$), which corresponds to the limitation wherein the attention array of values ($y$ in the reference) is approximately equal to one of the arrays of values ($x$ in the reference).)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 5, Chorowski-Weiss teaches that the attention array of values is approximately equal to a sum of a plurality of adjacent stored arrays of values each multiplied by corresponding selection values. (Pg. 5, “The context is a weighted sum of annotations:
$c_o = \sum_{i=1}^{I} \alpha_{o,i} h_i$
where $\alpha_{o,i}$ is a normalized weight for each annotation $h_i$. This effectively means that the decoder selects each annotation $h_i$ with a certainty $\alpha_{o,i}$. The selection is performed in two steps. First, $e_{o,i}$ scores are computed to match the previous state of the decoder to all annotations. The scores are then penalized based on the relative position of the current and previous selection, and normalized:
[equation image: computation of the scores $e_{o,i}$, their position-based penalization, and normalization into the weights $\alpha_{o,i}$]
where the computation $a(\cdot,\cdot)$ is an MLP with one hidden layer and linear output, while $d(\cdot)$ is an MLP with a single hidden layer and logistic sigmoid output ([0, 1]). The proposed addition of the gating procedure can be understood as follows. The attention mechanism searches through the input sequence to find frames that match the current state of the decoder. However, in an utterance there may be repeated phonemes that have very similar annotations, which consequently, result in matching to all similarly sounding locations, as shown in Fig. 2 (b). The gating procedure prevents this behavior by confining the search to locations that are near the inputs relevant to the previously generated symbol”; thus, the reference states that the context $c_o$ is a weighted sum ($\Sigma$) of annotations, which corresponds to the limitation’s attention array of values being equal to a sum of arrays of values each multiplied by a selection value, and the gating procedure confines the selection to locations near the previously selected inputs, which corresponds to the limitation’s adjacent stored arrays of values. Moreover, the reference states that $d(\cdot)$ is an MLP with a hidden layer and a logistic sigmoid output, such that values in [0, 1] are applied to the annotations, corresponding to the selection values that are multiplied with the arrays of values in the limitation.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.

Regarding claim 6, Chorowski-Weiss teaches the method of claim 1, wherein the selection values comprise a plurality of selection values, the generating the attention array step further comprising: multiplying each selection value by a corresponding array of values in the stored plurality of arrays of values to produce a plurality of weighted arrays; adding the weighted arrays to produce the attention array of values (Pg. 5 Section 2.1.1, “We would like to constrain the selection of the frames for consecutive outputs to advance monotonically in time. To motivate the network to find monotonic alignments we design a penalty that is added to the optimization cost. We penalize any alignment that maps to inputs which were already considered for output emission. Since the selection weights $\alpha_{o,i}$ are normalized, their cumulative sum over the input sequence is monotonically increasing and bounded in the range [0, 1]. To encourage the monotonicity of the alignment over time, we add to the optimization cost the differences $p_o$ between the cumulative sums of the selection weights used to emit the current and previous phoneme:”; thus, the reference states that the selection weights are normalized and applied across the input sequence, with their cumulative sums used to encourage monotonic alignment over time, which corresponds to the limitation of multiplying each selection value by a corresponding array of values and adding the weighted arrays to produce the attention array of values. Moreover, the reference’s selection of frames corresponds to the selection values of the limitation, and the selection weights for the frames are normalized over the input sequence, whose annotations correspond to the arrays of values in the limitation.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
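The multiply-then-add operation recited in claims 5 and 6, and the approximate equality recited in claim 4, may be illustrated with the following Python sketch. It is not taken from either reference; the function names, stored arrays, and selection values are hypothetical numbers chosen for illustration.

```python
def weighted_arrays(selection, stored):
    """Claim 6, first step: scale each stored array by its selection value."""
    return [[s * v for v in arr] for s, arr in zip(selection, stored)]


def add_arrays(arrays):
    """Claim 6, second step: add the weighted arrays elementwise."""
    return [sum(col) for col in zip(*arrays)]


def attention_array(selection, stored):
    return add_arrays(weighted_arrays(selection, stored))


stored = [[0.2, 0.9], [0.7, -0.3], [-0.5, 0.4], [0.1, 0.1]]

# Claim 4: selection values concentrated on one array give an attention array
# close to that array (here about [0.677, -0.277] versus [0.7, -0.3]).
peaked = attention_array([0.01, 0.97, 0.01, 0.01], stored)

# Claim 5: selection values spread over two adjacent arrays give their weighted blend.
blended = attention_array([0.0, 0.5, 0.5, 0.0], stored)
```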
Regarding claim 7, Chorowski-Weiss teaches the method of claim 1, wherein, as each character is processed by the first neural network, the output of the first recurrent neural network produces a new internal array of values of the first plurality of internal arrays of values, and in accordance therewith, the stored plurality of arrays of values correspond to particular characters of the plurality of characters.
(Para. 0013, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word semantics derivation model of a system for Deep Learning may include: (1) receiving a set of one or more words (hereinafter: ‘words input’; e.g. a sentence); (2) entering the word set, as an input set, to a sequence classifying, deep multi-layered recurrent, and/or recursive, neural network based, word semantics derivation model, wherein the word semantics derivation model is adapted for: (i) weakly supervising the model learning by providing a substantially small amount of ‘right’ semantic taggings as learning examples to the model; (ii) assigning markup language semantic tags to at least some, and/or a subset, of the words; (3) repeating stages (1) and (2) one or more additional times, while utilizing stochastic gradient descend for learning ‘correct’ semantic tagging, and improving following taggings' outputs”; thus, the reference describes receiving a string of one or more characters that are encoded as a multi-value index and provided as input to the second recurrent neural network [Weiss, Fig. 1], which corresponds to the limitation wherein the output of the first recurrent neural network corresponds to particular characters of the plurality of characters.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 8, Chorowski-Weiss teaches the method of claim 1, wherein the plurality of characters are N characters, the stored plurality of arrays of values comprise N arrays each having M values, and the plurality of attention weights comprise M attention weights, wherein each of the stored N arrays of M values is multiplied by the M attention weights to produce N selection values, where N and M are integers. (Pg. 2 Section 1.2, “The model we consider in this paper is closely related to the RNN Transducer, however, with an attention mechanism that decides which input frames be used to generate the next output element. The attention mechanism in this respect was first used by Graves (2013) to build a neural network that generates convincing handwriting from a given text. At each step, the network predicts a (soft-)window over the input sequence that corresponds to the character being currently written. A similar approach of attention was used more recently in a so-called “neural machine translation model” (Bahdanau et al., 2014). In this case, for generating each target word, the network computes a score matching the hidden state of an output RNN to each location of the input sequence (Bahdanau et al., 2014). The scores are normalized to sum to one over the input sequence and can be interpreted as a probability of each input location being aligned to the currently generated target word”; thus, the reference describes a neural network with an attention mechanism that generates handwriting from a given text, predicting at each step a window over the input sequence corresponding to the character currently being written, and computing one score for each location of the input sequence (i.e., N scores for N inputs). This explicitly corresponds to the attention weights that generate selection values for the internal arrays of values and thereby select the stored arrays of values to be sent to the second-stage RNN [Para. 0028 Lines 2-5].)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
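The dimensional relationship recited in claim 8 (N stored arrays of M values, M attention weights, N selection values) amounts to a matrix-vector product, which may be sketched in Python as follows. The function name and numbers are hypothetical and are not taken from either reference.

```python
def selection_values(stored, attention_weights):
    """Dot each of the N stored M-value arrays with the M attention weights."""
    m = len(attention_weights)
    if any(len(arr) != m for arr in stored):
        raise ValueError("every stored array must have M values")
    return [sum(a * w for a, w in zip(arr, attention_weights)) for arr in stored]


# Hypothetical N = 4 stored arrays of M = 3 values each.
stored = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
weights = [0.5, -1.0, 0.25]  # M = 3 attention weights
sel = selection_values(stored, weights)  # N = 4 selection values
```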
Regarding claim 9, Chorowski-Weiss teaches the method of claim 1 wherein each of the plurality of selection values are between zero (0) and one (1). (Pg. 5, “The selection is performed in two steps. First, $e_{o,i}$ scores are computed to match the previous state of the decoder to all annotations. The scores are then penalized based on the relative position of the current and previous selection, and normalized:
[equation image: computation of the scores $e_{o,i}$, their position-based penalization, and normalization into the weights $\alpha_{o,i}$]
where the computation $a(\cdot,\cdot)$ is an MLP with one hidden layer and linear output, while $d(\cdot)$ is an MLP with a single hidden layer and logistic sigmoid output ([0, 1])”; thus, in the reference, the selection is performed by computing scores against the previous state of the decoder, penalizing them based on the relative position of the current and previous selection, and normalizing them. The computation includes a multilayer perceptron (MLP) with a single hidden layer and a logistic sigmoid output in [0, 1], which corresponds to the limitation’s selection values between 0 and 1.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
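As an illustration of why a logistic sigmoid output, or a normalization of scores, yields selection values between 0 and 1, the following Python sketch applies both to hypothetical raw scores. Softmax is assumed here as the normalization; the quoted passage states only that the scores are normalized, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid. Mathematically the output lies strictly
    between 0 and 1; in floating point it can round to 0.0 or 1.0 for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Normalize scores to positive weights that sum to 1 (each therefore below 1)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw = [1.0, -0.75, 0.25, -0.25]  # hypothetical raw scores
gated = [sigmoid(z) for z in raw]
normalized = softmax(raw)
```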
Regarding claim 10, Chorowski-Weiss teaches the method of claim 1, wherein the plurality of characters are processed by the first recurrent neural network before performing the generating the attention array step and before the processing the attention array of values using the second recurrent neural network step. (Para. 0013, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word tokenization and spelling correction model of a system for Deep Learning may include: (1) receiving a string of one or more characters; (2) encoding and indexing the characters as a multi-value index; (3) embedding each character as a numbers vector; (4) entering a matrix of one or more character number vectors, as input, to a recurrent, or a convolutional, neural network language model, wherein the language model is adapted for”; thus, the reference states that the characters are processed through the first recurrent neural network before the model weights for the tuning module are generated [Fig. 1].)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 11, Chorowski-Weiss teaches the method of claim 1, wherein the attention array of values is maintained as an input to the second recurrent neural network for a plurality of cycles. (Para. 0017, “FIG. 1 is a block diagram of an exemplary system for deep learning based natural language understanding, comprising a tokenization and spelling correction model/machine and a word semantics derivation model/machine, in accordance with some embodiments of the present invention;”; thus, Fig. 1 of the reference shows a first machine for tokenization and spelling correction and a second machine for semantics derivation, each using a recurrent neural network, which corresponds to the claim’s limitation.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 12, Chorowski-Weiss teaches the method of claim 1, wherein the second recurrent neural network comprises output layer weights operating on a second plurality of internal arrays of values in the second recurrent neural network. (Para. 0017, “FIG. 1 is a block diagram of an exemplary system for deep learning based natural language understanding, comprising a tokenization and spelling correction model/machine and a word semantics derivation model/machine, in accordance with some embodiments of the present invention;”; thus, ‘machine 2’ in the reference is the second recurrent neural network, which processes the data output from machine 1, corresponding to the limitation of the claim [see also Fig. 2].)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 13, Chorowski-Weiss teaches the method of claim 12, wherein the first plurality of internal arrays of values are multiplied by first feedback weights to produce a first feedback result in the first recurrent neural network, and wherein the second plurality of internal arrays of values are multiplied by second feedback weights to produce a second feedback result in the second recurrent neural network. (Fig. 1; Fig. 1 of the reference shows a recurrent neural network wherein the characters pass through the first RNN and its output is used for feedback-based tuning, such as by the training input noise tuning module. The same arrangement is shown for the second machine, i.e., the second recurrent neural network. The reference states that ‘feedback’ is given to the system for correct/successful handling or removal of the artificially introduced ‘noise’ element as part of the learning process [Para. 0052].)
The reasons of obviousness have been noted in the rejection of claim 12 above and are applicable herein.
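The claim 13 arrangement may be illustrated with the following Python sketch, which assumes, for illustration only, that the claimed "feedback weights" are the recurrent (hidden-to-hidden) weight matrix of an Elman-style RNN, so that the feedback result is that matrix multiplied by the previous internal array. The function names and weight values are hypothetical, and the sketch does not model the noise-tuning feedback described in Weiss Para. 0052.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def recurrent_step(x, h_prev, W_in, W_fb):
    """Return (new internal array, feedback result) for one time step."""
    feedback = matvec(W_fb, h_prev)  # previous internal array times feedback weights
    pre = [a + b for a, b in zip(matvec(W_in, x), feedback)]
    return [math.tanh(p) for p in pre], feedback


# Hypothetical 2-unit networks; the first and second RNNs use separate feedback weights.
W_in = [[1.0, 0.0], [0.0, 1.0]]
W_fb_first = [[0.5, 0.0], [0.0, 0.5]]
W_fb_second = [[0.0, -1.0], [1.0, 0.0]]

# First RNN: two successive steps, the second one feeding back the first internal array.
h1, fb1 = recurrent_step([1.0, 0.0], [0.0, 0.0], W_in, W_fb_first)
h1b, fb1b = recurrent_step([0.0, 1.0], h1, W_in, W_fb_first)

# Second RNN: one step from a hypothetical previous internal array [0.5, 0.25].
s1, fb2 = recurrent_step([0.0, 0.0], [0.5, 0.25], W_in, W_fb_second)
```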
Regarding claim 14, Chorowski-Weiss teaches the method of claim 1, wherein the second recurrent neural network successively produces a plurality of output arrays of likelihood values, and wherein a position of each likelihood value in each of the output arrays corresponds to a different character of the plurality of characters, the method further comprising successively producing a character having a highest likelihood value in each of the output arrays. (Para. 0017, “FIG. 1 is a block diagram of an exemplary system for deep learning based natural language understanding, comprising a tokenization and spelling correction model/machine and a word semantics derivation model/machine, in accordance with some embodiments of the present invention;”; thus, Fig. 1 of the reference shows a first machine for tokenization and spelling correction and a second machine for semantics derivation, each using a recurrent neural network, which corresponds to the claim’s limitation.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 17, Chorowski-Weiss teaches the method of claim 1, wherein the plurality of characters are received from an optical character recognition system and the plurality of characters correspond to a transaction receipt. (Para. 0028, “In some embodiments, the medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Some demonstrative examples of a computer-readable medium may include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random-access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Some demonstrative examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD”; thus, the reference’s statement that “the medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium” corresponds to the limitation’s receiving characters from an optical character recognition system.)
The reasons of obviousness have been noted in the rejection of claim 1 above and are applicable herein.
Regarding claim 18, Chorowski-Weiss teaches the method of claim 17, wherein the plurality of characters corresponds to the transaction receipt date or amount. (Para. 0102, “The exemplary system shown comprises: a tokenization and spelling correction model/machine and a word semantics derivation model/machine. The spelling correction model/machine comprises a Characters String Receipt Module for receiving from a Training Input Data database/corpus character string inputs (e.g. training data during model learning and ‘real’ user data once in its ‘steady state’), and/or for sourcing at least some of the training phase character strings inputs for the word tokenization and spelling correction language-model from corpuses or databases that address textual mistakes and provide their respective fixes or corrections (e.g. Wikipedia)”; thus, the reference states that the receipt module receives training data and ‘real’ user data, which corresponds to the receipt of the claimed invention, and the limitation’s receipt amount corresponds to the reference’s amount of training data or user data.)
The reasons of obviousness have been noted in the rejection of claim 17 above and are applicable herein.
Regarding claim 19, Chorowski-Weiss teaches a non-transitory machine-readable medium storing a program executable by at least one processing unit of a computer, the program comprising sets of instructions for: (Para. 0027, “Furthermore, some embodiments of the invention may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For example, a computer-usable or computer-readable medium may be or may include any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device”; thus, the reference’s computer-readable medium providing program code for execution explicitly corresponds to the claim’s limitation.)
processing a plurality of characters using a first recurrent machine learning algorithm, the first recurrent machine learning algorithm sequentially producing a first plurality of internal arrays of values; (Para. 0013, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word tokenization and spelling correction model of a system for Deep Learning may include: (1) receiving a string of one or more characters; (2) encoding and indexing the characters as a multi-value index; (3) embedding each character as a numbers vector; (4) entering a matrix of one or more character number vectors, as input, to a recurrent, or a convolutional, neural network language model, wherein the language model is adapted for:”; thus, the reference’s ‘multi-value index’ corresponds to the limitation’s ‘array of values.’)
storing the first plurality of internal arrays of values to form a stored plurality of arrays of values; (Para. 0029, “In some embodiments, a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements, for example, through a system bus. The memory elements may include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memories which may provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution”; thus, the reference’s disclosure of storing and executing program code in memory corresponds to the limitation’s storing of internal arrays of values in memory [Specification Para. 0014])
multiplying the stored plurality of arrays of values by a plurality of attention weights to produce a plurality of selection values; (Para. 0073, “According to some embodiments, specific words inputs for ‘Forced Training’ or ‘Trauma Induced Training’ may be grouped in a separate corpus. According to some embodiments, ‘Forced Training’ corpus inputs may be designated with a higher ‘weight’ than other ‘regular’ corpora inputs, thus, making them more likely to be elected and drawn for the model's training. According to some embodiments, the selected ‘weight’ allocated to the ‘Forced Training’ corpus inputs may be set empirically, by finding the value providing optimal results”; thus, the reference gives ‘Forced Training’ corpus inputs as an example of inputs designated with a higher weight, which explicitly corresponds to the claim’s limitation.)
generating an attention array of values from the stored plurality of arrays of values based on the selection values; (Para. 0047, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word tokenization and spelling correction model of a system for Deep Learning may include: (1) receiving a string of one or more characters; (2) encoding and indexing the characters as a multi-value index; (3) embedding each character as a numbers vector; (4) entering a matrix of one or more character number vectors, as input, to a recurrent, or a convolutional, neural network language model, wherein the language model is adapted for: (i) parsing the data into words and tokenizing the words; (ii) correcting misspelled words; and/or (iii) auto-completing words; (5) repeating stages (1)-(4) one or more additional times, while intermittently introducing noise into the input and examining the language model's output; and/or (6) tuning the language model, and/or the amount and type of introduced noise, at least partially based on the language model's examined output(s)”; thus, the reference’s arrays of values, i.e., the characters embedded as number vectors, correspond to the limitation’s attention array of values, which is produced by multiplying arrays of values [Specification Para. 0017].)
processing the attention array of values using a second recurrent machine learning algorithm, the second recurrent machine learning algorithm producing values corresponding to characters of the plurality of characters forming a recognized character sequence. (Para. 0017, “FIG. 1 is a block diagram of an exemplary system for deep learning based natural language understanding, comprising a tokenization and spelling correction model/machine and a word semantics derivation model/machine, in accordance with some embodiments of the present invention;”; thus, Fig. 1 of the reference shows a first machine for tokenization and spelling correction and a second machine for semantics derivation, each using a recurrent neural network, which corresponds to the claim’s limitation.)
Regarding claim 20, Chorowski-Weiss teaches a computer system comprising: 
a processor; (Para. 0029, “In some embodiments, a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements, for example, through a system bus. The memory elements may include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memories which may provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution”; thus, the reference’s processor corresponds to the limitation.)
a non-transitory machine-readable medium storing a program executable by the processor, the program comprising sets of instructions for: (Para. 0027, “Furthermore, some embodiments of the invention may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For example, a computer-usable or computer-readable medium may be or may include any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device”; thus, the reference’s computer-readable medium providing program code for execution explicitly corresponds to the claim’s limitation.)
processing a plurality of characters using a first recurrent neural network, the first recurrent neural network sequentially producing a first plurality of internal arrays of values; (Para. 0013, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word tokenization and spelling correction model of a system for Deep Learning may include: (1) receiving a string of one or more characters; (2) encoding and indexing the characters as a multi-value index; (3) embedding each character as a numbers vector; (4) entering a matrix of one or more character number vectors, as input, to a recurrent, or a convolutional, neural network language model, wherein the language model is adapted for:” Thus, the ‘multi-value index’ of the reference corresponds to the limitation’s ‘array of values.’)
storing the first plurality of internal arrays of values to form a stored plurality of arrays of values; (Para. 0029, “In some embodiments, a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements, for example, through a system bus. The memory elements may include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memories which may provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution” Thus, the reference’s disclosure of storing and executing program code in memory corresponds to the limitation’s storing of the internal arrays of values in memory [Specification Para. 0014].)
multiplying the stored plurality of arrays of values by a plurality of attention weights to produce a plurality of selection values; (Para. 0073, “According to some embodiments, specific words inputs for ‘Forced Training’ or ‘Trauma Induced Training’ may be grouped in a separate corpus. According to some embodiments, ‘Forced Training’ corpus inputs may be designated with a higher ‘weight’ than other ‘regular’ corpora inputs, thus, making them more likely to be elected and drawn for the model's training. According to some embodiments, the selected ‘weight’ allocated to the ‘Forced Training’ corpus inputs may be set empirically, by finding the value providing optimal results” Thus, the reference designates inputs, such as the ‘Forced Training’ corpus inputs, with a higher weight, which corresponds to the claimed limitation.)
generating an attention array of values from the stored plurality of arrays of values based on the selection values; (Para. 0047, “According to some embodiments, an unsupervised, or weakly supervised, learning process, executed by a word tokenization and spelling correction model of a system for Deep Learning may include: (1) receiving a string of one or more characters; (2) encoding and indexing the characters as a multi-value index; (3) embedding each character as a numbers vector; (4) entering a matrix of one or more character number vectors, as input, to a recurrent, or a convolutional, neural network language model, wherein the language model is adapted for: (i) parsing the data into words and tokenizing the words; (ii) correcting misspelled words; and/or (iii) auto-completing words; (5) repeating stages (1)-(4) one or more additional times, while intermittently introducing noise into the input and examining the language model's output; and/or (6) tuning the language model, and/or the amount and type of introduced noise, at least partially based on the language model's examined output(s)” Thus, the reference’s array of values, i.e., the characters embedded as number vectors, corresponds to the limitation’s attention array of values, which is produced by multiplying the arrays of values [Specification Para. 0017].)
processing the attention array of values using a second recurrent neural network, the second recurrent neural network producing values corresponding to characters of the plurality of characters forming a recognized character sequence. (Para. 0017, “FIG. 1 is a block diagram of an exemplary system for deep learning based natural language understanding, comprising a tokenization and spelling correction model/machine and a word semantics derivation model/machine, in accordance with some embodiments of the present invention;” Thus, Fig. 1 of the reference shows a first machine for tokenization and spelling correction and a second machine for semantics derivation, each using a recurrent neural network, which corresponds to the claimed limitation.)
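For illustration only, the attention steps recited in claim 20 (multiplying the stored arrays of values by attention weights to obtain selection values, then generating an attention array of values from the stored arrays) may be sketched in Python as follows. The sketch assumes a dot-product score and softmax normalization; these choices, and the names `softmax`, `attention_array`, and `attention_weights`, are hypothetical and are not asserted to be the applicant’s implementation or Chorowski’s scoring function (which Chorowski describes as an MLP). The final weighted sum mirrors Chorowski’s description of the context as a sum of annotations h_i weighted by α_{o,i}.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Normalize raw scores into selection values in [0, 1] that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_array(stored: List[Vector], attention_weights: Vector) -> Vector:
    """Build one attention array from the stored per-step arrays.

    stored: internal arrays saved from the first RNN, one per time step.
    attention_weights: hypothetical weight vector; each stored array is
    multiplied element-wise by it and summed (a dot product) to give a score.
    """
    scores = [sum(w * v for w, v in zip(attention_weights, h)) for h in stored]
    selection = softmax(scores)  # one selection value per stored array
    dim = len(stored[0])
    # Weighted sum of the stored arrays, each scaled by its selection value.
    return [sum(a * h[d] for a, h in zip(selection, stored)) for d in range(dim)]


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]
    print(attention_array(states, [0.0, 0.0]))  # equal scores -> [0.5, 0.5]
```

In such a sketch, the returned array would be the input to the second recurrent network; in a trained system the attention weights would be learned rather than fixed.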
Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chorowski et al. (“End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results”) hereinafter Chorowski in further view of Weiss et al. (US 20160350655 A1) hereinafter Weiss in further view of Tilk et al. (“Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration”) hereinafter Tilk.
Regarding claim 15, Chorowski-Weiss teaches the method of claim 1 further comprising processing the characters using first and second recurrent neural networks having outputs sequentially producing first and second pluralities of internal arrays of values, respectively. Chorowski-Weiss does not teach processing the characters using a third recurrent neural network in reverse order relative to the processing of characters using the first recurrent neural network, the third recurrent neural network having an output sequentially producing a second plurality of internal arrays of values.
However, Tilk teaches processing the characters using a third recurrent neural network in reverse order relative to the processing of characters using the first recurrent neural network, the third recurrent neural network having an output sequentially producing a second plurality of internal arrays of values. (Pg. 3048, “The bidirectional layer is followed by a unidirectional GRU layer with an attention mechanism. This layer processes the bidirectional states sequentially and keeps track of the current position in text, while the attention mechanism can focus on relevant bidirectional context aware word representations before and after the current position. The state s_t of the layer s_t = GRU(h_t, s_{t-1}) is late fused with the attention model output a_t which is computed based on the previous states H = (h_1, …, h_T) as described in [25]. The late fused state f_t” Thus, Figure 1 of the reference shows that the bidirectional recurrent neural network processes each input of a text as time t increases. In Figure 1, the states h passing from left to right correspond to the claimed first RNN, and the states h passing from right to left correspond to the claimed third RNN. Figure 1 further shows that processing the input with the first and third RNNs together produces another array of values, as the limitation requires.)
Regarding the further limitation of claim 15, storing the second plurality of internal arrays of values with the stored plurality of arrays of values, wherein arrays of values from the first and second plurality of arrays of values produced at the same time are stored together in the stored plurality of arrays of values. (Pg. 3048, “The model described above is used for single stage training and as the first stage in two-stage training. Graphical description of the model can be seen in Figure 1. For two-stage training, to incorporate pause duration and adapt to target domain, the second stage discards the first stage output layer and replaces it with a new recurrent GRU layer” Thus, the reference describes two-stage training involving two sets of values that are stored together, such as the ‘late fusion’ shown in Fig. 1 of the reference. The states of the two recurrent neural networks, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, fuse together using LSTM or GRU [Tilk Pg. 3048, Section 2, Para. 4].)
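For illustration only, the arrangement mapped to Tilk above (a forward pass, a reverse-order pass, and arrays from the same time step stored together) may be sketched as follows. A plain tanh recurrence with scalar weights stands in for Tilk’s GRU layers, and the names `rnn_states` and `bidirectional_store` and the weight values are hypothetical. Concatenating the forward and backward arrays for each time step is one way, assumed here, of storing them together.

```python
import math
from typing import List

Vector = List[float]


def rnn_states(inputs: List[Vector], w_in: float, w_fb: float) -> List[Vector]:
    """Simplified tanh RNN (hypothetical scalar weights): one state per input."""
    state = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        state = [math.tanh(w_in * xi + w_fb * si) for xi, si in zip(x, state)]
        states.append(state)
    return states


def bidirectional_store(inputs: List[Vector]) -> List[Vector]:
    """Run a forward and a reverse-order RNN and pair their outputs per time step."""
    # "First" RNN, left to right. (A real model would learn separate
    # weights for each direction; shared values keep the sketch short.)
    forward = rnn_states(inputs, w_in=1.0, w_fb=0.5)
    # "Third" RNN reads the sequence in reverse; flip its outputs back so
    # that index t again refers to time step t.
    backward = rnn_states(inputs[::-1], w_in=1.0, w_fb=0.5)[::-1]
    # Arrays produced for the same time step are stored together (concatenated).
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    for t, h in enumerate(bidirectional_store(chars)):
        print(t, [round(v, 3) for v in h])
```

Each stored entry has twice the width of a single state, so a downstream attention layer can see context from both directions at every position.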
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of character sequence recognition using first and second recurrent neural networks, as taught by Chorowski-Weiss, to include the bidirectional recurrent neural network with an attention mechanism using a textual approach, as taught by Tilk, in order to combine textual features with prosodic information and thereby improve character recognition over a text-only model [Tilk Pg. 3050, Section: Conclusion].
The disclosures of Chorowski-Weiss and Tilk (hereinafter CWT) are analogous art to the claimed invention because they are in the same field of recognizing characters using recurrent neural networks.
Regarding claim 16, CWT teaches the method of claim 15, wherein processing the plurality of characters using the first recurrent neural network and the third recurrent neural network comprises:
for each character of the plurality of characters: receiving an input array of values, the input array of values corresponding to representations of characters of the plurality of characters, wherein different characters of the plurality of characters are represented as a plurality of zeros (0) and a single one (1); (Pg. 5, “The selection is performed in two steps. First, e_{o,i} scores are computed to match the previous state of the decoder to all annotations. The scores are then penalized based on the relative position of the current and previous selection, and normalized:
[equation image not reproduced]
where the computation a(.,.) is an MLP with one hidden layer and linear output, while d(.) is an MLP with a single hidden layer and logistic sigmoid output ([0, 1])” Thus, the selection function of the reference computes scores matching the previous decoder state to the annotations and penalizes them based on the relative position of the current and previous selections. The computation includes a multilayer perceptron (MLP) with a single hidden layer and a logistic sigmoid output in [0, 1], which corresponds to the limitation’s values between 0 and 1.)
multiplying the input array of values by a plurality of input weights to produce a weighted input array of values; (Pg. 5 Para. 1, “The context is a weighted sum of annotations:
$c_o = \sum_i \alpha_{o,i} h_i$
where α is a normalized weight for each annotation h_i. This effectively means that the decoder selects each annotation h_i with a certainty α_{o,i}.” Thus, the reference teaches the limitation of multiplying an array of values by a plurality of weights, in which α is the weight of the annotation h, and the annotation h corresponds to the array of values of the characters.)
multiplying an internal array of values of a first and second plurality of internal arrays of values by a plurality of feedback weights to produce a weighted internal array of values; (Fig. 1 displays the recurrent neural network, wherein the character passes through the first RNN and is output to components for feedback-based tuning, such as the training input noise tuning module. The same arrangement is displayed for the second machine, i.e., the second recurrent neural network. The reference states that ‘feedback’ is given to the system for correct or successful handling or removal of the artificially introduced ‘noise’ element as part of the learning process [Para. 0052].)
adding the weighted input array of values to the weighted internal array of values to produce an intermediate result array of values; (Pg. 5 Section 2.1.1, “We would like to constrain the selection of the frames for consecutive outputs to advance monotonically in time. To motivate the network to find monotonic alignments we design a penalty that is added to the optimization cost. We penalize any alignment that maps to inputs which were already considered for output emission. Since the selection weights α_{o,i} are normalized, their cumulative sum over the input sequence is monotonically increasing and bounded in the range [0, 1]. To encourage the monotonicity of the alignment over time, we add to the optimization cost the differences p_o between the cumulative sums of the selection weights used to emit the current and previous phoneme:” Thus, the reference states that the selection weights are normalized and that their cumulative sum over the input sequence increases monotonically, encouraging monotonic alignment over time, which corresponds to the limitation of multiplying each selection value by a corresponding array of values and adding the weighted arrays to produce the attention array of values. Moreover, the reference’s selection of frames corresponds to the selection values of the limitation, and these selection weights are normalized over the input sequence, which corresponds to the array of values of the limitation.)
subtracting a bias array of values from the intermediate result array of values to produce an updated internal array of values. (Pg. 5 Section 2.1.1, “We would like to constrain the selection of the frames for consecutive outputs to advance monotonically in time. To motivate the network to find monotonic alignments we design a penalty that is added to the optimization cost. We penalize any alignment that maps to inputs which were already considered for output emission. Since the selection weights α_{o,i} are normalized, their cumulative sum over the input sequence is monotonically increasing and bounded in the range [0, 1]. To encourage the monotonicity of the alignment over time, we add to the optimization cost the differences p_o between the cumulative sums of the selection weights used to emit the current and previous phoneme:
[equation image not reproduced]” Thus, the reference adds to the optimization cost the differences between the cumulative sums of the selection weights used to emit the current and previous phonemes.)
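For illustration only, the per-character update recited in claim 16 (a one-hot input array, input weights, feedback weights, addition, and bias subtraction) may be written as the following Python sketch. The weight values and the names `one_hot`, `matvec`, and `update_internal` are hypothetical. The sketch follows the claim language literally and applies no nonlinearity, although a practical recurrent network would ordinarily apply one (e.g., tanh) to the result.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def one_hot(index: int, size: int) -> Vector:
    """Represent a character as zeros with a single one at its index."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by an array of values."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def update_internal(x: Vector, h: Vector, w_in: Matrix,
                    w_fb: Matrix, bias: Vector) -> Vector:
    weighted_input = matvec(w_in, x)     # input array times input weights
    weighted_internal = matvec(w_fb, h)  # internal array times feedback weights
    intermediate = [a + b for a, b in zip(weighted_input, weighted_internal)]
    return [r - c for r, c in zip(intermediate, bias)]  # subtract bias array


if __name__ == "__main__":
    alphabet = "abc"
    w_in = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 x len(alphabet)
    w_fb = [[1.0, 0.0], [0.0, 1.0]]            # 2 x 2
    bias = [0.5, 1.0]
    h = [1.0, 1.0]
    h = update_internal(one_hot(alphabet.index("b"), 3), h, w_in, w_fb, bias)
    print(h)  # [2.5, 5.0]
```

Applying this update once per character, left to right for the first network and right to left for the third, would yield the two pluralities of internal arrays referred to in claim 15.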
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUBASH LIMBU whose telephone number is (571)272-0633. The examiner can normally be reached Monday - Friday 0730 - 530.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.L./Examiner, Art Unit 2128
/ERIC NILSSON/Primary Examiner, Art Unit 2122