Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 4-6, 8-9, 11-13, 15-16, and 18-20 are rejected under 35 USC 103(a) as being unpatentable over Leidner et al. (US 2018/0329883 A1) in view of Lin et al. (A Structured Self-Attentive Sentence Embedding).
Regarding Claims 1, 8, and 15, Leidner discloses a system for transforming unstructured text into structured form (Fig. 1, Computer based system 100), comprising: 
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to (¶26 and ¶43, a computer-based system comprising a processor in electrical communication with a memory, the memory adapted to store data and instructions for executing by the processor, and a neural paraphrase generator): 
(¶26 and Fig. 3, receiving a source sequence of words w1 – w5); 
obtain a first word embedding and a first part-of-speech ("POS") tag embedding both corresponding to the first word (¶26 and Fig. 5, input composition component transforms word w1 into a vector by (1) mapping w1 to a word embedding matrix to generate a word vector and (2) mapping part-of-speech tag p1 to a tag embedding matrix to generate a tag vector for p1); 
obtain a second word embedding and a second POS tag embedding both corresponding to the second word (¶26 and Fig. 5, input composition component transforms word w2 into a vector by (1) mapping w2 to a word embedding matrix to generate a word vector and (2) mapping part-of-speech tag p2 to a tag embedding matrix to generate a tag vector for p2); 
concatenate the first word embedding with the first POS word embedding into a first input and the second word embedding with the second POS word embedding into a second input (¶26, respectively concatenating together the word vectors and the tag vectors); and 
use attention (Fig. 4 and ¶56, encoder 402, decoder 404, attention module 406) to process the first input and the second input through a bidirectional recurrent neural network ("RNN") to generate a first output corresponding to the first input and a second output corresponding to the second input (¶24-25, a model using a bi-directional recurrent neural network encoder-decoder with POS tag information; ¶26, predicting a probability of a target sequence of words representing a target output sentence based on a recurrent state in the decoder, a set of previous words, and a context vector), wherein the (¶66 and Fig. 7, defining two distinct probability distributions, the first over the vocabulary of words W and the second over the POS tag vocabulary P, to predict the probability of both xj and pj).
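For illustration only, the input composition mapped above (looking up a word embedding and a POS tag embedding for each word and concatenating the two vectors) can be sketched as follows. The lookup tables, vector values, example words, and the function name compose_input are hypothetical and are not taken from Leidner or from the application.

```python
# Hypothetical lookup tables standing in for a word embedding matrix
# and a POS tag embedding matrix (values chosen only for illustration).
WORD_EMBEDDINGS = {
    "stocks": [0.2, -0.1, 0.7],
    "rose": [0.5, 0.3, -0.4],
}
POS_TAG_EMBEDDINGS = {
    "NOUN": [1.0, 0.0],
    "VERB": [0.0, 1.0],
}


def compose_input(word, pos_tag):
    """Concatenate the word vector with the POS tag vector for one token."""
    return WORD_EMBEDDINGS[word] + POS_TAG_EMBEDDINGS[pos_tag]


# One concatenated input per word, ready to be fed to a recurrent encoder.
first_input = compose_input("stocks", "NOUN")
second_input = compose_input("rose", "VERB")
```

Each resulting input has a length equal to the word embedding size plus the tag embedding size (here 3 + 2 = 5).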
Leidner does not disclose using self-attention to process the first input and the second input.
Lin discloses a model for extracting an interpretable sentence embedding by introducing self-attention where, instead of a vector, a 2D matrix is used to represent the embedding, with each row of the matrix attending to a different part of the sentence (Abstract).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Leidner to use self-attention to process the first input and the second input through the RNN to predict a probability of a target sequence of words representing a target output sentence based on a recurrent state in the decoder (Leidner, Abstract) with significant performance gain (Lin, Abstract).
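For reference, the self-attention of Lin computes an annotation matrix A = softmax(W_s2 tanh(W_s1 H^T)) over the hidden states H of a bidirectional LSTM and forms the matrix embedding M = AH, so that each row of M attends to a different part of the sentence. A minimal sketch follows, assuming small hand-picked weights; all numeric values and helper names are illustrative assumptions and are not taken from either reference.

```python
import math


def matmul(a, b):
    """Multiply matrix a (p x q) by matrix b (q x s), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def structured_self_attention(H, W_s1, W_s2):
    """H: n x 2u hidden states; W_s1: d_a x 2u; W_s2: r x d_a."""
    projected = matmul(W_s1, transpose(H))                    # d_a x n
    activated = [[math.tanh(v) for v in row] for row in projected]
    A = [softmax(row) for row in matmul(W_s2, activated)]     # r x n, rows sum to 1
    M = matmul(A, H)                                          # r x 2u matrix embedding
    return A, M


# Assumed toy sizes: n = 3 tokens, 2u = 2, d_a = 2, r = 2 attention rows.
H = [[0.1, 0.4], [0.3, -0.2], [0.5, 0.0]]
W_s1 = [[0.2, -0.1], [0.4, 0.3]]
W_s2 = [[1.0, 0.0], [0.0, 1.0]]
A, M = structured_self_attention(H, W_s1, W_s2)
```

Each row of A is a distribution over the n token positions, and each row of M is the correspondingly weighted sum of hidden states.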
Further regarding claim 8, Leidner discloses a non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to transform unstructured text into structured form according to claims 1 and 15 (¶26, processor executing instructions stored in a memory). 
Regarding claims 2, 9, and 16, Leidner discloses wherein the first output includes three labels corresponding to the first word (Fig. 7, in view of ¶66, see a third softmax layer to compute a third probability corresponding to attr2i). 
Regarding claims 4, 11, and 18, Leidner discloses wherein the bidirectional RNN includes one of a long short-term memory ("LSTM") (¶24, The encoder of the model, which is a bidirectional Recurrent Neural Network (RNN) with LSTM cells) and a gated recurrent unit ("GRU") that controls how information is passed down from layer to layer of the bidirectional RNN (¶45, popular choice for fenc are a vanilla RNN unit, a GRU, or an LSTM unit).
Regarding claims 5, 12, and 19, Leidner discloses wherein the instructions, upon execution, further cause the one or more computers to transform unstructured text into structured form by: saving a key pair including the first word and the first output in a database (¶57 and ¶64, output target sequence of tuples representing predicted words and structured tag elements; ¶68, after correctly predicting both words and POS tags, store as target tuples composed of aligned sequences of target words and target POS tags in a parallel training corpus; see ¶43, database 204 storing tagged training data).
Regarding claims 6, 13, and 20, Leidner discloses wherein the instructions, upon execution, further cause the one or more computers to transform unstructured text into structured form by: saving a key pair including the second word and the second output in a database (¶57 and ¶64, output target sequence of tuples representing predicted words and structured tag elements; ¶68, after correctly predicting both words and POS tags, store as target tuples composed of aligned sequences of target words and target POS tags in a parallel training corpus; see ¶43, database 204 storing tagged training data).
Claims 7 and 14 are rejected under 35 USC 103(a) as being unpatentable over Leidner et al. (US 2018/0329883 A1) in view of Lin et al. (A Structured Self-Attentive Sentence Embedding) as applied to claims 1 and 8, in further view of Le Roux et al. (US 2019/0318725 A1).
Regarding claims 7 and 14, Leidner discloses wherein, at an output layer of the bidirectional RNN (¶46), each of the label prediction scores is normalized between 0 and 1 (Figs. 11-14), and wherein a probability that a word fits with a label is independent of a probability that the same word fits with another label (e.g., Fig. 11, word “European” with respective prediction scores between 0 and 1 for respective POS tags).
Leidner does not disclose that a sigmoid function is used to normalize each of the label prediction scores.
Le Roux teaches a speech recognition neural network comprising bidirectional LSTM RNN layers stacked on top of each other, followed by a linear layer to compute a D-dimensional vector for each T-F “time frame and frequency” unit within a given frame from the output of the stack of BLSTM layers at that frame, followed by a non-linearity such as a sigmoid, and a unit-norm normalization of the D-dimensional vector to obtain a D-dimensional embedding (¶145).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Leidner to use a sigmoid function to normalize each of the label prediction scores in order to generate a corresponding embedding vector (Le Roux, ¶80, applying a sigmoid to each element of the D-dimensional vector and renormalizing it so that it has unit Euclidean norm, leading to an embedding vector for each time frame and frequency).
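For illustration only, the following sketch shows how an element-wise sigmoid normalizes each label prediction score into the range between 0 and 1 independently of the other labels, in contrast to a softmax, which couples the scores so that they sum to 1. The raw score values are assumed for the example and do not come from any reference of record.

```python
import math


def sigmoid(z):
    """Map one raw score to (0, 1) without reference to any other score."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map all raw scores jointly to a distribution that sums to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [2.0, 0.5, -1.0]           # assumed raw scores for three labels
independent = [sigmoid(z) for z in raw_scores]
coupled = softmax(raw_scores)

# Changing one raw score leaves the other sigmoid outputs unchanged,
# whereas every softmax output shifts.
changed_scores = [2.0, 0.5, 3.0]
independent_after = [sigmoid(z) for z in changed_scores]
coupled_after = softmax(changed_scores)
```

With a sigmoid, the per-label probabilities need not sum to 1, which is what allows several labels to be assigned to the same word at once.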
Allowable Subject Matter
Claims 3, 10, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
For example, US 2017/0357896 A1 teaches a neural network for embedding documents (Abstract), where the network can be trained by back-propagating errors (i.e., loss) of an objective function to continuously adjust the weights of the neural network based on a loss L. The loss function can be summarized or abbreviated as an average over all of the output vectors yiu of [D(yt, ys) – D(yt, yiu)] (¶99).
The prior art of record does not disclose, alone or in combination, the subject matter of claims 3, 10, and 17 as follows:
wherein the instructions, upon execution, further cause the one or more computers to transform unstructured text into structured form by: using a loss, HLdiff, to perform back-propagation to adjust weights of the bidirectional RNN, wherein HLdiff=average(yt*(1-yp)+(1-yt)*yp), where yt is the vector of true labels and yp is the vector of independent probabilities of predicted labels. 
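For illustration only, the recited loss HLdiff = average(yt*(1-yp) + (1-yt)*yp) can be evaluated as sketched below. The function name hl_diff and the sample label vectors are hypothetical; the sketch computes only the forward value of the loss and does not show the back-propagation step that adjusts the weights of the bidirectional RNN.

```python
def hl_diff(y_true, y_pred):
    """Average of yt*(1-yp) + (1-yt)*yp over all labels.

    y_true: vector of true labels (0 or 1).
    y_pred: vector of independent predicted probabilities in [0, 1].
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    terms = [yt * (1 - yp) + (1 - yt) * yp for yt, yp in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


# Assumed sample values: four labels for one word.
sample_loss = hl_diff([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
perfect_loss = hl_diff([1, 0, 1], [1.0, 0.0, 1.0])
worst_loss = hl_diff([1, 0], [0.0, 1.0])
```

Each term is the probability mass placed on the wrong side of a label, so the loss is 0 when every prediction matches its true label exactly and 1 when every prediction is fully wrong.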
Conclusion

Prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
US 2019/0228099 A1 discloses word embeddings used for training sequence to sequence models contain part of speech tags where parts of speech tags were added to word embedding / feature vectors of real numbers.
US 2020/0364409 A1 discloses complementing word embeddings with extra linguistic features such as part of speech tag embeddings. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
01/14/2021