DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-5, 8-13, 15-18, and 20 are pending in this application.
Claims 6, 7, 14 and 19 are canceled.
Response to Arguments
Regarding Rejection under 35 U.S.C. 102 and 103 
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5, 9, 11-13, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US Pub. 2019/0073351) in view of Song et al. (US Pub. 2018/0225281).
Regarding claim 1, Zhang discloses a method, comprising:
receiving a training dataset comprising a plurality of labeled tokens (Fig. 4 and [0054]-[0063] obtaining first data that includes training text sequences and a respective POS tag for each token in the training text sequence);
determining, by the neural network, a second tag associated with a previous token which has been processed by the neural network before processing the current token, and a third tag associated with a next token to be processed by the neural network after processing the current token (Fig. 4 and [0054]-[0063] obtaining second “data that includes training text sequences and, for each training text sequence, a set of parser states that defines a dependency parse of the training text sequence”); 
computing, for the training dataset, a value of a loss function reflecting a first loss value, a second loss value, and a third loss value, wherein the first loss value is represented by a first difference of the first tag and a first label associated with the current token by the training dataset, wherein the second loss value is represented by a second difference of the second tag and a second label associated with the previous token by the training dataset, and wherein the third loss value is represented by a third difference of the third tag and a third label associated with the next token by the training dataset ([0064]-[0070] “the system determines the POS loss from an error between (i) the scores generated by the POS neural network by processing the feature input for the token in the training example and (ii) a set of scores that indicate that the POS tag in the training example is the correct POS tag for the token”); and
adjusting a parameter of the neural network based on the value of the loss function ([0066]-[0069] updating values of parameters of the dependency parsing neural network and the initial neural network layers, i.e., using a stochastic gradient descent with backpropagation training technique).
Zhang does not explicitly teach, but Song does explicitly teach:
producing a feature vector representing a current token processed by a neural network, wherein the feature vector includes a set of elements representing grammatical attributes of the current token; producing, based on the feature vector, a first tag associated with the current token ([0002][0003] “tagging tokens with Part-of-Speech (POS) tags to tagging tokens with supertags encoding complex grammatical relations …The task of POS tagging is defined as providing categories for each token with grammatical tags according to their syntactic function in a sentence. … State-of-the-art POS tagging systems can include those based on Maximum Entropy models and deep learning architectures, such as LSTM”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of generating dependency parses of text segments using neural networks as taught by Zhang with the method for automatic semantic token tagging using a threshold and grammatical features by a neural network as taught by Song, in order to provide a solution to the label bias problem, in which the labels in a given state compete only against other labels reachable from that state (Song, [0042]).
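For illustration only, the three loss values recited in claim 1 can be sketched as follows. This is a minimal sketch and not the implementation of the applicant, Zhang, or Song. The function names, the use of cross-entropy as the "difference" between a predicted tag and its label, and the toy tag probabilities are all assumptions introduced here.

```python
import math

def cross_entropy(predicted_probs, gold_label):
    # Assumed measure of "difference": negative log-probability
    # that the network assigns to the label from the training dataset.
    return -math.log(predicted_probs[gold_label])

def three_position_losses(prev_probs, cur_probs, next_probs,
                          prev_label, cur_label, next_label):
    # Hypothetical helper: one loss value per position around the current token.
    first = cross_entropy(cur_probs, cur_label)     # current token (first tag)
    second = cross_entropy(prev_probs, prev_label)  # previous token (second tag)
    third = cross_entropy(next_probs, next_label)   # next token (third tag)
    return first, second, third

# Toy predicted tag distributions for "the dog runs", current token "dog".
losses = three_position_losses(
    prev_probs={"DET": 0.9, "NOUN": 0.1},
    cur_probs={"NOUN": 0.5, "VERB": 0.5},
    next_probs={"VERB": 0.8, "NOUN": 0.2},
    prev_label="DET", cur_label="NOUN", next_label="VERB",
)
```

Each returned value shrinks toward zero as the predicted probability of the labeled tag approaches one.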
Regarding claim 2, Zhang in view of Song discloses the method of claim 1, and Zhang further discloses: performing, using the neural network, a natural language processing task ([0038]-[0040] natural language processing using neural network).
Regarding claim 3, Zhang in view of Song discloses the method of claim 1.
Zhang does not explicitly teach, but Song does explicitly teach:
repeating the determining, computing, and adjusting operations until the value of the loss function falls below a predetermined threshold ([0029][0031][0032] calculating all distinct scores, and identifying sequence variations for which the probability score from the language model is more than a specified threshold).
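For illustration only, the claimed stopping condition (repeating until the loss value falls below a predetermined threshold) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the single parameter w, the squared-error loss, the learning rate, and the threshold value are hypothetical stand-ins and are not taken from the application or from the cited references.

```python
def train_until_threshold(w, target, lr=0.1, threshold=1e-4, max_steps=10_000):
    # Hypothetical one-parameter model: loss is the squared error (w - target)^2.
    for step in range(max_steps):
        loss = (w - target) ** 2           # computing the loss value
        if loss < threshold:               # stop once below the threshold
            return w, loss, step
        grad = 2 * (w - target)            # gradient of the loss w.r.t. w
        w -= lr * grad                     # adjusting the parameter
    return w, (w - target) ** 2, max_steps # safety cap reached

w, loss, steps = train_until_threshold(w=0.0, target=1.0)
```

The max_steps cap is a design choice added so that the loop terminates even if the threshold is never reached.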
Regarding claim 4, Zhang in view of Song discloses the method of claim 1, and Zhang further discloses: wherein the current token is represented by a natural language word, and wherein the first tag identifies a part of speech associated with the current token (Fig. 1, [0036] “The POS output layer 120 is configured to process a feature representation 112 of a given token in the input text segment to generate an output that defines a POS tag 122 for the given token that defines a part-of-speech assignment for the given token”).
Regarding claim 5, Zhang in view of Song discloses the method of claim 1.
Zhang does not explicitly teach, but Song does explicitly teach:
wherein the current token is represented by a natural language word, and wherein the first tag identifies one or more grammatical attributes associated with the current token ([0002] “tagging tokens with Part-of-Speech (POS) tags to tagging tokens with supertags encoding complex grammatical relations”).
Regarding claim 9, Zhang in view of Song discloses the method of claim 1, and Zhang further discloses: 
wherein adjusting the parameter of the neural network further comprises: back-propagating an error reflected by the value of the loss function to one or more previous layers of the neural network; and adjusting an edge weight in order to minimize the loss function (claim 9, “backpropagating the parsing loss through the dependency parsing neural network and the initial neural network layers to update values of parameters of the dependency parsing neural network and the initial neural network layers”).
Regarding claim 11, Zhang discloses a method, comprising:
receiving a training dataset comprising a plurality of labeled natural language words (Fig. 4 and [0054]-[0063] obtaining first data that includes training text sequences and a respective POS tag for each token in the training text sequence);
determining, by the neural network, a second tag associated with a previous word which has been processed by the neural network before processing the current word, and a third tag associated with a next word to be processed by the neural network after processing the current word (Fig. 4 and [0054]-[0063] obtaining second “data that includes training text sequences and, for each training text sequence, a set of parser states that defines a dependency parse of the training text sequence”); 
computing, for the training dataset, a value of a loss function reflecting a first loss value, a second loss value, and a third loss value, wherein the first loss value is represented by a first difference of the first tag and a first label associated with the current word by the training dataset, wherein the second loss value is represented by a second difference of the second tag and a second label associated with the previous word by the training dataset, and wherein the third loss value is represented by a third difference of the third tag and a third label associated with the next word by the training dataset ([0064]-[0070] “the system determines the POS loss from an error between (i) the scores generated by the POS neural network by processing the feature input for the token in the training example and (ii) a set of scores that indicate that the POS tag in the training example is the correct POS tag for the token”); and 
adjusting a parameter of the neural network based on the value of the loss function ([0066]-[0069] updating values of parameters of the dependency parsing neural network and the initial neural network layers, i.e., using a stochastic gradient descent with backpropagation training technique).
Zhang does not explicitly teach, but Song does explicitly teach:
producing a feature vector representing a current word processed by a neural network, wherein the feature vector includes a set of elements representing grammatical attributes of the current word; producing, based on the feature vector, a first tag associated with the current word ([0002][0003] “tagging tokens with Part-of-Speech (POS) tags to tagging tokens with supertags encoding complex grammatical relations …The task of POS tagging is defined as providing categories for each token with grammatical tags according to their syntactic function in a sentence. … State-of-the-art POS tagging systems can include those based on Maximum Entropy models and deep learning architectures, such as LSTM”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of generating dependency parses of text segments using neural networks as taught by Zhang with the method for automatic semantic token tagging using a threshold and grammatical features by a neural network as taught by Song, in order to provide a solution to the label bias problem, in which the labels in a given state compete only against other labels reachable from that state (Song, [0042]).
Regarding claim 12, Zhang in view of Song discloses the method of claim 11, and Zhang further discloses: performing, using the neural network, a natural language processing task ([0038]-[0040] natural language processing using neural network).
Regarding claim 13, Zhang in view of Song discloses the method of claim 11.
Zhang does not explicitly teach, but Song does explicitly teach:
repeating the determining, computing, and adjusting operations until the value of the loss function falls below a predetermined threshold ([0029][0031][0032] calculating all distinct scores, and identifying sequence variations for which the probability score from the language model is more than a specified threshold).
Regarding claim 16, Zhang discloses a computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to:
receive a training dataset comprising a plurality of labeled natural language words (Fig. 4 and [0054]-[0063] obtaining first data that includes training text sequences and a respective POS tag for each token in the training text sequence);
determine, by the neural network, a second tag associated with a previous word which has been processed by the neural network before processing the current word, and a third tag associated with a next word to be processed by the neural network after processing the current word (Fig. 4 and [0054]-[0063] obtaining second “data that includes training text sequences and, for each training text sequence, a set of parser states that defines a dependency parse of the training text sequence”); 
compute, for the training dataset, a value of a loss function reflecting a first loss value, a second loss value, and a third loss value, wherein the first loss value is represented by a first difference of the first tag and a first label associated with the current word by the training dataset, wherein the second loss value is represented by a second difference of the second tag and a second label associated with the previous word by the training dataset, and wherein the third loss value is represented by a third difference of the third tag and a third label associated with the next word by the training dataset ([0064]-[0070] “the system determines the POS loss from an error between (i) the scores generated by the POS neural network by processing the feature input for the token in the training example and (ii) a set of scores that indicate that the POS tag in the training example is the correct POS tag for the token”); and
adjust a parameter of the neural network based on the value of the loss function ([0066]-[0069] updating values of parameters of the dependency parsing neural network and the initial neural network layers, i.e., using a stochastic gradient descent with backpropagation training technique).
Zhang does not explicitly teach, but Song does explicitly teach:
produce a feature vector representing a current word processed by a neural network, wherein the feature vector includes a set of elements representing grammatical attributes of the current word; produce, based on the feature vector, a first tag associated with the current word ([0002][0003] “tagging tokens with Part-of-Speech (POS) tags to tagging tokens with supertags encoding complex grammatical relations …The task of POS tagging is defined as providing categories for each token with grammatical tags according to their syntactic function in a sentence. … State-of-the-art POS tagging systems can include those based on Maximum Entropy models and deep learning architectures, such as LSTM”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of generating dependency parses of text segments using neural networks as taught by Zhang with the method for automatic semantic token tagging using a threshold and grammatical features by a neural network as taught by Song, in order to provide a solution to the label bias problem, in which the labels in a given state compete only against other labels reachable from that state (Song, [0042]).
Regarding claims 17-18, these claims are the medium claims corresponding to method claims 12-13. Therefore, claims 17-18 are rejected using the same rationale as applied to claims 12-13 above.
Claims 8, 10, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US Pub. 2019/0073351) in view of Song et al. (US Pub. 2018/0225281) and further in view of Hashimoto et al. (US Pub. 2018/0121799).
Regarding claim 8, Zhang in view of Song and further in view of Hashimoto discloses the method of claim 1. Zhang does not explicitly teach, but Hashimoto does explicitly teach:
wherein the value of the loss function is represented by a linear combination of the first loss value, the second loss value, and the third loss value ([0093] calculating a weighted sum 620 of the linear transformations of the forward and backward state vectors).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of generating dependency parses of text segments using neural networks as taught by Zhang in view of Song with the method of adapting a bidirectional LSTM as taught by Hashimoto, in order to reduce overfitting when training a model and to improve the processing of NLP tasks (Hashimoto, [0072][0073]).
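For illustration only, the claim 8 limitation (a loss value represented by a linear combination of the first, second, and third loss values) can be sketched as a weighted sum. This is a minimal sketch and not the implementation of the applicant or of any cited reference. The function name, the weight values, and the example loss values are hypothetical.

```python
def combined_loss(first, second, third, weights=(1.0, 1.0, 1.0)):
    # Linear combination: each loss value is scaled by a constant
    # coefficient and the scaled values are summed.
    a, b, c = weights
    return a * first + b * second + c * third

# Hypothetical weighting that emphasizes the current token's loss
# over the losses for the previous and next tokens.
total = combined_loss(0.2, 0.4, 0.6, weights=(1.0, 0.5, 0.5))
```

With all weights equal to one, the combination reduces to a plain sum of the three loss values.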
Regarding claim 10, Zhang in view of Song and further in view of Hashimoto discloses the method of claim 1. Zhang does not explicitly teach, but Hashimoto does explicitly teach:
wherein the neural network comprises a feature extraction layer, a bi-directional long-short term memory (BiLSTM) layer, and a prediction layer, and wherein the BiLSTM layer further comprises a BiLSTM, a backward LSTM and a forward LSTM, such that a first output of the backward LSTM and a second output of the forward LSTM is fed to the BiLSTM ([0090][0097][0098][0108] producing parent label probability mass vectors by classification and exponential normalization of parent label state vectors produced by the bi-directional LSTM; the dependency labels are predicted using another exponential normalizer, e.g., softmax, with a single ReLU layer; a feature extraction module calculates element-wise differences between the sentence-level relatedness vectors).
Regarding claim 15, Zhang in view of Song and further in view of Hashimoto discloses the method of claim 11. Zhang does not explicitly teach, but Hashimoto does explicitly teach:
wherein the neural network comprises a feature extraction layer, a bi-directional long-short term memory (BiLSTM) layer, and a prediction layer, and wherein the BiLSTM layer further comprises a BiLSTM, a backward LSTM and a forward LSTM, such that a first output of the backward LSTM and a second output of the forward LSTM is fed to the BiLSTM ([0090][0097][0098][0108] producing parent label probability mass vectors by classification and exponential normalization of parent label state vectors produced by the bi-directional LSTM; the dependency labels are predicted using another exponential normalizer, e.g., softmax, with a single ReLU layer; a feature extraction module calculates element-wise differences between the sentence-level relatedness vectors).
Regarding claim 20, this claim is the medium claim corresponding to method claim 15. Therefore, claim 20 is rejected using the same rationale as applied to claim 15 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571) 272-5933. The examiner can normally be reached 9 AM-3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659