DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
In ¶[0015], “using multiple threads” should be “uses multiple threads”.
In ¶[0023], “including code” should be “includes code”.
In ¶[0027], there is a statement about color drawings, but there do not appear to be any color drawings in this application, so this statement should be deleted.
In ¶[0046], “Token embedding” should be “token embedding”.
In ¶[0069], there is a symbol between ┐(tsijk) and bm, but it is not clear that this symbol is correct.  This appears to be a symbol indicating that ┐(tsijk) ‘is an element of’ bm, but it may be more correct to indicate that ┐(tsijk) ‘is not an element of’ bm.
In ¶[0086], there is an unmatched left parenthesis at “(DEPS”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 



The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office Action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office Action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
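A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.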

Claims 1 to 3, 9, 11, and 15 to 21 are rejected under 35 U.S.C. 103 as being unpatentable over Chatterjee et al. (U.S. Patent Publication 2019/0228073) in view of Liu et al. (“Implicit Discourse Relation Classification via Multi-Task Neural Networks”).
Concerning independent claims 1 and 20 to 21, Chatterjee et al. discloses a method and system for word embedding of natural language input with dependency labels, comprising:
“one or more processors, and memory including code that, when executed by the one or more processors execute functions including” – an exemplary computer system 400 for implementing embodiments includes a central processing unit 402 disposed in communication with memory 405 including computer-readable storage media utilized in implementing embodiments (¶[0107] - ¶[0115]: Figure 4);
“(a) initializing a list of token embeddings, each of the token embeddings corresponding to a tokenized word from text in a corpus of text” – word embedding representations 211 provide semantic and syntactic significance of each word (¶[0031]: Figure 2); ‘Word2vec’ or ‘GloVe’ may be used for retrieving word embedding representations 211 (¶[0053]: Figure 2); method 300 includes retrieving by a natural language processing system 103, a word embedding representation for each word of one or more words in natural language input 101 from knowledge repository 105 (“a corpus of text”) (¶[0101]: Figures 1 and 3: Step 301); here, each word is a ‘token’, so word embeddings are “token embeddings corresponding to a tokenized word from text in a corpus of text”;

“. . . a graph (Gs) for a group of consecutive words s from the text [, said graph including binary relations between pairs of tokenized words of the group of consecutive words]” – dependency parsing module 225 may be used for generating dependency labels 215 for each word based on a dependency parser tree for the natural language input 101; here, a dependency parser tree generated for a given input sentence is illustrated with dependency labels (¶[0055] - ¶[0066]: Figure 2); these labels are illustrated as being “for a group of consecutive words”, i.e., nsubj includes consecutive words ‘saw’ and ‘I’, amod includes consecutive words ‘elephant’ and ‘white’, nmod:poss includes consecutive words ‘dream’ and ‘my’; method 300 includes generating a dependency label for each word based on a dependency parser tree for the natural language input 101 (¶[0103]: Figure 3: Step 305); here, a dependency parser tree is “a graph (Gs)” (Compare Specification, ¶[0006], and Claim 2);
“(e) computing a loss [using the computed tensor]; (f) optimizing the list of token embeddings using the computed loss; and (g) repeating (b)-(f) until the computed loss is within a predetermined range” – parameters of an artificial neural network classification are trained using a loss function as a categorical cross-entropy; Stochastic Gradient Descent may be used for gradient learning (¶[0091]); implicitly, training a neural network with a loss function is an iterative procedure for optimizing a loss function until a convergence criterion is met (“optimizing . . . using the computed loss . . . until the computed loss is within a predetermined range”).
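The iterative procedure described in the last item above can be sketched as follows.  This is only a minimal illustration and is not taken from Chatterjee et al. or from the application: the toy squared-error loss, the learning rate, and the names `train_until_within_range` and `loss_threshold` are hypothetical, chosen solely to show a loop that repeats until a computed loss falls within a predetermined range.

```python
def train_until_within_range(embedding, target, learning_rate=0.1,
                             loss_threshold=1e-6, max_steps=10_000):
    """Gradient descent on a toy squared-error loss (an assumed stand-in)."""
    embedding = list(embedding)
    for step in range(max_steps):
        # Compute the loss for the current embedding values.
        loss = sum((e - t) ** 2 for e, t in zip(embedding, target))
        # Stop once the loss is within the predetermined range.
        if loss <= loss_threshold:
            return embedding, loss, step
        # Otherwise move each component along the negative gradient.
        embedding = [e - learning_rate * 2 * (e - t)
                     for e, t in zip(embedding, target)]
    return embedding, loss, max_steps


final_embedding, final_loss, steps = train_until_within_range([0.0, 1.0], [0.5, -0.5])
```

The `max_steps` cap is an added safeguard so that the loop terminates even if the threshold is never reached.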
Concerning independent claims 1 and 20 to 21, Chatterjee et al. is directed to using word embeddings and a graph as a dependency tree to identify places of interest, but does not expressly disclose the limitation of the graph “including binary relations”.  Chatterjee et al. might be broadly construed to disclose “binary relations” because each of the dependency labels indicates a relation between two words.  (¶[0058] - ¶[0066])  Moreover, Chatterjee et al. does not expressly disclose computing “a tensor of binary relations as a product between a matrix and a tensor representing discourse relations”, but a “tensor” might be considered the same thing as a matrix.  Here, Chatterjee et al. discloses that training a neural network includes a word embedding representation of a target word, Wi, and a word embedding of a ‘head’ word in a dependency parse tree, Wh, where a feedforward vector F is represented with a transpose operator T.  Feedforward vector F is analogous to “a tensor”, where a transpose operator T could be understood to imply that F is a matrix or tensor.  However, even if these elements are omitted by Chatterjee et al., they are taught by Liu et al.
Concerning independent claims 1 and 20 to 21, Liu et al. teaches implicit discourse relation classification via multi-task neural networks.  A main task is to classify implicit PDTB (Penn Discourse Tree Bank) relations.  (Multi-Task Neural Network for Discourse Parsing: Page 2: Right Column)  Each word w is associated with a vector representation xw, which is pre-trained with large unlabeled corpora.  An argument is a sequence of these word vectors, so that an argument pair can be represented as Arg1: [x11, x21, . . ., xm1] and Arg2: [x12, x22, . . ., xm2].  (CNNs: Modeling Argument Pairs: Page 2: Right Column to Page 3, Left Column)  To capture a relation between two arguments, h words are taken from each argument, their vectors are concatenated, and a convolution operation is applied on this window pair (“selecting the token embeddings representing the words of the group of consecutive words from the list of token embeddings”).  A feature map c is produced, which is a two-dimensional matrix, and a fixed size matrix, p, is generated to capture the most salient features in c.  With multiple filters, argument pairs can be modeled as a three-dimensional tensor (“computing a tensor of binary relations as a product between a matrix of the selected token embeddings and a tensor representing discourse relations”).  (CNNs: Modeling Argument Pairs: Page 3: Left Column to Right Column)  A ground-truth label vector gt is a binary vector, where the i-th dimension gt[i] is 1 if an instance belongs to the i-th class, and the other dimensions are set to 0 (“the computed tensor representing the binary relations between the pair of tokenized words”).  Cross-entropy loss is adopted as an optimization function (“optimizing the list of token embeddings using the computed loss”).  Using mini-batch stochastic gradient descent (SGD) to train parameters, one task is selected in each epoch and the model is updated according to a specific task objective.  (Model Training: Page 7, Left and Right Columns)  A main goal is to conduct implicit discourse relation classification of the four top-level implicit discourse relations of Comparison, Contingency, Expansion, and Temporal.  (Experiments: Page 4, Right Column)  Here, a task is determining “binary relations” of Arg1 and Arg2 of the four implicit discourse relations, i.e., a discourse relation is classified as “1” if one of the discourse relations is satisfied, and as “0” if that discourse relation is not satisfied.  It would have been obvious to one having ordinary skill in the art to modify Chatterjee et al. to perform implicit discourse relation classification of Liu et al. to improve classification performance on discourse relations.
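The binary ground-truth label vector and cross-entropy loss attributed to Liu et al. above can be illustrated with a short sketch.  It is not code from Liu et al.: the example scores and the helper names `one_hot`, `softmax`, and `cross_entropy` are hypothetical, and the four class names are simply the top-level relations named above.

```python
import math

RELATIONS = ["Comparison", "Contingency", "Expansion", "Temporal"]


def one_hot(relation):
    """Binary label vector: 1 at the index of the true class, 0 elsewhere."""
    return [1 if name == relation else 0 for name in RELATIONS]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label, probabilities):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(g * math.log(p) for g, p in zip(label, probabilities) if g)


label = one_hot("Expansion")
loss = cross_entropy(label, softmax([0.2, 0.1, 2.0, -1.0]))
```

Because the label vector has a single 1, the loss reduces to the negative log-probability assigned to the true relation.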

Concerning claim 2, Chatterjee et al. discloses dependency parsing module 225 may be used for generating dependency labels 215 for each word based on a dependency parser tree for the natural language input 101; here, a dependency parser tree generated for a given input sentence is illustrated with dependency labels (¶[0055] - ¶[0066]: Figure 2); method 300 includes generating a dependency label for each word based on a dependency parser tree for the natural language input 101 (¶[0103]: Figure 3: Step 305); a dependency parser tree is “a graph”.
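To illustrate why a dependency parser tree can be read as a graph of binary relations, the three labeled word pairs noted for the independent claims (nsubj, amod, nmod:poss) can be stored as edges.  This is an illustrative sketch only: the head-to-dependent orientation follows the common dependency-grammar convention and is an assumption here, and `build_graph` is a hypothetical helper rather than anything disclosed by Chatterjee et al.

```python
# Each dependency label links exactly two words, so the tree can be stored
# as (head, label, dependent) triples, i.e. labeled binary relations.
edges = [
    ("saw", "nsubj", "I"),
    ("elephant", "amod", "white"),
    ("dream", "nmod:poss", "my"),
]


def build_graph(triples):
    """Adjacency map: head word -> list of (label, dependent) pairs."""
    graph = {}
    for head, label, dependent in triples:
        graph.setdefault(head, []).append((label, dependent))
    return graph


graph = build_graph(edges)
```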
Concerning claim 3, Chatterjee et al. discloses that word embedding representations derive semantic and syntactic significance of each word with respect to context in natural language input 101.  (¶[0031]; ¶[0101]: Figure 3: Step 301)
Concerning claim 9, Liu et al. teaches that an argument is a sequence of word vectors, so that an argument pair can be represented as Arg1: [x11, x21, . . ., xm1] and Arg2: [x12, x22, . . ., xm2].  (CNNs: Modeling Argument Pairs: Page 2: Right Column to Page 3, Left Column)  Cross-entropy loss is adopted as an optimization function.  Using mini-batch stochastic gradient descent (SGD) to train parameters, one task is selected in each epoch and the model is updated according to a specific task objective.  (Model . . . [x11, x21, . . ., xm1] and [x12, x22, . . ., xm2] (“using the token embeddings, performing implicit discourse relation classification on pairs of groups of words extracted from an input text”).
Concerning claim 11, Liu et al. teaches that one task is selected in each epoch of training.  (Model Training: Page 4: Right Column)  Four tasks are separately trained, and learning rates can be set at λ = 0.004 and λe = 0.001.  (Model Configuration: Page 5: Left Column)  Here, a learning rate of λe = 0.001 = 1×10⁻³ is “in a range between 10⁻³ and 10⁻⁵”, where λe is a learning rate of word embeddings.  There are “multiple threads” corresponding to each of the four tasks that are separately trained in various epochs.
Concerning claims 15 to 16, Chatterjee et al. discloses that ‘Word2vec’ or ‘GloVe’ may be used for retrieving word embedding representations 211. (¶[0053]: Figure 2)
Concerning claim 17, Chatterjee et al. discloses performing embeddings on “words”, which are “elements of text”.  (Abstract)  Similarly, Liu et al. teaches vector representations of sentences and their constituent word vectors.  (Introduction: Page 1, Right Column; CNNs: Modeling Argument Pairs: Page 2: Right Column to Page 3, Left Column)
Concerning claim 18, Liu et al. teaches that argument pairs can be modeled as a three-dimensional tensor (“wherein the tensor of binary relations includes a rank-3 tensor”), which is flattened to a vector p in R^(np’×np×nf), for implicit discourse classification.  (CNNs: Modeling Argument Pairs: Page 3: Right Column)
Concerning claim 19, Chatterjee et al. discloses a computer-readable storage medium for implementing embodiments.  (¶[0115]: Figure 4)

Claims 4, 6, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Chatterjee et al. (U.S. Patent Publication 2019/0228073) in view of Liu et al. (“Implicit Discourse Relation Classification via Multi-Task Neural Networks”) as applied to claim 1 above, and further in view of Ji et al. (“One Vector is Not Enough: Entity-Augmented Distributed Semantics for Discourse Relations”).
Liu et al. teaches implicit discourse relation classification, but does not include a ranking loss Tsloss or a regularization term Rsloss.  However, Ji et al. teaches a similar way of predicting discourse relations, where an objective function is a regularized hinge loss, L(θ), with a regularization term λ||θ||₂², so that parameters are penalized by λ.  Here, λ||θ||₂² is equivalent to “a regularization term Rsloss”.  (4. Large-margin learning framework: Page 332: Right Column)  Applicants’ Specification, ¶[0009], provides an equation for a ranking loss Tsloss, which is very similar to a first term of a loss function in Equation (6) of Ji et al.  That is, Applicants’ ranking loss is Tsloss = Σ max(0, γ + {ei’s, Rk’, ej’s} - {eis, Rk, ejs}), and Ji et al. discloses that L(θ) = Σy’:y’≠y* max(0, 1 - ψ(y*) + ψ(y’)) + λ||θ||₂².  Comparing these equations, Σy’:y’≠y* max(0, 1 - ψ(y*) + ψ(y’)) is a ranking loss Tsloss.  That is, Applicants define a loss function as including a ranking loss Tsloss plus a regularization term Rsloss, so that a ranking loss Tsloss includes everything with the exception of the regularization loss in Equation (6) of Ji et al.  It would have been obvious to one having ordinary skill in the art to use a regularized hinge loss as taught by Ji et al. to perform implicit discourse classification in Liu et al. for a purpose of obtaining substantial improvements by using a large margin objective.
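The comparison between a ranking loss and a regularization term can be made concrete with a small sketch.  It assumes the conventional large-margin hinge form, max(0, margin - score of the correct label + score of a competing label), plus a squared L2 penalty; the function names, the margin of 1, the penalty weight, and the example scores are hypothetical and are not taken from Ji et al. or from Applicants’ Specification.

```python
def ranking_loss(true_score, other_scores, margin=1.0):
    """Sum of hinge terms max(0, margin - true_score + other_score)."""
    return sum(max(0.0, margin - true_score + s) for s in other_scores)


def l2_penalty(parameters, weight):
    """Regularization term: weight times the squared L2 norm."""
    return weight * sum(p * p for p in parameters)


def regularized_objective(true_score, other_scores, parameters,
                          margin=1.0, weight=0.01):
    """Ranking loss plus regularization term."""
    return (ranking_loss(true_score, other_scores, margin)
            + l2_penalty(parameters, weight))


total = regularized_objective(2.0, [0.5, 1.5, 3.0], [1.0, -2.0])
```

Only competing labels that score within the margin of the correct label contribute to the ranking term; the penalty term depends on the parameters alone.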

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Chatterjee et al. (U.S. Patent Publication 2019/0228073) in view of Liu et al. (“Implicit Discourse Relation Classification via Multi-Task Neural Networks”) as applied to claim 1 above, and further in view of Kingma et al. (“Adam: A Method for Stochastic Optimization”).
Liu et al. teaches token embeddings and a loss function, but omits using an Adam optimizer.  However, an Adam optimizer appears to be fairly well known in the prior art.  Specifically, Kingma et al. teaches a method for stochastic optimization using Adam, which is an algorithm for first-order gradient optimization of a stochastic objective function based on adaptive estimates of lower-order moments.  Advantages are that Adam is straightforward to implement, is computationally efficient, has little memory requirement, and is well suited to problems that are large in terms of data and/or parameters.  (Abstract)  It would have been obvious to one having ordinary skill in the art to use an Adam optimizer as taught by Kingma et al. in a loss function of Liu et al. to provide a method for stochastic optimization that compares favorably to other stochastic optimization methods.
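For reference, a single Adam update can be sketched as below.  The default values for `lr`, `beta1`, `beta2`, and `eps` are the commonly cited defaults from Kingma et al.; the scalar toy objective, the learning rate of 0.01 used in the loop, and the function name `adam_step` are assumptions made only for illustration and do not come from Liu et al.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize the toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t, lr=0.01)
```

The update keeps only two running averages per parameter, which reflects the small memory requirement noted in the Abstract of Kingma et al.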

Claims 13 to 14 are rejected under 35 U.S.C. 103 as being unpatentable over Chatterjee et al. (U.S. Patent Publication 2019/0228073) in view of Liu et al. (“Implicit Discourse Relation Classification via Multi-Task Neural Networks”) as applied to claim 1 above, and further in view of Carreras et al. (U.S. Patent Publication 2018/0260381).
Liu et al. teaches that argument pairs can be modeled as a three-dimensional tensor, which is flattened to a vector p in R^(np’×np×nf), for implicit discourse classification.  (CNNs: Modeling Argument Pairs: Page 3: Right Column)  However, Liu et al. does not expressly teach that a tensor is “a predetermined tensor” or “a learned tensor”.  Still, one might construe any tensor as “a predetermined tensor”.  In any event, Carreras et al. teaches using word embeddings to output a score as a function of a tensor product of a word embedding of a candidate head, a product of word embeddings of prepositions, and a matrix of learned parameters.  (¶[0010] and ¶[0012])  An exemplary model is an unfolding of a 3-D tensor 44 that is generated from a training set 46.  (¶[0024]: Figure 2)  An unfolding of the 3D tensor is subsequently employed in a scoring function 62.  (¶[0030]: Figure 2)  Here, f is a function that scores a candidate attachment, where h is a ‘head’ and p is a preposition that is in a syntactic dependency.  A suitable definition of f is based on tensor products of word embeddings, represented as f(h,p,m) = vh^T W [vp ⊗ vm], where vh, vp, and vm are word embeddings of h, p, and m, v^T represents a transpose of v, and matrix W has a parameter value for each combination of h, p, and m.  W is referred to as a matrix, but it is essentially a flattened 3D tensor.  The tensor product forms a cube.  Learning the parameters W can include optimizing a loss function with a low rank regularization.  (¶[0067] - ¶[0078])  Carreras et al., then, teaches that a tensor can be learned (“wherein the tensor representing discourse relations is a learned tensor”).  Carreras et al.’s tensor is predetermined by learning.  An objective is to provide an attachment resolution based on tensor products of word vectors to improve over single compositions based on a sum or concatenation, and that remains computationally manageable due to a compact nature of the word embeddings.  (¶[0097])  It would have been obvious to one having ordinary skill in the art to use learned and predetermined tensors as taught by Carreras et al. to perform implicit discourse classification in Liu et al. for a purpose of improving attachment resolution over single compositions and that remains computationally manageable due to a compact nature of word embeddings.
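The scoring function attributed to Carreras et al. can be illustrated numerically.  The sketch assumes that the bracketed product of vp and vm is a Kronecker (tensor) product flattened to a vector, so that W has one row per dimension of vh and one column per pair of dimensions of vp and vm; the helper names `kron` and `score` and all numeric values are hypothetical.

```python
def kron(a, b):
    """Kronecker (tensor) product of two vectors, flattened to one vector."""
    return [x * y for x in a for y in b]


def score(v_h, v_p, v_m, W):
    """Scalar score v_h^T W [v_p (x) v_m]; W is len(v_h) x (len(v_p) * len(v_m))."""
    product = kron(v_p, v_m)
    # Multiply W by the flattened product: one dot product per row of W.
    projected = [sum(w * z for w, z in zip(row, product)) for row in W]
    # Dot the result with the head embedding to obtain the final score.
    return sum(h * r for h, r in zip(v_h, projected))


v_h, v_p, v_m = [1.0, 2.0], [1.0, 0.0], [0.5, -1.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
value = score(v_h, v_p, v_m, W)
```

Reshaping W to dimensions len(v_h) × len(v_p) × len(v_m) would recover the three-dimensional tensor that W is said to flatten.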

Allowable Subject Matter
Claims 5, 7, and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Geng et al., Guo et al., Li et al., and Xu et al. are related prior art non-patent literature.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608. The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.

Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.   To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/
Primary Examiner
Art Unit 2657
January 18, 2022