DETAILED ACTION
This action is in response to claims filed 03 March, 2021 for application 15/653,007 filed 18 July, 2017. Currently claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-8, 10-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata et al. (“Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence”) in view of Peters et al. (“Iterative Multi-label Multi-relational Classification Algorithm for complex social networks”).

Regarding claims 1, 6, and 11, Kurata discloses: A computer-implemented method for training a natural language-based classifier, the method comprising: 
obtaining a query and a … label represented by a binary vector, each of a plurality of elements of the binary vector being associated with at least one instance from a plurality of instances, the … label indicating that the query is classified into a specific instance from the plurality of instances by a value set to a specific element associated with the specific instance (“Let x denote the feature vector of a query, y be the vector representation of the label, o be the output value of the NN, and Θ be the parameters of the NN. Note that the representation of y differs depending on the loss function. For simplicity in the following explanation, assume that we have a finite set of labels λ = {λ1, λ2, λ3, λ4, λ5} and that a query x has multiple labels {λ1, λ4}:” P522 ¶2, note: the label vector y is binary because each label is either present or absent for a given query, see §2, Binary Cross Entropy); 
estimating relationships between the specific instance and instances other than the specific instance from the plurality of instances (“We propose an NN initialization method to treat some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence. These dedicated neurons simultaneously activate the co-occurring labels. Figure 2 shows the key idea of the proposed method. We first investigate the training data and list up patterns of label co-occurrence. Then, for each pattern of label co-occurrence, we initialize a matrix row so that the columns corresponding to the co-occurring labels have a constant weight C and the other columns have a weight of 0, as shown in Figure 2 (above). Note that the remaining rows that are not associated with the pattern of label co-occurrence are randomly initialized. This initialization is equivalent to treating some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence, where the dedicated neurons have connections to the corresponding co-occurring labels with an initialized weight C and to others with an initialized weight of 0, as shown in Figure 2 (below). Finally, we conduct normal back-propagation using one of the loss functions, as discussed in the previous section. Note that all the connection weights in the NN including the connection weights between the dedicated neurons and all labels are updated through back-propagation.” P522 §3.1 ¶1); 
generating a … label represented by a continuous-valued vector from the first … label by distributing the value set to the specific element to elements other than the specific element from the plurality of elements according to the relationships (“This initialization is equivalent to treating some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence, where the dedicated neurons have connections to the corresponding co-occurring labels with an initialized weight C and to others with an initialized weight of 0, as shown in Figure 2 (below).” P523 ¶1, “For the weight value C for initialization, we used the upper bound UB of the normalized initialization (Glorot and Bengio, 2010), which is determined by the number of units in the final hidden layer $n_h$ and output layer $n_c$ as $UB = \sqrt{6} / \sqrt{n_h + n_c}$. Additionally, we changed this value in accordance with the frequency of the label co-occurrence patterns in the training data. The background idea is that the patterns of label co-occurrence that appear frequently (i.e., the number of queries with this pattern of label co-occurrence is large) are more important than less frequent patterns. Assuming that a specific pattern of label co-occurrence appears in the training data f times, we try f×UB and √f×UB for initialization to emphasize this pattern.” P523 §3.2 ¶1); and 
training the natural language-based classifier using the query and … second label (“This initialization is equivalent to treating some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence, where the dedicated neurons have connections to the corresponding co-occurring labels with an initialized weight C and to others with an initialized weight of 0, as shown in Figure 2 (below). Finally, we conduct normal back-propagation using one of the loss functions, as discussed in the previous section. Note that all the connection weights in the NN including the connection weights between the dedicated neurons and all labels are updated through back-propagation.” P523 ¶1).
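For illustration only, the label co-occurrence initialization quoted from Kurata (P522 §3.1 ¶1 and P523 §3.2 ¶1) may be sketched in Python as follows. This is a non-authoritative sketch and not Kurata's code: the function names, the dict format used for co-occurrence patterns, and the uniform sampling in [−UB, UB] for rows not tied to a pattern (Kurata states only that those rows are “randomly initialized”) are assumptions.

```python
import math
import random


def glorot_upper_bound(n_h: int, n_c: int) -> float:
    """Upper bound UB = sqrt(6) / sqrt(n_h + n_c) of the normalized initialization."""
    return math.sqrt(6.0) / math.sqrt(n_h + n_c)


def init_final_layer(n_h, n_c, patterns, scale="none", seed=0):
    """Build an n_h x n_c weight matrix (hidden units x labels) as a list of rows.

    patterns (hypothetical format): dict mapping a tuple of co-occurring label
    indices to the number of times f that pattern appears in the training data.
    scale: "none" -> C = UB, "linear" -> C = f * UB, "sqrt" -> C = sqrt(f) * UB.
    """
    if len(patterns) > n_h:
        raise ValueError("more co-occurrence patterns than hidden units")
    ub = glorot_upper_bound(n_h, n_c)
    rng = random.Random(seed)
    # Assumption: rows not tied to a pattern are drawn uniformly from [-UB, UB].
    weights = [[rng.uniform(-ub, ub) for _ in range(n_c)] for _ in range(n_h)]
    for row, (labels, f) in enumerate(patterns.items()):
        if scale == "linear":
            c = f * ub
        elif scale == "sqrt":
            c = math.sqrt(f) * ub
        else:
            c = ub
        # Dedicated neuron: weight C toward each co-occurring label, 0 elsewhere.
        weights[row] = [c if k in labels else 0.0 for k in range(n_c)]
    return weights
```

For the quoted example of a query labeled {λ1, λ4}, a call such as init_final_layer(10, 5, {(0, 3): 4}, scale="sqrt") gives the first hidden unit a weight of 2×UB toward λ1 and λ4 and 0 toward λ2, λ3, and λ5; all weights, including these, would then be updated by back-propagation.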

However, Kurata does not explicitly disclose a document label and a relation label.
Peters teaches a document label and a relation label (“We propose a new learning algorithm (IMMCA) for solving the problem of multi-label, multi-relational graph classification. This model uses both the content information and the different types of relations among data, and may learn a distinct propagation model for each type of relations and labels.” P18 left column ¶3, “We evaluate the model on several datasets for two different tasks: an image annotation task and a document classification one in both multi-label and single label settings.” P18 left column ¶5, “We also consider that nodes are connected through different types of relations. These relations can correspond, for instance, to a friendship relation between users, an authorship relation between images or documents, or similarities between two nodes.” P20 ¶1, “Let us now move to the propagation schemes. Depending on Φ; the model will take into account different information. The two propagation schemes introduced below correspond to two different Φ representations. For each propagation scheme, we will further consider two models, one of which only propagates labels and does not take into account the node content, and one of which propagates labels but also takes as argument the content of the node. Altogether, this gives us four different models denoted ΦLPS and ΦGPS for the two propagation schemes operating only on relations and ΦcLPS and ΦcGPS for the corresponding propagation schemes using the content information.” P22 §4.3 ¶3).

Kurata and Peters are both in the same field of endeavor of classification and are analogous. Kurata teaches an exemplary classification method and Peters teaches a 

Regarding claims 2, 7, and 12, Kurata discloses: The method of claim 1, wherein the relationships are similarities (“This initialization is equivalent to treating some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence, where the dedicated neurons have connections to the corresponding co-occurring labels with an initialized weight C and to others with an initialized weight of 0, as shown in Figure 2 (below).” P523 ¶1, note: patterns of label co-occurrence are interpreted as similarity relationships).
 
Regarding claims 3, 8, and 13, Kurata discloses: The method of claim 1, wherein training includes training the natural language-based classifier using the … label (“To validate our proposed method, we focus on multi-label Natural Language Query (NLQ) classification in a document retrieval system in which users input queries in natural language and the system returns documents that contain answers to the queries.” P521 ¶1 ¶2, see also P522 §3.1 ¶1).

Peters teaches a document label and a relation label (“We propose a new learning algorithm (IMMCA) for solving the problem of multi-label, multi-relational graph classification. This model uses both the content information and the different types of relations among data, and may learn a distinct propagation model for each type of relations and labels.” P18 left column ¶3, “We evaluate the model on several datasets for two different tasks: an image annotation task and a document classification one in both multi-label and single label settings.” P18 left column ¶5).

Regarding claims 5, 10, and 15, Kurata discloses: The method of claim 3, wherein: 
training includes training the natural language-based classifier using one loss function (“As Nam et al. (2014) indicated, minimizing binary cross entropy is superior for handling multi-labels. By representing the target labels as y = (1, 0, 0, 1, 0), the binary cross entropy loss for the training example (x, y) becomes $l(\Theta, (x, y)) = -\sum_{k=1}^{5} \left( y_k \log(o_k) + (1 - y_k) \log(1 - o_k) \right)$, where sigmoid activation is used in the output layer.” P522 §2, Binary Cross Entropy); and 
the one loss function is cross-entropy based on the … label and the … label (P522 §2, Binary Cross Entropy).
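The binary cross-entropy loss quoted from Kurata (P522 §2, Binary Cross Entropy) may be illustrated with the following minimal Python sketch. The function names and the eps clipping used to avoid log(0) are assumptions added for numerical safety; they are not part of the quoted formula.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, as used in the output layer per the quoted passage."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y, o, eps=1e-12):
    """l(Theta, (x, y)) = -sum_k [y_k log o_k + (1 - y_k) log(1 - o_k)].

    y: binary target vector; o: sigmoid outputs in (0, 1).
    eps (an assumption) clips o away from 0 and 1 so log() stays finite.
    """
    if len(y) != len(o):
        raise ValueError("y and o must have the same length")
    total = 0.0
    for y_k, o_k in zip(y, o):
        o_k = min(max(o_k, eps), 1.0 - eps)
        total += y_k * math.log(o_k) + (1 - y_k) * math.log(1 - o_k)
    return -total
```

With the quoted target y = (1, 0, 0, 1, 0) and every output equal to sigmoid(0.0) = 0.5, each of the five terms contributes log 2, so the loss is 5 log 2; the loss approaches 0 as each o_k approaches its target y_k.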

Peters teaches a document label and a relation label (“We propose a new learning algorithm (IMMCA) for solving the problem of multi-label, multi-relational graph classification. This model uses both the content information and the different types of relations among data, and may learn a distinct propagation model for each type of relations and labels.” P18 left column ¶3, “We evaluate the model on several datasets for two different tasks: an image annotation task and a document classification one in both multi-label and single label settings.” P18 left column ¶5, “We also consider that nodes are connected through different types of relations. These relations can correspond, for instance, to a friendship relation between users, an authorship relation between images or documents, or similarities between two nodes.” P20 ¶1, “Let us now move to the propagation schemes. Depending on Φ; the model will take into account different information. The two propagation schemes introduced below correspond to two different Φ representations. For each propagation scheme, we will further consider two models, one of which only propagates labels and does not take into account the node content, and one of which propagates labels but also takes as argument the content of the node. Altogether, this gives us four different models denoted ΦLPS and ΦGPS for the two propagation schemes operating only on relations and ΦcLPS  and ΦcGPS for the corresponding propagation schemes using the content information.” P22 §4.3 ¶3).

Claims 4, 9, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Peters and further in view of Kumar et al. (“Ask Me Anything: Dynamic Memory Networks for Natural Language Processing”).


Regarding claims 4, 9, and 14, Kurata does not explicitly disclose:
training includes training the natural language-based classifier using two loss functions; and 
the two loss functions are a loss function which is cross-entropy based on the first label, and a loss function which is cross-entropy based on the second label.

Peters teaches a document label and a relation label (“We propose a new learning algorithm (IMMCA) for solving the problem of multi-label, multi-relational graph classification. This model uses both the content information and the different types of relations among data, and may learn a distinct propagation model for each type of relations and labels.” P18 left column ¶3, “We evaluate the model on several datasets for two different tasks: an image annotation task and a document classification one in both multi-label and single label settings.” P18 left column ¶5, “We also consider that nodes are connected through different types of relations. These relations can correspond, for instance, to a friendship relation between users, an authorship relation between images or documents, or similarities between two nodes.” P20 ¶1, “Let us now move to the propagation schemes. Depending on Φ; the model will take into account different information. The two propagation schemes introduced below correspond to two different Φ representations. For each propagation scheme, we will further consider two models, one of which only propagates labels and does not take into account the node content, and one of which propagates labels but also takes as argument the content of the node. Altogether, this gives us four different models denoted ΦLPS and ΦGPS for the two propagation schemes operating only on relations and ΦcLPS  and ΦcGPS for the corresponding propagation schemes using the content information.” P22 §4.3 ¶3).

Kumar teaches: training includes training the natural language-based classifier using two loss functions (“Training on the bAbI dataset uses the following objective function: J = αECE(Gates) +  βECE(Answers), where ECE is the standard cross-entropy cost and α and  β are hyperparameters. In practice, we begin training with α set to 1 and β set to 0, and then later switch β to 1 while keeping α at 1.” P6 §4.1 ¶2); and 
the two loss functions are a loss function which is cross-entropy based on the … label, and a loss function which is cross-entropy based on the … label (“Training on the bAbI dataset uses the following objective function: J = αECE(Gates) +  βECE(Answers), where ECE is the standard cross-entropy cost and α and  β are hyperparameters. In practice, we begin training with α set to 1 and β set to 0, and then later switch β to 1 while keeping α at 1.” P6 §4.1 ¶2).
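The objective quoted from Kumar (P6 §4.1 ¶2), J = αE_CE(Gates) + βE_CE(Answers), together with the described α/β schedule, may be sketched as follows. This is an illustrative sketch under stated assumptions: each term is treated as a categorical cross-entropy over a probability vector for a single example, and the switch_epoch parameter is hypothetical because the quoted passage does not say when β is switched to 1.

```python
import math


def cross_entropy(target_index, probs, eps=1e-12):
    """Categorical cross-entropy for one example: -log p[target] (eps avoids log(0))."""
    return -math.log(max(probs[target_index], eps))


def dmn_objective(gate_target, gate_probs, answer_target, answer_probs, alpha, beta):
    """J = alpha * E_CE(Gates) + beta * E_CE(Answers), per the quoted objective."""
    return (alpha * cross_entropy(gate_target, gate_probs)
            + beta * cross_entropy(answer_target, answer_probs))


def loss_weights(epoch, switch_epoch):
    """Schedule from the quoted passage: alpha = 1, beta = 0 first, then beta = 1.

    switch_epoch is a hypothetical parameter; the passage gives no switch point.
    """
    return (1.0, 0.0) if epoch < switch_epoch else (1.0, 1.0)
```

Before the switch only the gate term contributes to J; afterward, the gate loss and the answer loss are summed, which corresponds to training with two cross-entropy loss functions, each based on its own label.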

Kurata, Peters, and Kumar are in the same field of endeavor of training query and retrieval systems using label-based training algorithms and are analogous. Kurata teaches the use of binary cross entropy for labels. Kumar explicitly teaches using cross entropy for both the gate labels and the answer labels. It would have been obvious to one of .

Claims 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Peters and further in view of Carus et al. (US 2016/0350283).

Regarding claims 16, 18, and 20, Kurata does not explicitly disclose: The method of claim 2, wherein the similarities include a cosine similarity between two documents among the plurality of documents.

Carus teaches: wherein the similarities include a cosine similarity between two documents among the plurality of documents (“Some embodiments include determining string similarity as the weighted substring similarity of the at least two texts, determining a plurality of sizes of substring, determining substring weighting with tf-idf and determining substring similarity with cosine distance.” [0036]).
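A word-level illustration of tf-idf weighting combined with cosine similarity, of the kind described in Carus [0036], is sketched below. Carus applies the weighting to substrings; this sketch substitutes lowercase whitespace-delimited words, and the raw-count tf and the smoothed idf formula ln((1 + N) / (1 + df)) + 1 are assumptions chosen for illustration, not taken from Carus.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Word-level tf-idf vectors (as dicts) for a list of document strings.

    Assumptions: tf is the raw term count; idf is ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}
    return [{w: c * idf[w] for w, c in Counter(tokens).items()} for tokens in tokenized]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

The smoothed idf keeps words that appear in every document at a nonzero weight; with the unsmoothed ln(N / df), two documents compared only against each other would score 0 whenever all their shared words appear in both.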

	Kurata, Peters, and Carus are all in the same field of endeavor of classifying information and are analogous. Kurata and Peters teach multilabel multiclass systems. Carus teaches exemplary similarity measurements that can be adapted to any system. Carus specifically teaches a cosine similarity. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the classification system of Kurata and Peters with the well-understood cosine similarity to yield predictable results.

Regarding claims 17 and 19, Kurata does not explicitly disclose: The method of claim 16, wherein the similarities are based on a number of words commonly appearing in the two documents.

Carus teaches: wherein the similarities are based on a number of words commonly appearing in the two documents (“Some embodiments include determining string similarity as the weighted substring similarity of the at least two texts, determining a plurality of sizes of substring, determining substring weighting with tf-idf and determining substring similarity with cosine distance.” [0036], note: tf-idf weights each word by its frequency within a document (term frequency) multiplied by its inverse document frequency, which decreases as the number of documents containing the word increases).
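One hypothetical reading of a similarity “based on a number of words commonly appearing in the two documents” is sketched below. Neither the shared-word count nor the Jaccard-style normalization is taken from Carus or from the claims; both, along with lowercase whitespace tokenization, are assumptions for illustration.

```python
def common_word_count(doc_a: str, doc_b: str) -> int:
    """Number of distinct words that appear in both documents."""
    return len(set(doc_a.lower().split()) & set(doc_b.lower().split()))


def overlap_similarity(doc_a: str, doc_b: str) -> float:
    """Shared-word count divided by the size of the combined vocabulary (Jaccard)."""
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())
    union = words_a | words_b
    # Two empty documents share no words; define their similarity as 0.
    return len(words_a & words_b) / len(union) if union else 0.0
```

For example, “reset the password” and “change the password now” share two words (“the” and “password”) out of five distinct words, giving an overlap similarity of 0.4.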

	Kurata, Peters and Carus are all in the same field of endeavor of classifying information and are analogous. Kurata and Peters teach multilabel multiclass systems. Carus teaches exemplary similarity measurements that can be adapted to any system. Carus specifically teaches a word frequency measure. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the classification system of Kurata and Peters with the well-understood tf-idf to yield predictable results.
Response to Arguments








Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246.  The examiner can normally be reached on M-F: 9-5.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571)-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ERIC NILSSON/           Primary Examiner, Art Unit 2122