DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Remarks
	The instant action is in response to a restriction/election requirement response received 03/05/2021. In the applicant’s response, the applicant elected Species I (Claims 1-6, 11-16, and 18-20). The applicant has made this election with traverse.
Due to the election made by the applicant, the instant action examines ONLY Species I; the claims included in Species II (Claims 7-10 and 17) are NOT examined on their merits.
Allowable Subject Matter
Claim 12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The examiner refers to the rejection under 35 U.S.C. 112(b) below for more details. 
The following is a statement of reasons for the indication of allowable subject matter:  
	Claim 12 recites: 
selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score comprises: 
receiving a user input identifying one or more pairs of messages from the ranked plurality of pairs of messages; 
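the at least one pair of messages based on a combination of the user input and the highest total similarity score; 
using the selected at least one pair of messages as the at least one of the edges of a message graph
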
As can be seen, Claim 12 requires that the user select one or more pairs of messages from the ranked plurality of pairs of messages and that the selected pair of messages be used as an edge in a message graph.
The prior art of record does not teach or fairly suggest that a pair of messages selected by a user from a ranked list of messages is used as an edge of a message graph. 
	The best prior art of record is Buch et al. (“Approximate String Matching by End-Users using Active Learning”, NPL 2015). Arguably, Buch discloses that string pairs are ranked (see, for example, Figure 3, in which string pairs are aligned according to their similarity value) and that the user selects at least one of the string pairs based on this ranking. However, unlike the requirements of Claim 12, this user selection is NOT used to create an edge of a message graph; rather, this user selection is used to adjust thresholds for determining string similarity and the ranking of such strings. 
	Even in combination with the cited prior art of record (Nicosia, Mueller, Mehri, and Domeniconi), Claim 12 would still be non-obvious for at least the reasons above. 
	Therefore, Claim 12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


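The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

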
Claims 6 and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 6 recites, at least in part, “retraining the neural network based on the user input”.
Claim 6 ultimately depends on Claim 1 which, in turn, recites “a neural network” and “a second neural network.” However, when Claim 6 uses the term “the neural network” it is unclear which neural network is being referred to and thus renders the claim indefinite. Appropriate correction is required. 
	Claim 12 recites: 
selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score comprises: 
receiving a user input identifying one or more pairs of messages from the ranked plurality of pairs of messages; 
the at least one pair of messages based on a combination of the user input and the highest total similarity score; 
using the selected at least one pair of messages as the at least one of the edges of a message graph

The phrases “a highest total similarity score” and “a message graph” pose antecedent basis issues with the same phrases in Claim 11, from which Claim 12 depends. That is, it is unclear whether “a highest total similarity score” and “a message graph” are the same as the corresponding features recited in Claim 11 or are separate and distinct. Appropriate correction is required. 
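It appears that Claim 12 should instead recite: 
selecting at least one pair of messages from the ranked plurality of pairs of messages based on [[a]] the highest total similarity score comprises: 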
receiving a user input identifying one or more pairs of messages from the ranked plurality of pairs of messages; 
the at least one pair of messages based on a combination of the user input and the highest total similarity score; 
using the selected at least one pair of messages as the at least one of the edges of [[a]] the message graph.


Claim Rejections - 35 USC § 103
For clarity of record and ease of reading, the examiner notes the following: 
Any text that is bolded is a limitation of a claim. 
The “teaching” or reference citation, along with any necessary examiner notes are contained within the parentheses “()” following the bolded claim language. 
Any text that is underlined is emphasized language from the reference(s) used and/or particularly important examiner notes. While NOT fully reflective of the rejection as a whole, these underlined passages are indicative or otherwise reflective of key evidence.   

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
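A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.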

Claims 1-2, 13-14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nicosia et al. ("Accurate Sentence Matching with Hybrid Siamese Networks", NPL 2017) in view of Mueller et al. ("Siamese Recurrent Architectures for Learning Sentence Similarity", NPL 2016) and further in view of Mehri et al. ("Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks", NPL 2017). 
With respect to Claim 1, Nicosia teaches a method of analyzing unstructured messages, the method comprising: extracting a content feature from an embedding associated with each message from a pair of unstructured messages using a neural network (Pg. 2236 Section 3.1 "The first module of our deep learning model is the sentence encoder. A sentence of length n is a sequence of words…Each word is represented as a vector…The sentence encoder f that’s a sequence of words in input, embeds and transforms them into a fixed-sized vector. The function f is used to encode both sentences in a pair...[Col. 2] Our full hybrid siamese network furtherly models the interaction between the two sentences by feed the output of the siamese encoder to a multi-layer perceptron (MLP). The MLP takes the concatenation of the sentence representations and their distance...in input, and outputs the probability of a match between the two sentences." The examiner notes that the concatenation of the embedding from each sentence (i.e. message) teaches "extracting a content feature from an embedding associated with each message." The examiner notes that, as the title suggests, this embedding and extracting process is done through a siamese network (i.e. two neural networks).). 
Nicosia also teaches generating, based on extracted content features, a text similarity score between the pair of messages using a second neural network (Pg. 2236 Section 3.1 and 3.2 "The [multilayer perceptron] MLP takes the concatenation of the sentence representations…and output the probability of a match between the two sentences." Section 3.2 equation 4 and equation 5 discuss "logistic loss" and "global loss." The examiner notes that outputting a probability that the two sentences match teaches "a text similarity score" and, because a multi-layer perceptron computes this similarity score, the similarity score is generated by "a second neural network.").
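For illustration only, the two-stage arrangement described above (a shared sentence encoder followed by a multi-layer perceptron that outputs a match probability) can be sketched as follows. This sketch is not taken from Nicosia: the averaging encoder is a simplified stand-in for the reference's sentence encoder, the weights are random and untrained, and the names encode and match_probability are hypothetical.

```python
import math
import random

def encode(sentence, embeddings, dim):
    """Shared (siamese) encoder: average the word vectors into one fixed-size vector.
    Assumption: averaging stands in for the reference's learned sentence encoder."""
    vectors = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def match_probability(sent_a, sent_b, embeddings, dim, hidden_w, out_w):
    """Second network: an MLP over the concatenated encodings and their distance."""
    a = encode(sent_a, embeddings, dim)  # the same encoder is applied to both sentences
    b = encode(sent_b, embeddings, dim)
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    features = a + b + [distance]
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features))) for row in hidden_w]
    logit = sum(w * h for w, h in zip(out_w, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid gives a probability in (0, 1)

# Hypothetical, untrained parameters used only to exercise the code path.
DIM = 4
rng = random.Random(0)
vocab = "how do i reset my password can change".split()
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM + 1)] for _ in range(3)]
out_w = [rng.uniform(-0.5, 0.5) for _ in range(3)]

p = match_probability("how do i reset my password",
                      "how can i change my password",
                      embeddings, DIM, hidden_w, out_w)
```

Because the weights are untrained, the value of p carries no meaning here; the sketch only shows where the "content feature" and the "text similarity score" arise in such a pipeline.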
Nicosia, however, does not appear to explicitly disclose: 
combining the generated text similarity score with additional data associated with the messages to generate a total similarity score
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
Mueller, however, does teach combining the generated text similarity score with additional data associated with the messages to generate a total similarity score (As an initial matter, the examiner notes the extreme breadth of the phrase “additional data associated with the messages.” Mueller Pg. 2789 Col. 1 "Due to the simple construction of our similarity function, the predictions of our model are constrained to follow the exp(-x) curve and thus are not suited for these evaluation metrics. After training our model, we apply an additional nonparametric regression step to obtain better-calibrated predictions...Over the training set, the given labels...are regressed against the univariate MaLSTM g-predicted relatedness as the sole covariate, and the fitted regression function is evaluated on the MaLSTM-predicted relatedness of the test pairs to produce adjusted final predictions." The examiner notes that using the "g-relatedness score" (i.e. text similarity score) with a "regression" that uses the given labels (i.e. additional data associated with the messages) to produce an adjusted final prediction teaches "combining the text similarity score with additional data associated with the messages to generate a total similarity score."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the first and second neural networks and similarity score as taught by Nicosia with the combination of similarity score and additional data as taught by Mueller because this would produce “better-calibrated predictions,” thus improving accuracy (Mueller Pg. 2789).
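For illustration only, the calibration step quoted from Mueller (regressing the given labels against the predicted relatedness and then evaluating the fitted function on new predictions) can be sketched as follows. Mueller does not specify the regression in the quoted passage; the Gaussian kernel regression, the bandwidth value, and the name fit_kernel_regression are assumptions made for this sketch.

```python
import math

def fit_kernel_regression(train_scores, train_labels, bandwidth=0.1):
    """Fit a nonparametric map from raw similarity scores to labels.
    Assumption: a Gaussian-kernel (Nadaraya-Watson) smoother is used here
    as one example of a nonparametric regression."""
    def predict(score):
        weights = [math.exp(-((score - s) / bandwidth) ** 2 / 2) for s in train_scores]
        total = sum(weights)
        if total == 0.0:
            # All kernel weights underflowed: fall back to the nearest training score.
            nearest = min(range(len(train_scores)),
                          key=lambda i: abs(train_scores[i] - score))
            return train_labels[nearest]
        return sum(w * y for w, y in zip(weights, train_labels)) / total
    return predict

# Hypothetical training data: raw model scores paired with the given labels.
calibrate = fit_kernel_regression([0.1, 0.5, 0.9], [1.0, 3.0, 5.0])
adjusted = calibrate(0.5)  # adjusted final prediction for a new raw score
```

In this sketch the raw score plays the role of the text similarity score, the training labels play the role of the additional data, and the adjusted value plays the role of the total similarity score.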
The combination of Nicosia and Mueller, however, does not appear to explicitly disclose:
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
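Mehri, however, does teach generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages (Pg. 618 Section 3.4.2 Thread Partitioning "Ultimately [message i] mi is assigned to the thread which maximizes the probability outputted by the classifier, provided that the best probability output is above a threshold. If the best probability is below the threshold, a new thread is created for the message..." The examiner notes that "best probability" teaches "...based on the generated total similarity score" and creating a new thread for the message teaches "generating a message thread..."). 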
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural networks and combined total similarity score as taught by the combination of Nicosia and Mueller with the generation of a message thread as taught by Mehri because automatically generating a message thread based on the similarity score of the messages of that thread would result in a more accurate thread, thus improving the user's experience (Mehri Pg. 618 Section 3.4.2). 
	With respect to Claim 2, the combination of Nicosia, Mueller, and Mehri teaches wherein the generating at least one message thread comprises: detecting a link between each message of the pair of messages based on the generated total similarity score exceeding a link threshold (Mehri Pg. 618 Section 3.4.2 Thread Partitioning "Ultimately [message i] mi is assigned to the thread which maximizes the probability outputted by the classifier, provided that the best probability output is above a threshold. If the best probability is below the threshold, a new thread is created for the message..." The examiner notes that "best probability" teaches "...based on the generated total similarity score" and creating a new thread for the message teaches "generating a message thread..." Pg. 616 Section 3.1 "given two input messages, the reply classifier outputs the likelihood of the first message being a reply to the second. Given a child and a parent message, a feature vector is generated in order to describe..." Further, from Pg. 618, because the at least one message thread is generated based on a particular message falling below a threshold and/or a particular message being assigned to a message thread (i.e. exceeding a threshold), the message thread is generated based on a link threshold, as is required.). 
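For illustration only, the thread partitioning rule quoted from Mehri (assign a message to the best-scoring thread when that score is above a threshold, otherwise start a new thread) can be sketched as follows. The word-overlap scorer, the use of the maximum pairwise score as the thread score, the threshold value, and the function names are assumptions of this sketch and are not taken from Mehri.

```python
def word_overlap(message_a, message_b):
    """Hypothetical stand-in for a trained classifier: Jaccard overlap of word sets."""
    a, b = set(message_a.lower().split()), set(message_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def partition_into_threads(messages, score, threshold):
    """Assign each message to the existing thread with the best score,
    or open a new thread when the best score does not exceed the threshold."""
    threads = []
    for message in messages:
        best_thread, best_score = None, 0.0
        for thread in threads:
            # Assumption: a thread's score is its best pairwise score with the message.
            thread_score = max(score(message, earlier) for earlier in thread)
            if thread_score > best_score:
                best_thread, best_score = thread, thread_score
        if best_thread is not None and best_score > threshold:
            best_thread.append(message)   # link detected: score exceeds the threshold
        else:
            threads.append([message])     # no link: a new thread is created
    return threads

messages = [
    "lunch at noon",
    "noon works for lunch",
    "the build is broken",
    "who broke the build",
]
threads = partition_into_threads(messages, word_overlap, threshold=0.3)
```

With these four hypothetical messages, the two lunch messages share a thread and the two build messages share another, since each pair's overlap score exceeds 0.3 while the cross-topic scores are zero.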
With respect to Claim 13, Nicosia teaches a non-transitory computer readable medium having stored therein a program for making a computer execute a method of analyzing a corpus comprising a plurality of unstructured messages, the method comprising: extracting a content feature from an embedding associated with each message from a pair of unstructured messages using a neural network (Pg. 2236 Section 3.1 "The first module of our deep learning model is the sentence encoder. A sentence of length n is a sequence of words…Each word is represented as a vector…The sentence encoder f that’s a sequence of words in input, embeds and transforms them into a fixed-sized vector. The function f is used to encode both sentences in a pair...[Col. 2] Our full hybrid siamese network furtherly models the interaction between the two sentences by feed the output of the siamese encoder to a multi-layer perceptron (MLP). The MLP takes the concatenation of the sentence representations and their distance...in input, and outputs the probability of a match between the two sentences." The examiner notes that the concatenation of the embedding from each sentence (i.e. message) teaches "extracting a content feature from an embedding associated with each message." The examiner notes that, as the title suggests, this embedding and extracting process is done through a siamese network (i.e. two neural networks).). 
Nicosia also teaches generating, based on extracted content features, a text similarity score between the pair of messages using a second neural network (Pg. 2236 Section 3.1 and 3.2 "The [multilayer perceptron] MLP takes the concatenation of the sentence representations…and output the probability of a match between the two sentences." Section 3.2 equation 4 and equation 5 discuss "logistic loss" and "global loss." The examiner notes that outputting a probability that the two sentences match teaches "a text similarity score" and, because a multi-layer perceptron computes this similarity score, the similarity score is generated by "a second neural network.").
Nicosia, however, does not appear to explicitly disclose: 
combining the generated text similarity score with additional data associated with the messages to generate a total similarity score
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
Mueller, however, does teach combining the generated text similarity score with additional data associated with the messages to generate a total similarity score (As an initial matter, the examiner notes the extreme breadth of the phrase “additional data associated with the messages.” Mueller Pg. 2789 Col. 1 "Due to the simple construction of our similarity function, the predictions of our model are constrained to follow the exp(-x) curve and thus are not suited for these evaluation metrics. After training our model, we apply an additional nonparametric regression step to obtain better-calibrated predictions...Over the training set, the given labels...are regressed against the univariate MaLSTM g-predicted relatedness as the sole covariate, and the fitted regression function is evaluated on the MaLSTM-predicted relatedness of the test pairs to produce adjusted final predictions." The examiner notes that using the "g-relatedness score" (i.e. text similarity score) with a "regression" that uses the given labels (i.e. additional data associated with the messages) to produce an adjusted final prediction teaches "combining the text similarity score with additional data associated with the messages to generate a total similarity score."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the first and second neural networks and similarity score as taught by Nicosia with the combination of similarity score and additional data as taught by Mueller because this would produce “better-calibrated predictions,” thus improving accuracy (Mueller Pg. 2789).
The combination of Nicosia and Mueller, however, does not appear to explicitly disclose:
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
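Mehri, however, does teach generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages (Pg. 618 Section 3.4.2 Thread Partitioning "Ultimately [message i] mi is assigned to the thread which maximizes the probability outputted by the classifier, provided that the best probability output is above a threshold. If the best probability is below the threshold, a new thread is created for the message..." The examiner notes that "best probability" teaches "...based on the generated total similarity score" and creating a new thread for the message teaches "generating a message thread..."). 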
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural networks and combined total similarity score as taught by the combination of Nicosia and Mueller with the generation of a message thread as taught by Mehri because automatically generating a message thread based on the similarity score of the messages of that thread would result in a more accurate thread, thus improving the user's experience (Mehri Pg. 618 Section 3.4.2). 
	With respect to Claim 14, the combination of Nicosia, Mueller, and Mehri teaches wherein the generating at least one message thread comprises: detecting a link between each message of the pair of messages based on the generated total similarity score exceeding a link threshold (Mehri Pg. 618 Section 3.4.2 Thread Partitioning "Ultimately [message i] mi is assigned to the thread which maximizes the probability outputted by the classifier, provided that the best probability output is above a threshold. If the best probability is below the threshold, a new thread is created for the message..." The examiner notes that "best probability" teaches "...based on the generated total similarity score" and creating a new thread for the message teaches "generating a message thread..." Pg. 616 Section 3.1 "given two input messages, the reply classifier outputs the likelihood of the first message being a reply to the second." Further, from Pg. 618, because the at least one message thread is generated based on a particular message falling below a threshold and/or a particular message being assigned to a message thread (i.e. exceeding a threshold), the message thread is generated based on a link threshold, as is required.). 

With respect to Claim 19, Nicosia teaches a computer apparatus configured to analyze a corpus comprising a plurality of unstructured messages, the computer apparatus comprising:
a memory storing a plurality of unstructured messages and an embedding associated with each of the plurality of unstructured messages; and a processor executing a process comprising (Nicosia Pg. 2237 Section 4.3 “The Quora dataset contains pairs of questions from the Quora website…for the evaluation, we use the dataset splits and word embeddings…”). 
 extracting a content feature from an embedding associated with each message from a pair of unstructured messages using a neural network (Pg. 2236 Section 3.1 "The first module of our deep learning model is the sentence encoder. A sentence of length n is a sequence of words…Each word is represented as a vector…The sentence encoder f that’s a sequence of words in input, embeds and transforms them into a fixed-sized vector. The function f is used to encode both sentences in a pair...[Col. 2] Our full hybrid siamese network furtherly models the interaction between the two sentences by feed the output of the siamese encoder to a multi-layer perceptron (MLP). The MLP takes the concatenation of the sentence representations and their distance...in input, and outputs the probability of a match between the two sentences." The examiner notes that the concatenation of the embedding from each sentence (i.e. message) teaches "extracting a content feature from an embedding associated with each message." The examiner notes that as the title suggests this embedding and extracting process is done through a siamese network (i.e. two neural networks).). 
Nicosia also teaches generating, based on extracted content features, a text similarity score between the pair of messages using a second neural network (Pg. 2236 Section 3.1 and 3.2 "The [multilayer perceptron] MLP takes the concatenation of the sentence representations…and output the probability of a match between the two sentences." Section 3.2 equation 4 and equation 5 discuss "logistic loss" and "global loss." The examiner notes that outputting a probability that the two sentences match teaches "a text similarity score" and, because a multi-layer perceptron computes this similarity score, the similarity score is generated by "a second neural network.").
Nicosia, however, does not appear to explicitly disclose: 
combining the generated text similarity score with additional data associated with the messages to generate a total similarity score
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
Mueller, however, does teach combining the generated text similarity score with additional data associated with the messages to generate a total similarity score (As an initial matter, the examiner notes the extreme breadth of the phrase “additional data associated with the messages.” Mueller Pg. 2789 Col. 1 "Due to the simple construction of our similarity function, the predictions of our model are constrained to follow the exp(-x) curve and thus are not suited for these evaluation metrics. After training our model, we apply an additional nonparametric regression step to obtain better-calibrated predictions...Over the training set, the given labels...are regressed against the univariate MaLSTM g-predicted relatedness as the sole covariate, and the fitted regression function is evaluated on the MaLSTM-predicted relatedness of the test pairs to produce adjusted final predictions." The examiner notes that using the "g-relatedness score" (i.e. text similarity score) with a "regression" that uses the given labels (i.e. additional data associated with the messages) to produce an adjusted final prediction teaches "combining the text similarity score with additional data associated with the messages to generate a total similarity score."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the first and second neural networks and similarity score as taught by Nicosia with the combination of similarity score and additional data as taught by Mueller because this would produce “better-calibrated predictions,” thus improving accuracy (Mueller Pg. 2789).
The combination of Nicosia and Mueller, however, does not appear to explicitly disclose:
generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages
Mehri, however, does teach generating a message thread based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages (Pg. 618 Section 3.4.2 Thread Partitioning "Ultimately [message i] mi is assigned to the thread which maximizes the probability outputted by the classifier, provided that the best probability output is above a threshold. If the best probability is below the threshold, a new thread is created for the message..." The examiner notes that "best probability" teaches "...based on the generated total similarity score" and creating a new thread for the message teaches "generating a message thread..."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural networks and combined total similarity score as taught by the combination of Nicosia and Mueller with the generation of a message thread as taught by Mehri because automatically generating a message thread based on the similarity score of the messages of that thread would result in a more accurate thread, thus improving the user's experience (Mehri Pg. 618 Section 3.4.2). 


Claims 3-5 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Nicosia et al. ("Accurate Sentence Matching with Hybrid Siamese Networks", NPL 2017) in view of Mueller et al. ("Siamese Recurrent Architectures for Learning Sentence Similarity", NPL 2016) and further in view of Mehri et al. ("Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks", NPL 2017) and further in view of Broadbent et al. ("Record Linkage at NASS using Automatch", NPL 1999).  
With respect to Claim 3, the combination of Nicosia, Mueller, and Mehri teaches all of the limitations of Claim 2 as described above. 
The combination of Nicosia, Mueller, and Mehri, however, does not appear to explicitly disclose: 
wherein the generating the message thread comprises receiving a user input indicative of a link based on the generated total similarity being in an uncertainty range that is less than the link threshold and greater than a non-link threshold
Broadbent, however, does teach wherein the generating the message thread comprises receiving a user input indicative of a link based on the generated total similarity being in an uncertainty range that is less than the link threshold and greater than a non-link threshold (Pg. 2 "[Automatch] assigns a weight to each component of the records being comparing. These component weights are then summed to calculate an aggregate weight for the record pair. The aggregate weight represents the probability that the record pair is a true match. The aggregate weight is completed against two thresholds...to classify each case a match (above the upper cutoff), nonmatch (below the lower cutoff) or possible match (between the upper and lower cutoff)....After running the merge program, the link groups are populated into a resolution database where possible match resolution is performed by state office users who are familiar with the farm operations in their state." The examiner notes that "state office user" resolving "possible matches" teaches "receiving a user input indicative of a link..." The examiner further notes that "possible match" teaches "...based on the generated total similarity being in an uncertainty range that is less than the link threshold [i.e. true match cutoff] and greater than a non-link threshold [i.e. nonmatch cutoff]."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the generation of the message thread as taught by the combination of Nicosia, Mueller, and Mehri with the user input and thresholds as taught by Broadbent because this would provide ground truth labels for “possible matches,” which in turn would reduce the training time needed because it would reduce the number of iterations in which a new message would be incorrectly linked (Broadbent Pg. 4).
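For illustration only, the two-cutoff classification quoted from Broadbent (match above the upper cutoff, nonmatch below the lower cutoff, possible match in between and resolved by a user) can be sketched as follows. The cutoff values, the handling of scores exactly equal to a cutoff, and the names classify_pair and resolve_link are assumptions of this sketch and are not taken from Broadbent or Automatch.

```python
def classify_pair(score, link_threshold=0.8, non_link_threshold=0.3):
    """Classify a scored pair using an upper and a lower cutoff.
    Assumption: scores equal to a cutoff fall outside the uncertainty range."""
    if score >= link_threshold:
        return "match"
    if score <= non_link_threshold:
        return "nonmatch"
    return "possible match"  # strictly between the two cutoffs

def resolve_link(score, ask_user, link_threshold=0.8, non_link_threshold=0.3):
    """Return True when the pair is linked; query the user only for possible matches."""
    label = classify_pair(score, link_threshold, non_link_threshold)
    if label == "possible match":
        return bool(ask_user(score))  # user input indicative of a link
    return label == "match"

# Hypothetical reviewer callback that confirms every proposed link.
always_confirm = lambda score: True
decisions = [resolve_link(s, always_confirm) for s in (0.9, 0.5, 0.1)]
```

Only the middle score (0.5) reaches the reviewer callback in this example; the other two pairs are decided by the cutoffs alone.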
With respect to Claim 4, the combination of Nicosia, Mueller, Mehri, and Broadbent teaches wherein the user input indicative of the link is received in response to a user query transmitted to a user in response to the generated total similarity being calculated to be in the uncertainty range (Broadbent Pg. 2 "[Automatch] assigns a weight to each component of the records being comparing. These component weights are then summed to calculate an aggregate weight for the record pair. The aggregate weight represents the probability that the record pair is a true match. The aggregate weight is completed against two thresholds...to classify each case a match (above the upper cutoff), nonmatch (below the lower cutoff) or possible match (between the upper and lower cutoff)....After running the merge program, the link groups are populated into a resolution database where possible match resolution is performed by state office users who are familiar with the farm operations in their state." The examiner notes that presenting a user with "possible matches" and unresolved links teaches "...in response to a user query transmitted to a user in response to the generated total similarity being calculated to be in the uncertainty range." That is, under the BRI, the display showing the unresolved links where the user resolves those links teaches "a user query transmitted to a user..."). 
With respect to Claim 5, the combination of Nicosia, Mueller, Mehri, and Broadbent teaches wherein the transmitted user query comprises a proposed link and a request that the user provide a confirmation (Broadbent Pg. 2 "[Automatch] assigns a weight to each component of the records being comparing. These component weights are then summed to calculate an aggregate weight for the record pair. The aggregate weight represents the probability that the record pair is a true match. The aggregate weight is completed against two thresholds...to classify each case a match (above the upper cutoff), nonmatch (below the lower cutoff) or possible match (between the upper and lower cutoff)....After running the merge program, the link groups are populated into a resolution database where possible match resolution is performed by state office users who are familiar with the farm operations in their state." The examiner notes that displaying “possible matches” and having the resolution performed by a user 
With respect to Claim 15, the combination of Nicosia, Mueller, and Mehri teaches all of the limitations of Claim 14 as described above. 
The combination of Nicosia, Mueller, and Mehri, however, does not appear to explicitly disclose: 
wherein the generating the message thread comprises receiving a user input indicative of a link based on the generated total similarity being in an uncertainty range that is less than the link threshold and greater than a non-link threshold
Broadbent, however, does teach wherein the generating the message thread comprises receiving a user input indicative of a link based on the generated total similarity being in an uncertainty range that is less than the link threshold and greater than a non-link threshold (Pg. 2 "[Automatch] assigns a weight to each component of the records being comparing. These component weights are then summed to calculate an aggregate weight for the record pair. The aggregate weight represents the probability that the record pair is a true match. The aggregate weight is completed against two thresholds...to classify each case a match (above the upper cutoff), nonmatch (below the lower cutoff) or possible match (between the upper and lower cutoff)....After running the merge program, the link groups are populated into a resolution database where possible match resolution is performed by state office users who are familiar with the farm operations in their state." The examiner notes that "state office user" resolving "possible matches" teaches "receiving a user input indicative of a link..." The examiner further notes that "possible match" teaches "...based on the generated total similarity being in an uncertainty range that is less than the link threshold [i.e. true match cutoff] and greater than a non-link threshold [i.e. nonmatch cutoff]."). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the generation of the message thread as taught by the combination of Nicosia, Mueller, and Mehri with the user input and thresholds as taught by Broadbent because this would provide ground truth labels for “possible matches,” which in turn would reduce the training time needed because it would reduce the number of iterations in which a new message would be incorrectly linked (Broadbent Pg. 4).
With respect to Claim 16, the combination of Nicosia, Mueller, Mehri, and Broadbent teaches wherein the user input indicative of the link is received in response to a user query transmitted to a user in response to the generated total similarity being calculated to be in the uncertainty range (Broadbent Pg. 2 "[Automatch] assigns a weight to each component of the records being comparing. These component weights are then summed to calculate an aggregate weight for the record pair. The aggregate weight represents the probability that the record pair is a true match. The aggregate weight is completed against two thresholds...to classify each case a match (above the upper cutoff), nonmatch (below the lower cutoff) or possible match (between the upper and lower cutoff)....After running the merge program, the link groups are populated into a resolution database where possible match resolution is performed by state office users who are familiar with the farm operations in their state." The examiner notes that presenting a user with "possible matches" and unresolved links teaches "...in response to a user query transmitted to a user in response to the generated total similarity being calculated to be in the uncertainty range." That is, under the BRI, the display showing the unresolved links where the user resolves those links teaches "a user query transmitted to a user..."). 

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Nicosia et al. ("Accurate Sentence Matching with Hybrid Siamese Networks", NPL 2017) in view of Mueller et al. ("Siamese Recurrent Architectures for learning sentence similarity", NPL 2016) and further in view of Mehri et al. ("Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural networks", NPL 2017) and further in view of Broadbent et al. ("Record Linkage at NASS using Automatch", NPL 1999) and further in view of Feng, Chong ("Improve Record Linkage Using Active Learning Techniques", NPL 2016). 
With respect to Claim 6, the combination of Nicosia, Mueller, Mehri, and Broadbent teach all of the limitations of Claim 4 as described above. 
The combination of Nicosia, Mueller, Mehri, and Broadbent, however, does not appear to explicitly disclose retraining the neural network based on the user input. 
	Feng, however, does teach retraining the neural network based on the user input (Pg. 10 Figure 3.1. Note specifically that user feedback (i.e. user input) is used as training data. A person of ordinary skill in the art would readily interpret that if training data includes the user's input, the classifier (e.g., neural network) would be trained "based on the user input..." Further note Pg. 19 Algorithm 2. Note especially Line 11 which shows that the classifier is trained based on the training data, which includes the user's feedback (i.e. label).). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the user input and message thread generation as taught by the combination of Nicosia, Mueller, Mehri, and Broadbent modified with the retraining of the neural network based on the user input as taught by Feng because using user feedback for the training data of the neural network corrects errors the neural network made during the initial training. This, predictably, would result in a more accurate neural network model (Feng Pg. 16). 
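The general idea of retraining a classifier on user feedback can be sketched as follows. This is a minimal illustration under stated assumptions and is not a reproduction of Feng's Algorithm 2: a one-feature logistic regression stands in for the neural network, the most-uncertain-first query strategy is an assumption, and the names `train`, `predict`, `retrain_with_feedback`, and `oracle` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Model = Tuple[float, float]  # (weight, bias) of a 1-D logistic regression


def train(data: List[Tuple[float, int]], epochs: int = 2000, lr: float = 0.5) -> Model:
    """Fit p = sigmoid(w * x + b) to (score, label) pairs by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def predict(model: Model, x: float) -> float:
    """Return the predicted probability that a pair with score x is a link."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def retrain_with_feedback(seed: List[Tuple[float, int]], pool: List[float],
                          oracle: Callable[[float], int], rounds: int):
    """Repeatedly ask the user to label the most uncertain pair, then retrain."""
    labeled, unlabeled = list(seed), list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Pick the pair whose predicted probability is closest to 0.5.
        x = min(unlabeled, key=lambda s: abs(predict(model, s) - 0.5))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))  # the user's feedback becomes training data
        model = train(labeled)          # retrain on data that includes the feedback
    return model, labeled
```

Here `oracle` plays the role of the user, and each returned label is appended to the training data before the classifier is fit again, which is the sense in which the classifier is trained "based on the user input."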


Claims 11, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nicosia et al. ("Accurate Sentence Matching with Hybrid Siamese Networks", NPL 2017) in view of Mueller et al. ("Siamese Recurrent Architectures for learning sentence similarity", NPL 2016) and further in view of Mehri et al. ("Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural networks", NPL 2017) and further in view of Domeniconi et al. ("A novel method for unsupervised and supervised conversational message thread detection", NPL 2016).
With respect to Claim 11, the combination of Nicosia, Mueller, and Mehri teach all of the limitations of Claim 1 as described above. 
The combination of Nicosia, Mueller, and Mehri, however, does not appear to explicitly disclose:
wherein the generating the at least one message thread based on the generated total similarity score comprises: for an unstructured message, ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score;
selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph; and
dividing the message graph into connected components, each connected component representing a conversation thread.
Domeniconi, however, does teach wherein the generating the at least one message thread based on the generated total similarity score comprises: for an unstructured message, ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that weighting a pair of messages based on their similarity teaches "ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score."). 
Domeniconi also teaches selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...When DBSCAN tries to retrieve the θ-neighborhood of a message m, it gets all messages that are adjacent to m with a weight in their edge greater or equal to θ. Greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that retrieving (i.e. selecting) only the messages that are greater or equal to the threshold teaches “selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph”. ).
Domeniconi further teaches dividing the message graph into connected components, each connected component representing a conversation thread (Pg. 50 Figure 3. "Conversation extraction example. Each wk refers to extracted thread and each cj corresponds to the real conversation of the message." That is, within each extracted thread there is at least one connected component (i.e. conversation). Because they are clustered together they must be connected; for example, they are similar.). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the message thread generation as taught by the combination of Nicosia, Mueller, and Mehri modified with the ranking of the similarity scores and dividing of the message graph as taught by Domeniconi because this would lead to a more complete view of the importance of each feature (i.e. message) (Domeniconi Pg. 48 Col. 2 section 3.4). 
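The three limitations at issue (ranking pairs, selecting the highest-scoring pair as a graph edge, and dividing the graph into connected components) can be sketched as follows. This is a minimal illustration of the claim language only; it is not Domeniconi's DBSCAN-based method. The convention that each pair is keyed as (later message, earlier message), the `min_score` cutoff, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (later message, earlier message), an assumed convention


def select_edges(scores: Dict[Pair, float], min_score: float) -> Set[Pair]:
    """For each message, rank its candidate pairs and keep the top one as an edge."""
    candidates = defaultdict(list)
    for (child, parent), score in scores.items():
        candidates[child].append((score, parent))
    edges = set()
    for child, ranked in candidates.items():
        ranked.sort(reverse=True)            # highest total similarity score first
        best_score, best_parent = ranked[0]
        if best_score >= min_score:          # otherwise the message starts a new thread
            edges.add((child, best_parent))
    return edges


def connected_components(nodes: Iterable[str], edges: Iterable[Pair]) -> List[Set[str]]:
    """Split the message graph into connected components with union-find."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]    # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())
```

Under these assumptions, each returned set of messages corresponds to one conversation thread.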
With respect to Claim 18, the combination of Nicosia, Mueller, Mehri, and Domeniconi teach wherein the generating the at least one message thread based on the generated total similarity score comprises: for an unstructured message, ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that weighting a pair of messages based on their similarity teaches "ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score."). 
The combination of Nicosia, Mueller, Mehri, and Domeniconi also teaches selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...When DBSCAN tries to retrieve the θ-neighborhood of a message m, it gets all messages that are adjacent to m with a weight in their edge greater or equal to θ. Greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that retrieving (i.e. selecting) only the messages that are greater or equal to the threshold teaches “selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph”. ).
The combination of Nicosia, Mueller, Mehri, and Domeniconi further teaches dividing the message graph into connected components, each connected component representing a conversation thread (Pg. 50 Figure 3. "Conversation extraction example. Each wk refers to extracted thread and each cj corresponds to the real conversation of the message." That is, within each extracted thread there is at least one connected component (i.e. conversation). Because they are clustered together they must be connected; for example, they are similar.). 

With respect to Claim 20, the combination of Nicosia, Mueller, Mehri, and Domeniconi teach wherein the generating the at least one message thread based on the generated total similarity score comprises: for an unstructured message, ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that weighting a pair of messages based on their similarity teaches "ranking a plurality of pairs of messages associated with the unstructured message based on the total similarity score."). 
The combination of Nicosia, Mueller, Mehri, and Domeniconi also teaches selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph (Pg. 47-48. Note Eq(1). Equation 1 shows a "combined" (i.e. total) similarity score for a message pair (mi, mj). Pg. 48. Col. 1 "In our study, we use messages as points and we use weighted edges that connect each message to the other messages. An edge (mi,mj) between two messages...is weighted with the similarity measure...When DBSCAN tries to retrieve the θ-neighborhood of a message m, it gets all messages that are adjacent to m with a weight in their edge greater or equal to θ. Greater weight on an edge indicates that the connected message are more similar, and thus they are closer." The examiner notes that retrieving (i.e. selecting) only the messages that are greater or equal to the threshold teaches “selecting at least one pair of messages from the ranked plurality of pairs of messages based on a highest total similarity score and using the selected at least one pair of messages as at least one of the edges of a message graph”. ).
The combination of Nicosia, Mueller, Mehri, and Domeniconi further teaches dividing the message graph into connected components, each connected component representing a conversation thread (Pg. 50 Figure 3. "Conversation extraction example. Each wk refers to extracted thread and each cj corresponds to the real conversation of the message." That is, within each extracted thread there is at least one connected component (i.e. conversation). Because they are clustered together they must be connected; for example, they are similar.). 


Prior Art 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
1. Das, Arpita et al. “Together we stand: Siamese Networks for Similar Question Retrieval.” NPL 2016. Teaches using a Siamese network for retrieving similar questions. Also specifically discloses combining a similarity score and “additional data” for a total similarity score. Further discloses that the Siamese Network is a convolutional Siamese network. 
2. Bhatia, Sumit et al. “Adopting Inference Networks for Online Thread Retrieval” NPL 2010. Discusses building and constructing online conversation threads based on 
3. Singh, Amit et al. “Retrieving Similar Discussion Forum Threads: A structure based approach” NPL 2012. Discusses retrieving similar threads by looking at the structure of threads using a message graph. Specifically discloses that threads are made up of connected posts using a similarity metric. 
4. Seo, Jangwon et al. “Online community search using conversational structures” NPL 2011. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FEN CHRISTOPHER TAMULONIS whose telephone number is (571)272-0934.  The examiner can normally be reached on 7:30AM-5:30PM MON-FRI EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571)270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116