DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,936,950. 
Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of U.S. Patent No. 10,936,950 (hereinafter the '950 patent) anticipate the instant claims. For example, instant claim 1 is anticipated by claim 1 of the '950 patent, instant claim 2 is anticipated by claim 1 of the '950 patent, instant claim 3 is anticipated by claim 2 of the '950 patent, instant claim 4 is anticipated by claim 3 of the '950 patent, and so on.
Applicant is cautioned that, should the indicated allowable subject matter be amended into the independent claims without any other amendments, the claims of the instant application would duplicate the claims of U.S. Patent No. 10,936,950, thus necessitating a statutory double patenting rejection under 35 U.S.C. 101.
Applicant is reminded that if a request is made to hold the double patenting rejection in abeyance, the response may be considered non-responsive under MPEP § 804(I)(B)(1):
“A complete response to a nonstatutory double patenting (NSDP) rejection is either a reply by applicant showing that the claims subject to the rejection are patentably distinct from the reference claims or the filing of a terminal disclaimer in accordance with 37 CFR 1.321 in the pending application(s) with a reply to the Office action (see MPEP § 1490 for a discussion of terminal disclaimers). Such a response is required even when the nonstatutory double patenting rejection is provisional.” 
and:
“As filing a terminal disclaimer, or filing a showing that the claims subject to the rejection are patentably distinct from the reference application’s claims, is necessary for further consideration of the rejection of the claims, such a filing should not be held in abeyance. Only objections or requirements as to form not necessary for further consideration of the claims may be held in abeyance until allowable subject matter is indicated. Replies with an omission should be treated as provided in MPEP § 714.03”
	The examiner suggests that Applicant file a terminal disclaimer or amend the claims to overcome the double patenting rejection so that the response is not considered non-responsive.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 12-14, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Harer, Jacob, Chris Reale, and Peter Chin, "Tree-transformer: A transformer-based method for correction of tree-structured data" (hereinafter Harer), in view of Wang, Chao, et al., "Incorporating non-sequential behavior into click models" (hereinafter Wang).
Regarding claims 1, 17, and 18, Harer teaches “a computer-implemented method for processing sequential interaction data, comprising:
obtaining a dynamic interaction graph constructed based on a dynamic interaction sequence, wherein” (abstract: “Many common sequential data sources, such as source code and natural language, have a natural tree-structured representation. These trees can be generated by fitting a sequence to a grammar, yielding a hierarchical ordering of the tokens in the sequence”; a tree is a graph, and it is dynamic because it can vary):
“the dynamic interaction graph comprises a plurality of nodes including, for each interaction feature group, a first node that represents the first object of the interaction feature group and a second node that represents the second object of the interaction feature group” (pg. 5, Figure 3 [Harer, Fig. 3]); and
“for at least a portion of the plurality of nodes, the first node or second node is connected, by edges, to two leaf nodes that represent two objects in a previous interaction feature group corresponding to a previous interaction event in which an interaction object corresponding to the first node or the second node was involved” (see previous citation); 
“determining, in the dynamic interaction graph, a current sequence corresponding to a current node to be analyzed, wherein the current sequence comprises one or more nodes within a predetermined range reachable from the current node through edges and positional encoding of each of the one or more nodes relative to the current node in the dynamic interaction graph” (pg. 4 §3.5: “The Transformer architecture utilizes a positional encoding to help localization of the attention mechanisms”; note that although Harer states that it does not itself use positional encoding, it states that the Transformer architecture uses it); and
“inputting the current sequence into a Transformer-based neural network model, wherein the neural network model comprises an embedding layer one or more attention layers” (pg. 3 §3.1: “In addition to the TCB used in each sub layer, we also use a TCB at the input to both the encoder and decoder. In the encoder this input block combines the embeddings from the parent, the sibling and the current node (p, s, and t). In the decoder, the current node is unknown since it has not yet been produced. Therefore, this block only combines the parent and sibling embeddings, leaving out the input xt from the equation above”; and Figure 1 [Harer, Fig. 1], which shows attention layers);
“obtaining, at the embedding layer, N embedded vectors based on, for each of the one or more nodes, node features comprising attribute features of an object represented by the node and the positional encoding of the node” (pg. 3 §3.1 [excerpt of Harer, pg. 3, §3.1], where the learned vector is the embedded vector);
“combining, at each attention layer, input vectors based on a degree of correlation between N input vectors obtained from a preceding layer to generate N output vectors, wherein the N input vectors to a first attention layer of the one or more attention layers comprises the N embedded vectors” (pg. 3 §3.1 “In the encoder this input block combines the embeddings from the parent, the sibling and the current node (p, s, and t).”); and 
“determining, by the neural network model, a feature vector corresponding to the current node based on the N output vectors generated by the one or more attention layers” (pg. 5 §4.2: “The Tree-Transformer’s structure helps the network produce grammatically correct outputs. However, for translation/correction tasks we must additionally ensure that each output, y¯t, is conditionally dependent on both the input, x, and on previous outputs”; the output is the feature vector, as is generally the case with Transformer neural networks).
Harer does not explicitly teach that the sequence is an interaction sequence. Wang, however, teaches “the dynamic interaction sequence comprises a plurality of interaction feature groups corresponding to a plurality of interaction events arranged in chronological order” (pg. 287, Figure 3 [Wang, Fig. 3], which shows sequential interaction data);
“each interaction feature group comprises a first object, a second object, and an interaction time of an interaction event that involved the first object and the second object” (see previous citation).
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Harer with those of Wang because combining known methods would yield predictable results; that is, it is known in the art to model sequential interaction data, as shown in Wang. Because Harer uses a Transformer neural network to process sequential data, that network would operate normally and predictably on the sequential interaction data of Wang.
Note that independent claims 17 and 18 recite substantially the same subject matter as independent claim 1, differing only in embodiment. As such, they are subject to the same rejection. The differences in embodiment, namely the non-transitory computer-readable medium and the computer/processor, would be inherent to the computing systems of Harer and Wang, as all computers contain these components.
Regarding claim 4, the Harer and Wang references have been addressed above. Harer further teaches “wherein the current node is a node that has no edge in the dynamic interaction graph” (pg. 5, Figure 3 [Harer, Fig. 3], which shows multiple nodes, such as “dog” and “hole,” that have no edge leading from them).
Regarding claim 5, the Harer and Wang references have been addressed above. Harer further teaches “wherein the one or more nodes within the predetermined range comprise: any node within a predetermined quantity K of edges; and/or any node whose interaction time is within a specified range of the interaction time of the interaction feature group corresponding to the current node” (pg. 5, Figure 3 [Harer, Fig. 3]; and pg. 4, last ¶: “An alternative to the RNN type architecture is a convolutional or attention-based one, where nodes in layer L are given access to prior nodes in layer L − 1. With the dependence in the same layer removed, the gather operation can be batched over all nodes in a tree,” which is a specified range).
Regarding claim 6, the Harer and Wang references have been addressed above. Harer further teaches “wherein: the first object of each interaction feature group is an object of a first classification and the second object of each interaction feature group is an object of a second classification” (pg. 5, Figure 3 [Harer, Fig. 3]; the sentence is to be classified);
“for each node in the portion of the plurality of nodes: the two leaf nodes to which the node is connected comprise a left node and a right node; the left node corresponds to the first object of the first classification in the previous interaction event; and the right node corresponds to the second object of the second classification in the previous interaction event” (pg. 5, Figure 3 [Harer, Fig. 3], which shows how the nodes correspond to each other).
Regarding claim 12, the Harer and Wang references have been addressed above. Harer further teaches “wherein: the one or more attention layer comprises a plurality of attention layers comprising a first attention layer connected to the embedding layer and one or more subsequent attention layers” (pg. 3, Figure 1 [Harer, Fig. 1]);
“the first attention layer obtains, as the N input vectors obtained from the preceding layer, the N embedded vectors from the embedding layer” (see previous citation); and 
“each subsequent attention layer obtains, as the N input vectors obtained from the preceding layer, the N output vectors generated by a preceding attention layer that precedes the subsequent attention layer” (as shown in Figure 1, each layer feeds into the next layer).
Regarding claim 13, the Harer and Wang references have been addressed above. Harer further teaches “wherein the neural network model combines N output vectors obtained by each of the plurality of attention layers to obtain the feature vector corresponding to the current node” (pg. 3 §3.1: “In addition to the TCB used in each sub layer, we also use a TCB at the input to both the encoder and decoder. In the encoder this input block combines the embeddings from the parent, the sibling and the current node (p, s, and t). In the decoder, the current node is unknown since it has not yet been produced. Therefore, this block only combines the parent and sibling embeddings, leaving out the input xt from the equation above”).
Regarding claim 14, the Harer and Wang references have been addressed above. Harer further teaches “wherein the neural network model combines the N output vectors obtained by a final attention layer in the plurality of attention layers to obtain the feature vector of the current node” (pg. 3 §3.1: “In addition to the TCB used in each sub layer, we also use a TCB at the input to both the encoder and decoder. In the encoder this input block combines the embeddings from the parent, the sibling and the current node (p, s, and t). In the decoder, the current node is unknown since it has not yet been produced. Therefore, this block only combines the parent and sibling embeddings, leaving out the input xt from the equation above”).
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Harer, Jacob, Chris Reale, and Peter Chin, "Tree-transformer: A transformer-based method for correction of tree-structured data" (hereinafter Harer), in view of Wang, Chao, et al., "Incorporating non-sequential behavior into click models" (hereinafter Wang), and further in view of Keskar et al., US 2019/0130273 (hereinafter Keskar).
Regarding claim 7, the Harer and Wang references have been addressed above. Neither reference explicitly teaches the positional encoding in the specific detail claimed. Keskar, however, teaches “wherein the positional encoding for each of the one or more nodes comprises (i) a quantity of edges between the node and the current node and (ii) whether the node is the left node or the right node” ([0024]: “In some embodiments, input stage 210 may perform positional encoding, such that input representation 215 includes positional information (e.g., information pertaining to the ordering of items in input sequence 202 ). For example, input stage 210 may perform additive encoding.”; the ordering would be the number of nodes and edges).
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Harer and Wang with those of Keskar because combining known methods would yield predictable results. While Harer teaches positional encoding generally, Keskar describes it in more detail. As such, the positional encoding of the two references would operate in a predictable manner when combined, since one reference simply describes it in more detail than the other.
Regarding claim 8, the Harer, Wang, and Keskar references have been addressed above. Wang further teaches “wherein: each interaction feature group further comprises a behavior feature of the interaction event of the interactive feature group” (pg. 287 §4 “At first, click sequence is divided into adjacent click pairs.” wherein clicking is a behavior feature as it represents user action); and 
Keskar further teaches “the node features and the positional encoding of each of the one or more nodes comprise behavior features of an interaction feature group corresponding to the node” ([0024] “In some embodiments, input stage 210 may perform positional encoding, such that input representation 215 includes positional information (e.g., information pertaining to the ordering of items in input sequence 202 ). For example, input stage 210 may perform additive encoding. In this regard, model 200 may retain sensitivity to the ordering of items in input sequence 202 without the use of recurrence (e.g., recurrent neural network layers) in model 200”)
Regarding claim 9, the Harer, Wang, and Keskar references have been addressed above. Keskar further teaches “wherein obtaining the N embedded vectors comprises: embedding the node features of each of the one or more nodes to obtain N node embedded vectors” ([0031] “In some embodiments, branched transformer model 300 may include an embedding layer 350 that generates an output representation 355 based on output sequence 304. In general, embedding layer 350 may perform similar embedding operations based on output sequence 304 to those that input stage 310 performs based on input sequence 302. For example, when output sequence 304 includes a sequence of text, embedding layer 350 may map each word (and/or other suitable token) into a word vector space”) 
“embedding the positional encoding of each of the one or more nodes to obtain N position embedded vectors” ([0031] “Likewise, embedding layer 350 may perform positional encoding. Output representation 355 is then received by the first branched attention decoder layer 330a”); and 
“synchronizing the N node embedded vectors and the N position embedded vectors to obtain the N embedded vectors” ([0031] “Output representation 355 is then received by the first branched attention decoder layer 330a”).
Allowable Subject Matter
No prior art has been cited against claims 2-3, 10-11, 15-16, and 19-20. In particular, the prior art of record does not teach the recited limitations for obtaining the dynamic interaction graph or the claimed method of combining input vectors based on the degree of correlation between the input vectors.
Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Primary Patent Examiner
Art Unit 2124



/Kevin W Figueroa/Primary Examiner, Art Unit 2124