Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-4, 6-14, and 16-20 are pending for examination. Claims 1, 11, and 20 are independent.

Response to Amendment
This office action is responsive to the amendment filed on 02/11/2022. As directed by the amendment, claims 1, 10-11, and 19-20 are amended. Claims 5 and 15 are cancelled.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-4, 6-14, and 16-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al. ("Sort Story: Sorting Jumbled Images and Captions into Stories", hereinafter "Agrawal") in view of Oliner et al. (US20180032862, hereinafter "Oliner"), and Lee et al. ("Distributed Representations of Sentences and Documents", hereinafter "Lee").

Regarding Claim 1
Agrawal discloses: A computer-implemented method, comprising: accessing, a training set of positive and negative event pairs ([Section 3.2 Pairwise Models page 3 left col last para and Equation 3] “we develop pairwise scoring models that given a pair of elements (i, j), learn to assign a score: S([[σi < σj ]] | i, j) indicating whether element i should be placed before element j in the permutation σ… ∑_{1<=i<j<=n} S([[σi < σj]]) - S([[σj < σi]])” Examiner reads the pair i<j as a positive pair and j<i as the negative pair from pairs of the training sentences);
calculating, (i) positive similarity scores between an input pair of events and the positive event pairs in the training set, and (ii) negative similarity scores between the input pair of events and the negative event pairs in the training set ([Section 3.2 Pairwise Models Page 3 Right Col Equation 2-3] “we use an asymmetric penalty that encourages sentences appearing early in the story to be placed closer to the origin than sentences appearing later in the story. Lij = || max(0, α − (xj − xi))||^2 Loss = ∑_{1<=i<j<=n} Lij” Examiner reads the positive similarity scores and the negative similarity scores for the input pair of events as the xj-xi for the input pair (at test time, calculated during the calculation of S(σj < σi)) and the xi-xj for the input pair at test time (calculated during the calculation of S(σj < σi)), and the positive and negative similarity scores between the pairs in the training set as xj-xi and xi-xj for the training pairs at training time);
applying, a Softmax process to (i) the positive similarity scores to produce an overall positive similarity score for the input pair of events relative to the negative event pairs, and (ii) the negative similarity scores to produce an overall negative similarity score for the input pair of events relative to the positive event pairs ([Section 3.2 Pairwise Models Page 3 Right Col Equation 2-3] “we use an asymmetric penalty that encourages sentences appearing early in the story to be placed closer to the origin than sentences appearing later in the story. Lij = || max(0, α − (xj − xi))||^2 Loss = ∑_{1<=i<j<=n} Lij” Examiner reads the max score as a SoftMax calculation for overall positive/negative similarity scores where the overall scores are S(σi < σj) = Lij and S(σj < σi) = Lji);
calculating, the difference between the overall positive similarity score and the overall negative similarity score to obtain a future event prediction score indicating a future occurrence likelihood of pair of events as happening before or after another one of the two constituent events ([Section 3.2 Pairwise Models page 3 right col last para] “Each of these three pairwise approaches assigns a score S(σi , σj |i, j) to an ordered pair of elements (i,j), which is used to construct a pairwise scoring model: Sp(σ) = ∑_{1<=i<j<=n} S([[σi < σj]]) - S([[σj < σi]]), (3) by summing over the scores for all possible ordered pairs in the permutation. This pairwise score captures local contextual information in stories. Finding the best permutation σ ∗ = arg maxσ∈Σn Sp(σ) under this pairwise model is NP-hard so approximations will be required.” Examiner reads the pairwise score as the difference between the overall scores that tells which event happens before/after. [Conclusion and Fig 1] “We propose the task of “sequencing” in a set of image-caption pairs, with the motivation of learning temporal common sense.”);
wherein the training set is trained by collecting labels for the positive event pairs and the negative event pairs in the training set, sampling a given one of the positive event pairs and a given one of the negative event pairs from the training set, and calculating a loss function value relating at least to the given one of the positive event pairs and the given one of the negative event pairs ([Section 3.2 Pairwise Models Page 3 Right Col Equation 2] “Similar to the max-margin loss that is applied to negative examples by Vendrov et al. (2016), we use an asymmetric penalty that encourages sentences appearing early in the story to be placed closer to the origin than sentences appearing later in the story. Lij = || max(0, α − (xj − xi))||^2 Loss = ∑_{1<=i<j<=n} Lij”),
Agrawal does not explicitly disclose: performing, by the processor, an action responsive to the future event prediction score.
However, Oliner discloses in the same field of endeavor: performing, by the processor ([Para 0065]), an action responsive to the future event prediction score ([Para 266-277] “Once the next character arrives, the untrained engine 1832 compares/contrasts the expectation value of the character that actually arrived next and the expectation value that the engine calculated for that character before it arrived. When the compare is great (e.g., falls outside an acceptable expectation range or above/below an acceptable expectation threshold), that character is declared as "surprising.” The system generates an alert or notification to highlight the surprising character (or group of characters). This is classified as an anomaly. A human should examine the anomaly and determine if any corrective action needs to be taken.” Examiner interprets the expectation value as a future event prediction score and the corrective action as the responsive action.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the sorting method taught by Agrawal with the method for anomaly detection taught by Oliner. Doing so allows an anomaly to be identified so that any needed corrective action can be taken (Para 266, Oliner).
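For illustration only, the thresholding behavior quoted from Oliner (flagging an item as "surprising" when the compared expectation values differ by more than an acceptable amount) can be sketched as follows. The function names, the tuple layout, and the threshold value are hypothetical and chosen for this sketch; they are not taken from Oliner or from the application.

```python
def is_surprising(actual_expectation, predicted_expectation, threshold):
    """Return True when the gap between the two expectation values exceeds the threshold.

    Hypothetical helper: the absolute-difference comparison is an assumption
    about how the "compare" in the quoted passage could be realized.
    """
    return abs(actual_expectation - predicted_expectation) > threshold


def collect_alerts(stream, threshold=0.5):
    """Return the items that would trigger an alert or notification.

    `stream` is a list of (item, actual_expectation, predicted_expectation)
    tuples; this layout is an assumption made for the example.
    """
    return [
        item
        for item, actual, predicted in stream
        if is_surprising(actual, predicted, threshold)
    ]
```

In this sketch, the returned list stands in for the alert that prompts a human to decide whether corrective action is needed.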
Agrawal in view of Oliner does not explicitly disclose: applying backpropagation to reduce the loss function value.
However, Lee discloses in the same field of endeavor: applying backpropagation to reduce the loss function value ([Section 2.2] “The paragraph vectors and word vectors are trained using stochastic gradient descent and the gradient is obtained via backpropagation”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Agrawal in view of Oliner with the training method taught by Lee. Doing so can train machine algorithms to predict words in the document (Abstract, Lee).
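As a rough illustration of the loss quoted from Agrawal and the gradient-based training cited from Lee, the sketch below evaluates Lij for every pair i < j and applies one gradient-descent step directly to the embeddings. It assumes the norm in Lij is squared, and the function names, margin, learning rate, and use of NumPy are choices made for this sketch only. The gradient is written out by hand here; in the cited references it would be obtained via backpropagation through a network, which this sketch does not model.

```python
import numpy as np


def pair_loss(x_i, x_j, alpha=1.0):
    """L_ij = || max(0, alpha - (x_j - x_i)) ||^2 with an element-wise max."""
    v = np.maximum(0.0, alpha - (x_j - x_i))
    return float(np.sum(v ** 2))


def total_loss(X, alpha=1.0):
    """Sum L_ij over all pairs with i < j; row i of X embeds the i-th sentence."""
    n = len(X)
    return sum(pair_loss(X[i], X[j], alpha) for i in range(n) for j in range(i + 1, n))


def loss_gradient(X, alpha=1.0):
    """Hand-derived gradient of total_loss with respect to each row of X."""
    grad = np.zeros_like(X)
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            v = np.maximum(0.0, alpha - (X[j] - X[i]))
            grad[i] += 2.0 * v  # the violated margin grows as x_i grows
            grad[j] -= 2.0 * v  # and shrinks as x_j grows
    return grad


def gradient_step(X, alpha=1.0, lr=0.1):
    """One plain gradient-descent update that reduces the loss value."""
    return X - lr * loss_gradient(X, alpha)
```

For example, two one-dimensional embeddings placed in the wrong order, `np.array([[2.0], [0.0]])`, have a nonzero loss that decreases after one call to `gradient_step`.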

Regarding Claim 11
Agrawal in view of Oliner, and Lee discloses: A computer program product, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith ([Para 0065], Oliner), the program instructions executable by a computer to cause the computer to perform a method comprising: (Claim 11 is a computer program product claim that corresponds to method claim 1 and the rest of the limitations are rejected on the same ground)

Regarding Claim 20
Agrawal in view of Oliner, and Lee discloses: A computer processing system ([Para 0065], Oliner), comprising: a processor (Claim 20 is a system claim that corresponds to method claim 1 and the rest of the limitations are rejected on the same ground)

Regarding Claim 2
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the input pair of events are represented by embedding vectors of the two constituent events forming the input pair of events ([Section 3.2], Agrawal “Specifically, each sentence in the story is embedded X = (x1, . . . , xn), xi ∈ R d +, via an LSTM (Hochreiter and Schmidhuber, 1997) with ReLU non-linearities.”).

Regarding Claim 3
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the overall positive similarity score and the overall negative similarity score are produced as respective weighted averages ([Section 2 Algorithms and Figures 1-2], Lee “The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context.”).
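For illustration only, one possible reading of the claim language (a Softmax process producing each overall score as a weighted average, followed by the difference recited in claim 1) can be sketched as below. This is an assumption about how the limitation might be realized; it is not taken from the application's specification or from any cited reference, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_weighted_average(scores):
    """Overall score: each similarity score weighted by its softmax weight."""
    weights = softmax(scores)
    return sum(w * s for w, s in zip(weights, scores))


def prediction_score(positive_scores, negative_scores):
    """Difference between the overall positive and overall negative scores."""
    return softmax_weighted_average(positive_scores) - softmax_weighted_average(negative_scores)
```

Under this reading, larger similarity scores receive larger weights, so the overall score lies between the mean and the maximum of the individual scores.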

Regarding Claim 4
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the overall positive similarity score and the overall negative similarity score are produced using distinctly executable instances of the Softmax process running in parallel ([Section 3.2], Agrawal “Each of these three pairwise approaches assigns a score S(σi , σj |i, j) to an ordered pair of elements (i,j), which is used to construct a pairwise scoring model: Sp(σ) = ∑_{1<=i<j<=n} S([[σi < σj]]) - S([[σj < σi]]), (3) by summing over the scores for all possible ordered pairs in the permutation. This pairwise score captures local contextual information in stories. Finding the best permutation σ ∗ = arg maxσ∈Σn Sp(σ) under this pairwise model is NP-hard so approximations will be required.” These calculations are done independently/in parallel.).
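As a minimal sketch of the pairwise scoring model quoted above, the code below sums S(i before j) minus S(j before i) over every ordered pair of a candidate ordering and finds the best ordering by exhaustive search. The score table `S`, the list-based representation of an ordering, and the function names are assumptions made for this illustration. Exhaustive search is used only because the example is tiny; the quoted passage notes that the general problem is NP-hard and requires approximations.

```python
from itertools import permutations


def pairwise_score(order, S):
    """Sp for one ordering.

    `order` lists element indices from first to last; S[i][j] is the score
    that element i should be placed before element j (hypothetical table).
    """
    total = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            first, second = order[a], order[b]
            total += S[first][second] - S[second][first]
    return total


def best_order(S):
    """Brute-force arg max of Sp over all permutations (small n only)."""
    n = len(S)
    return max(permutations(range(n)), key=lambda order: pairwise_score(order, S))
```

Each term of the sum depends on a single pair, which is why the individual pair scores can be computed independently of one another.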

Regarding Claim 6 
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein constituent events forming each of the positive event pairs and the negative event pairs in the training set have at least one relation there between selected from the group consisting of a temporal relation and a logical relation ([Page 5 Section 4.4 Qualitative Analysis], Agrawal “We visualize our model’s temporal common sense, in Fig. 2. The word clouds show discriminative words – the words that the model believes are indicative of sentence positions in a story.”).

Regarding Claim 7
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the input pair of events and the event pairs in the training set are triplets having a form of (subject, verb, object) ([Page 5 Section 4.3 and Fig 1], Agrawal “For this, we first parse our story sentences to extract SVO (subject, verb, object) tuples (using the Stanford Parser (Chen and Manning, 2014)).”).

Regarding Claim 8
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein event pair similarity between compared event pairs is based on word embeddings derived for the compared event pairs ([Section 3.2], Agrawal “Specifically, each sentence in the story is embedded X = (x1, . . . , xn), xi ∈ R d +, via an LSTM (Hochreiter and Schmidhuber, 1997) with ReLU non-linearities.”).

Regarding Claim 9
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the method further comprising providing supporting evidence for the future prediction score as a triple having a form of (subject, verb, object) ([Page 5 Section 4.3 and Fig 1], Agrawal “For this, we first parse our story sentences to extract SVO (subject, verb, object) tuples (using the Stanford Parser (Chen and Manning, 2014)).” Figure 1(b) discloses outputs (i.e. future predictions) with the form (subject, verb, object).).

Regarding Claim 10
Agrawal in view of Oliner, and Lee discloses: The computer-implemented method of claim 1, wherein the triple provided as the supporting evidence comprises an event pair from the training data having a highest similarity to the input pair of events, the event pair selected from the group consisting of positive event pairs and negative event pairs in the training data ([Section 3.2 Pairwise Models page 3 right col last para and Fig 1] Agrawal “Each of these three pairwise approaches assigns a score S(σi , σj |i, j) to an ordered pair of elements (i,j), which is used to construct a pairwise scoring model: Sp(σ) = ∑_{1<=i<j<=n} S([[σi < σj]]) - S([[σj < σi]]), (3) by summing over the scores for all possible ordered pairs in the permutation. This pairwise score captures local contextual information in stories. Finding the best permutation σ ∗ = arg maxσ∈Σn Sp(σ) under this pairwise model is NP-hard so approximations will be required.”).

Regarding Claim 12
(CLAIM 12 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 2 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 13
(CLAIM 13 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 3 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 14
(CLAIM 14 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 4 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 16
(CLAIM 16 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 6 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 17
(CLAIM 17 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 9 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 18
(CLAIM 18 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 8 AND IS REJECTED ON THE SAME GROUND)

Regarding Claim 19
(CLAIM 19 IS A COMPUTER PROGRAM PRODUCT CLAIM THAT CORRESPONDS TO METHOD CLAIM 10 AND IS REJECTED ON THE SAME GROUND)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Vincent et al. ("Temporal Relation Identification and Classification.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TEWODROS E MENGISTU whose telephone number is (571)270-7714. The examiner can normally be reached Mon-Fri 9:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ABDULLAH KAWSAR, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TEWODROS E MENGISTU/Examiner, Art Unit 2127

/BRIAN M SMITH/Primary Examiner, Art Unit 2122