DETAILED ACTION
Response to Arguments
Applicant’s arguments filed 03/03/2022 with respect to the rejection under 35 USC § 112(b) have been fully considered and are persuasive. Furthermore, Applicant’s amendment canceling “one or more” has rendered the 35 USC § 112(d) rejection inapplicable. Accordingly, the rejections under 35 USC § 112(b) and 35 USC § 112(d) have been withdrawn.
Applicant's arguments filed 03/03/2022 have been fully considered but are not persuasive with regard to the rejection under 35 USC § 103. Applicant argues that the amended independent claims overcome the prior art rejection over Jurgovsky in view of Guo and further in view of Chen. See pages 9-13 of Applicant’s Arguments submitted on 03/03/2022 (“[N]one of the references cited in the Office Action discloses the above-referenced features of the amended independent claims”).
Examiner respectfully disagrees. While Algorithm 1 of Guo does not detail the elements of the amended independent claims, Figure 1 and Algorithm 2 of Guo detail all elements of the amended independent claims.
Figure 1 of Guo (image not reproduced)
Using just Figure 1, Guo teaches that each of the sequential feature vectors $x^{(i,1)}, x^{(i,2)}, x^{(i,3)}, \ldots, x^{(i,T)}$ is fed into the GRU-based encoder, which outputs the latent vector to the Gaussian Mixture (GM) latent space (i.e. determining, using a sequence encoder machine learning model and based at least in part on the one or more sequential feature vectors for the given agent, one or more internal behavior vectors for the given agent). Then, the GRU-based decoder takes the GM latent vector from the GM latent space and outputs the reconstructed sequential feature vectors
$\hat{x}^{(i,1)}, \hat{x}^{(i,2)}, \hat{x}^{(i,3)}, \ldots, \hat{x}^{(i,T)}$ (i.e. determining, using a sequence decoder machine learning model and based at least in part on the one or more internal behavior vectors for the given agent, one or more reconstructed sequential feature vectors for the given agent).
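For illustration only, the encoder/decoder data flow described above can be sketched in Python. This is a minimal, untrained toy under stated assumptions, not Guo's implementation: the GRU weights are random, the variational sampling and the Gaussian Mixture prior are omitted, and the names (`GRUCell`, `encode`, `decode`) and sizes (D = 4 features, T = 5 time steps, a 2-dimensional latent vector) are hypothetical. It shows only that a sequence of T feature vectors is summarized by a latent vector much smaller than the D × T input and then unrolled back into T reconstructed vectors.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical random initialization; a real model would learn these weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class GRUCell:
    """Standard GRU cell with update gate z and reset gate r (biases omitted for brevity)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.Wz, self.Uz = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr, self.Ur = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh, self.Uh = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

def encode(cell, to_latent, xs):
    """Run the encoder GRU over x^(i,1..T) and project the final state to a latent vector."""
    h = [0.0] * cell.hidden_dim
    for x in xs:
        h = cell.step(x, h)
    return matvec(to_latent, h)

def decode(cell, from_latent, to_output, z, steps):
    """Unroll the decoder GRU from the latent vector to produce `steps` reconstructions."""
    h = [math.tanh(a) for a in matvec(from_latent, z)]
    x_prev = [0.0] * len(to_output)  # start input: zero vector of output dimension D
    recon = []
    for _ in range(steps):
        h = cell.step(x_prev, h)
        x_prev = matvec(to_output, h)  # reconstructed x-hat^(i,t), fed back as the next input
        recon.append(x_prev)
    return recon

rng = random.Random(0)
D, T, H, K = 4, 5, 8, 2  # features, time steps, hidden size, latent size (all hypothetical)
encoder, decoder = GRUCell(D, H, rng), GRUCell(D, H, rng)
to_latent, from_latent, to_output = rand_matrix(K, H, rng), rand_matrix(H, K, rng), rand_matrix(D, H, rng)
xs = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
z = encode(encoder, to_latent, xs)
x_hat = decode(decoder, from_latent, to_output, z, T)
print(len(z), len(x_hat), len(x_hat[0]))  # 2 5 4
```

In this toy, the 20 input values (D × T) are summarized by 2 latent values, which is the sense in which a latent embedding reduces dimensionality; training would be needed before the reconstructions carry any meaning.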
After the reconstructed sequential feature vectors are output, the Reconstruction Probability in Figure 1 is used to compare each sequential feature vector $x^{(i,1)}, x^{(i,2)}, x^{(i,3)}, \ldots, x^{(i,T)}$
and each reconstructed sequential feature vector $\hat{x}^{(i,1)}, \hat{x}^{(i,2)}, \hat{x}^{(i,3)}, \ldots, \hat{x}^{(i,T)}$ (i.e. determining a dimensionality reduction accuracy measure for the given agent based at least in part on a comparison measure for the one or more reconstructed sequential feature vectors for the given agent and the one or more sequential feature vectors for the given agent). Lastly, if the Reconstruction Probability is greater than the threshold $\alpha$, the corresponding sequential feature vector is labeled normal rather than abnormal (i.e. in response to determining that the dimensionality reduction accuracy measure for the given agent satisfies a dimensionality reduction accuracy measure threshold, generating [the] dimensionality-reduced behavior vector for the given agent based at least in part on the one or more internal behavior vectors for the given agent). See also the current Office Action.
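As a hedged illustration of the comparison and thresholding step, the sketch below scores each input $x^{(i,t)}$ by its log density under a Gaussian with diagonal covariance built from the reconstructed mean and standard deviation, and labels it normal when the score exceeds a threshold. Comparing the log density against $\log \alpha$ is an assumption made for numerical stability; it is equivalent to comparing the density against $\alpha$ because the logarithm is monotonic. The function names, data, and threshold value are hypothetical and are not taken from Guo.

```python
import math

def gaussian_log_density(x, mean, std):
    """Log density of x under a Gaussian with diagonal covariance diag(std**2)."""
    total = 0.0
    for xi, mi, si in zip(x, mean, std):
        total += -0.5 * math.log(2.0 * math.pi * si * si) - (xi - mi) ** 2 / (2.0 * si * si)
    return total

def label_time_steps(xs, recon_means, recon_stds, log_alpha):
    """Label each x^(i,t) 'normal' if its reconstruction log-probability exceeds log_alpha."""
    labels = []
    for x, mu, sd in zip(xs, recon_means, recon_stds):
        score = gaussian_log_density(x, mu, sd)
        labels.append("normal" if score > log_alpha else "abnormal")
    return labels

xs = [[0.0, 1.0], [5.0, -4.0]]           # two observed time steps, D = 2
recon_means = [[0.1, 0.9], [0.0, 0.0]]   # hypothetical decoder outputs
recon_stds = [[1.0, 1.0], [1.0, 1.0]]
labels = label_time_steps(xs, recon_means, recon_stds, log_alpha=-5.0)
print(labels)  # ['normal', 'abnormal']
```

The first time step is reconstructed closely (log density about -1.85) and passes the threshold; the second lies far from its reconstructed mean (about -22.3) and is labeled abnormal.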
Accordingly, Guo teaches all of the elements of the amended independent claims, and the 35 USC § 103 rejection is maintained.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jurgovsky et al., "Sequence classification for credit-card fraud detection," Expert Systems with Applications 100 (2018) (“Jurgovsky”), in view of Guo et al., "Multidimensional time series anomaly detection: A GRU-based Gaussian mixture variational autoencoder approach," Asian Conference on Machine Learning, PMLR, 2018 (“Guo”), and further in view of Chen et al., "VISTA: Validating and refining clusters via visualization," Information Visualization 3.4 (2004) (“Chen”).
Regarding claim 1, Jurgovsky teaches a computer-implemented method for detecting data abnormalities within agent-specific datasets, the method comprising: receiving, using one or more processors (Jurgovsky, pg. 240, left-column, “Our implementation is based on the deep learning library Keras… [w]e used the random forests implementation from SciKit-Learn….” Note: libraries such as Keras and SciKit-Learn necessarily run on one or more processors), a plurality of agent-specific data sets, wherein each of the plurality of agent-specific data sets is associated with a given agent of a plurality of agents and comprises (a) one or more continuous features for the given agent of a plurality of agents and (b) one or more discrete features for the given agent (Jurgovsky, pg. 239, left-column, “Since the data collected during a credit-card transaction must comply with international financial reporting standards, the set of raw features is quite similar throughout the literature. Therefore, we removed all business-specific features and kept only a small set that is comparable to the features used in other studies… Also, we removed all identifiers and quasi-identifiers for card holders and merchants to encourage generalization to new frauds. Table 2 shows the list of features… In order to assess the impact of additional features on the classification accuracy, we defined three feature sets: The first feature set (BASE) contains all raw features after removing business-specific variables and identifiers (see Table 2).” Jurgovsky teaches: Table 2 shows the list of features. In order to assess the impact of additional features on the classification accuracy, we defined three feature sets: The first feature set (BASE) contains all raw features after removing business-specific variables and identifiers (see Table 2) (i.e. a plurality of agent-specific data sets, wherein each of the plurality of agent-specific data sets is associated with a given agent of a plurality of agents and comprises (a) one or more continuous features for the given agent of a plurality of agents and (b) one or more discrete features for the given agent));
for each given agent, using the one or more processors, encoding the one or more discrete features for the given agent into one or more discrete feature vectors for the given agent (Jurgovsky, pg. 239, right-column, “In case of neural networks, we wanted to avoid having very high-dimensional one-hot encoded feature vectors. Therefore, we employed a label encoding mechanism which is quite popular in the domain of natural language processing and neural networks… and is applicable to arbitrary other categorical variables apart from words… For a categorical variable with its set of values $C$, we assigned each value a random $d$-dimensional weight vector $v$, that was drawn from a multivariate uniform distribution $v \sim U\left([-0.05, 0.05]^d\right)$, with $d = \log_2(|C|)$.
The feature values and their corresponding vectors (vector embeddings of the feature values) are stored in a dictionary. To encode a particular value of the categorical variable, we look up the value of the feature in the dictionary and retrieve its vector. The embedding vectors are part of the model’s parameters and can be adjusted jointly during parameter estimation.” Jurgovsky teaches: For a categorical variable with its set of values $C$, we assigned each value a random $d$-dimensional weight vector $v$, that was drawn from a multivariate uniform distribution $v \sim U\left([-0.05, 0.05]^d\right)$, with $d = \log_2(|C|)$. The feature values and their corresponding vectors (vector embeddings of the feature values) are stored in a dictionary. To encode a particular value of the categorical variable, we look up the value of the feature in the dictionary and retrieve its vector (i.e. for each given agent, using the one or more processors, encoding the one or more discrete features for the given agent into one or more discrete feature vectors for the given agent));
for each given agent, using the one or more processors, constructing one or more sequential feature vectors for the given agent based at least in part on the one or more discrete feature vectors for the given agent and one or more continuous feature vectors for the given agent corresponding to the one or more continuous features for the given agent (Jurgovsky, pg. 239, left-column, “The goal with the sequence learning approach is to characterize how frauds appear in a non-fraudulent context/sequence. To study the impact of the length of the input sequence on the LSTM prediction accuracy, we defined two scenarios: In one scenario the LSTM was trained with sequences of length 5 (SHORT), in the other scenario the LSTM was trained with sequences of length 10 (LONG). The choice of these lengths is motivated from a statistical analysis in which we observed that a compromised account contains, on average, about 4 fraudulent transactions. Therefore, a sequence length of 5 transactions seemed to be an adequate lower limit to see both frauds and non-frauds within the context. Based on this lower limit, we doubled the sequence length to arrive at the second scenario with a sequence length of 10.” Jurgovsky teaches: To study the impact of the length of the input sequence on the LSTM prediction accuracy, we defined two scenarios: In one scenario the LSTM was trained with sequences of length 5 (SHORT), in the other scenario the LSTM was trained with sequences of length 10 (LONG) (i.e. for each given agent, using the one or more processors, constructing one or more sequential feature vectors for the given agent based at least in part on the one or more discrete feature vectors for the given agent and one or more continuous feature vectors for the given agent corresponding to the one or more continuous features for the given agent)).
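The Jurgovsky steps mapped above (label-encoding discrete features through a dictionary of random vectors, combining them with continuous features, and forming per-agent sequences of length 5 or 10) can be sketched as follows. This is a minimal sketch, not Jurgovsky's code: the helper names and the sample transaction history are hypothetical, rounding $\log_2(|C|)$ up to an integer is an assumption, forming sequences as sliding windows over one agent's time-ordered transactions is also an assumption, and the embeddings are only initialized and looked up here, not trained.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]

def build_embedding_table(values: Sequence[str], rng: random.Random) -> Dict[str, Vector]:
    """Assign each distinct categorical value a random vector drawn from U([-0.05, 0.05]^d)."""
    distinct = sorted(set(values))
    # Assumption: d = log2(|C|) is rounded up to an integer, with at least 1 dimension.
    d = max(1, math.ceil(math.log2(len(distinct))))
    return {v: [rng.uniform(-0.05, 0.05) for _ in range(d)] for v in distinct}

def transaction_vector(discrete: Sequence[str], continuous: Sequence[float],
                       tables: Sequence[Dict[str, Vector]]) -> Vector:
    """Concatenate the looked-up embedding of each discrete feature with the continuous features."""
    vec: Vector = []
    for value, table in zip(discrete, tables):
        vec.extend(table[value])  # dictionary lookup of the value's embedding
    vec.extend(continuous)
    return vec

def build_sequences(vectors: List[Vector], length: int) -> List[List[Vector]]:
    """Sliding windows of `length` consecutive transactions (hypothetical windowing scheme)."""
    return [vectors[i - length + 1:i + 1] for i in range(length - 1, len(vectors))]

rng = random.Random(42)
# Hypothetical time-ordered history of one agent: (merchant category, amount).
history = [("grocery", 12.5), ("fuel", 40.0), ("travel", 310.0), ("grocery", 7.25),
           ("online", 15.0), ("fuel", 55.0), ("online", 99.9)]
category_table = build_embedding_table([c for c, _ in history], rng)
vectors = [transaction_vector([c], [amount], [category_table]) for c, amount in history]
short_seqs = build_sequences(vectors, 5)   # SHORT scenario
long_seqs = build_sequences(vectors, 10)   # LONG scenario: this history is too short
print(len(category_table["grocery"]), len(short_seqs), len(short_seqs[0]), len(long_seqs))  # 2 3 5 0
```

With one discrete feature taking |C| = 4 values, d = 2, so each transaction vector holds 2 embedding entries followed by 1 continuous entry, and identical category values always map to the same embedding.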
Jurgovsky does not teach: reducing, dimensionality of the one or more sequential feature vectors for each given agent to generate a plurality of dimensionality-reduced behavior vectors each corresponding to one of the plurality of agents, comprising, for each given agent: determining, using a sequence encoder machine learning model and based at least in part on the one or more sequential feature vectors for the given agent, one or more internal behavior vectors for the given agent, determining, using a sequence decoder machine learning model and based at least in part on the one or more internal behavior vectors for the given agent, one or more reconstructed sequential feature vectors for the given agent, determining a dimensionality reduction accuracy measure for the given agent based at least in part on a comparison measure for the one or more reconstructed sequential feature vectors for the given agent and the one or more sequential feature vectors for the given agent, and in response to determining that the dimensionality reduction accuracy measure for the given agent satisfies a dimensionality reduction accuracy measure threshold, generating a dimensionality-reduced behavior vector for the given agent based at least in part on the one or more internal behavior vectors for the given agent.
However, Guo teaches: reducing, dimensionality of the one or more sequential feature vectors for each given agent to generate a plurality of dimensionality-reduced behavior vectors each corresponding to one of the plurality of agents (Guo, pg. 98, see also fig. 1, “[I]n this paper we propose an unsupervised GRU-based Gaussian Mixture VAE, called GGM-VAE, for anomaly detection. In particular, Gated Recurrent Unit (GRU) cells are employed to discover the correlations among time sequences. Then we use Gaussian Mixture prior in the latent space to characterize the multimodal data. The VAE infers the latent embedding and reconstruction probability in a variational manner by optimizing the variational lower bound.” Guo teaches: Gated Recurrent Unit (GRU) cells are employed to discover the correlations among time sequences. Then we use Gaussian Mixture prior in the latent space to characterize the multimodal data. The VAE infers the latent embedding (i.e. reducing, dimensionality of the one or more sequential feature vectors for each given agent to generate a plurality of dimensionality-reduced behavior vectors each corresponding to one of the plurality of agents)), comprising, for each given agent:
determining, using a sequence encoder machine learning model and based at least in part on the one or more sequential feature vectors for the given agent, one or more internal behavior vectors for the given agent (Guo, pgs. 103-106, see also fig. 1 and algorithm 2, “We group the time series system statuses into time windows of size $T$ and each group of time series data, say the $i$th, is a $D \times T$ matrix denoted as $X^{(i)} = \left[x^{(i,1)}; x^{(i,2)}; \ldots; x^{(i,T)}\right]_{D \times T}$… [w]e first feed $X^{(i)}$ into the GRU-based encoder….”),
determining, using a sequence decoder machine learning model and based at least in part on the one or more internal behavior vectors for the given agent, one or more reconstructed sequential feature vectors for the given agent (Guo, pgs. 103-106, see also fig. 1 and algorithm 2, “Afterwards, the output of the GRU-based encoder will be mapped to the Gaussian Mixture latent space. The corresponding output will be further transported to the GRU-based decoder part to reconstruct the original input.”),
determining a dimensionality reduction accuracy measure for the given agent based at least in part on a comparison measure for the one or more reconstructed sequential feature vectors for the given agent and the one or more sequential feature vectors for the given agent (Guo, pg. 106, see also fig. 1 and algorithm 2, “By fitting the input data $x^{(i,t)}$ into the multivariate Gaussian distribution with the learned reconstructed mean vector and reconstructed standard deviation vector, we can get the corresponding reconstruction probability $\mathcal{N}\big(x^{(i,t)} \mid \mu_{\hat{x}|z,w_{k^*}}[i,t,l],\ \sigma_{\hat{x}|z,w_{k^*}}[i,t,l]\big)$ for the $l$th generated latent vector. After averaging over all the L reconstruction probabilities, we can get the final reconstruction probability $RP_{x}^{\hat{x}}[i,t]$ for each input $x^{(i,t)}$.”),
and in response to determining that the dimensionality reduction accuracy measure for the given agent satisfies a dimensionality reduction accuracy measure threshold (Guo, pg. 106, see also fig. 1 and algorithm 2, “Therefore, the system can determine whether the data sample is anomalous by checking if the reconstruction probability is smaller than the learned threshold $\alpha$.”), generating a dimensionality-reduced behavior vector for the given agent based at least in part on the one or more internal behavior vectors for the given agent (Guo, pg. 106, see also fig. 1; as the first if statement of algorithm 2 details, if $RP_{x}^{\hat{x}}[i,t] < \alpha$, then the sequence vector S for the group of data i at time t, i.e. S[i, t], is labeled as Anomalous; otherwise it is labeled as Normal behavior).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jurgovsky’s method in view of Guo. The motivation to do so would be to combine the sequential efficiency of recurrent neural networks with the dimensionality reduction techniques of autoencoders with Gaussian Mixture prior distributions for better anomaly prediction (Guo, pgs. 97-98, “Anomaly detection has been a widely researched problem in machine learning and is of paramount importance in many areas such as intrusion detection… fraud detection…health monitoring…The importance of anomaly detection lies in the fact that anomalies in data translate to significant (and often critical) information in a wide variety of application domains… We leverage the Gaussian Mixture prior in the latent representation to characterize the intrinsic multimodality in time series data…Gated Recurrent Unit (GRU) cells are employed under the VAE framework to discover the correlations among the time series data…Experimental results on real world datasets show that the proposed scheme outperforms the state-of-the-art schemes.”).
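For illustration only, the reconstruction-probability test quoted from Guo above (averaging the Gaussian likelihood of an input over the L decoded latent samples and comparing that average with the learned threshold α) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Guo's implementation: the decoder's L (mean, standard deviation) pairs are supplied directly, the covariance is assumed to be diagonal, α is passed in rather than learned, and the function names are hypothetical.

```python
import math


def gaussian_density(x, mu, sigma):
    """N(x | mu, diag(sigma^2)) for vectors x, mu and standard deviations sigma (diagonal covariance assumed)."""
    log_p = 0.0
    for xi, mi, si in zip(x, mu, sigma):
        log_p += -0.5 * math.log(2 * math.pi * si * si) - (xi - mi) ** 2 / (2 * si * si)
    return math.exp(log_p)


def reconstruction_probability(x, decoded):
    """Average the L densities N(x | mu_l, sigma_l) over the decoded latent samples."""
    return sum(gaussian_density(x, mu, sigma) for mu, sigma in decoded) / len(decoded)


def label_sample(x, decoded, alpha):
    """Label the sample 'Anomalous' when its reconstruction probability is below alpha."""
    return "Anomalous" if reconstruction_probability(x, decoded) < alpha else "Normal"


# Hypothetical decoder outputs for L = 2 latent samples: (mean vector, standard deviation vector).
decoded = [([0.0, 0.0], [1.0, 1.0]), ([0.2, -0.1], [1.0, 1.0])]
print(label_sample([0.1, 0.0], decoded, alpha=0.01))  # -> Normal
print(label_sample([4.0, 4.0], decoded, alpha=0.01))  # -> Anomalous
```

An input lying far from every decoded mean receives a small averaged density and is therefore labeled Anomalous, which mirrors the "smaller than the learned threshold" test in the quoted passage.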
Jurgovsky does not teach: constructing, an interface that displays a visual representation of one or more of the plurality of dimensionality-reduced behavior vectors and the abnormal data characteristics; and transmitting, the interface to a user device.
However, Chen teaches: constructing, an interface that displays a visual representation of one or more of the plurality of dimensionality-reduced behavior vectors and the abnormal data characteristics (Chen, pg. 261, left-column, “The k coordinates are equidistantly distributed on the circumference of the circle C, as in Figure 6…We formally describe $\alpha$-mapping as follows. Let a 2D point Q(x, y) represent the image of a k-dimensional max–min normalized (with normalization bounds [-1, 1]) data point $P(x_1, \ldots, x_k)$ on the 2D star coordinates. Q(x, y) is determined by the average of the vector sum of the k vectors $\alpha_i \cdot x_i \cdot \vec{s}_i$ (i = 1..k), where $\alpha_i$ are the k adjustable parameters.” Chen teaches that the k coordinates are equidistantly distributed on the circumference of the circle C, as in Figure 6 (i.e. constructing, an interface), and that a 2D point Q(x, y) represents the image of a k-dimensional max–min normalized data point, where Q(x, y) is determined by the average of the vector sum of the k vectors $\alpha_i \cdot x_i \cdot \vec{s}_i$ (i = 1..k) and $\alpha_i$ are the k adjustable parameters (i.e. that displays a visual representation of one or more of the plurality of dimensionality-reduced behavior vectors and the abnormal data characteristics));
and transmitting, the interface to a user device (Chen, pg. 261, right-column, “The VISTA system looks like Figure 7. The task of the VISTA cluster rendering system is to provide the interactive visualization techniques to help the users find and separate the overlapping clusters through continuously changed visualization. We have designed and implemented a set of interactive rendering operations in VISTA.” Chen teaches the VISTA system interface shown in Figure 7, which provides interactive visualization techniques to its users (i.e. transmitting, the interface to a user device)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jurgovsky’s method in view of Chen. The motivation to do so would be to manually adjust irregularly shaped clusters for better predictions (Chen, pgs. 257-259, “Over the past decades most of the clustering research has been focused on automatic clustering algorithms and statistical validity indices. The automatic methods are known to work well in dealing with clusters of regular shapes, for example, compact spherical shapes, but incur high error when dealing with arbitrarily shaped clusters… we propose a visual framework that allows the user to be involved into the clustering process via interactive visualization. The core of the visual framework is the visual cluster rendering system VISTA…VISTA imports the algorithmic clustering result into the visual cluster rendering system, and then lets the user participate in the following ‘clustering-evaluation’ iterations interactively. With the reliable mapping mechanism employed by VISTA system, the user can visually validate the defined clusters via interactive operations. The interactive operations also allow the user to refine the clusters or incorporate domain knowledge to define better cluster structure.”).
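For illustration only, the α-mapping quoted from Chen may be sketched in Python as follows: each normalized data point is projected to Q(x, y), the average of the k vectors $\alpha_i \cdot x_i \cdot \vec{s}_i$. This is a minimal sketch under stated assumptions, not Chen's VISTA code: the unit vectors $\vec{s}_i$ are assumed to sit at angles 2πi/k starting from angle 0, a constant feature column is assumed to normalize to 0, and the function names are hypothetical.

```python
import math


def max_min_normalize(column):
    """Rescale one feature column to the normalization bounds [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # assumption: a constant column maps to 0
        return [0.0 for _ in column]
    return [2 * (v - lo) / (hi - lo) - 1 for v in column]


def alpha_mapping(point, alphas):
    """Map a normalized k-dimensional point to Q(x, y) on the 2D star coordinates.

    Q is the average of the k vectors alpha_i * x_i * s_i, where s_i is the
    unit vector of the i-th coordinate axis on the circle.
    """
    k = len(point)
    if len(alphas) != k:
        raise ValueError("expected one alpha per dimension")
    qx = qy = 0.0
    for i, (x_i, a_i) in enumerate(zip(point, alphas)):
        theta = 2 * math.pi * i / k  # assumed equidistant axes starting at angle 0
        qx += a_i * x_i * math.cos(theta)
        qy += a_i * x_i * math.sin(theta)
    return qx / k, qy / k


print(max_min_normalize([0, 5, 10]))                # -> [-1.0, 0.0, 1.0]
print(alpha_mapping([1.0, 0.0, 0.0, 0.0], [1.0] * 4))  # -> (0.25, 0.0)
```

Because Q is linear in each $\alpha_i$, changing a single $\alpha_i$ shifts every point in proportion to its i-th coordinate, which is what makes the parameters useful as interactive adjustment handles.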
Regarding dependent claim 2, Jurgovsky in view of Guo and in view of Chen teaches the method of claim 1, wherein organization of the plurality of dimensionality-reduced behavior vectors is determined using a supervised learning algorithm (Guo, pg. 103, “Figure 1 shows the system architecture, where GRU cells are introduced in both the encoder and the decoder of a Gaussian Mixture VAE… The system works as follows. We first feed $X^{(i)}$ into the GRU-based encoder…Afterwards, the output of the GRU-based encoder will be mapped to the Gaussian Mixture latent space. The corresponding output will be further transported to the GRU-based decoder part to reconstruct the original input. The loss function measures the difference of the reconstructed data from the original input.” Guo teaches that the output of the GRU-based encoder is mapped to the Gaussian Mixture latent space, that the corresponding output is further transported to the GRU-based decoder part to reconstruct the original input, and that the loss function measures the difference of the reconstructed data from the original input (i.e. organization of the plurality of dimensionality-reduced behavior vectors is determined using a supervised learning algorithm)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Jurgovsky with the above teachings of Guo for the same rationale stated at Claim 1. 
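For illustration only, the encoder/decoder reconstruction loop quoted from Guo for claim 2 may be sketched in plain Python as follows. This sketch deliberately omits Guo's Gaussian Mixture latent space and variational loss: the decoder is unrolled directly from the encoder's final hidden state, the weights are random and untrained, mean squared error stands in for "the loss function measures the difference of the reconstructed data from the original input," and all names are hypothetical. The GRU update follows one common convention, h_t = (1 − z) ⊙ h_{t−1} + z ⊙ h̃_t; other references swap the roles of z and 1 − z.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU cell with update gate z, reset gate r and candidate state (biases omitted)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.Wz, self.Wr, self.Wh = (rand_matrix(hidden_dim, input_dim, rng) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rand_matrix(hidden_dim, hidden_dim, rng) for _ in range(3))

    def step(self, x, h):
        z = sigmoid(vec_add(matvec(self.Wz, x), matvec(self.Uz, h)))
        r = sigmoid(vec_add(matvec(self.Wr, x), matvec(self.Ur, h)))
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in vec_add(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(cell, sequence, hidden_dim):
    """Run the encoder over the sequence and return its final hidden state."""
    h = [0.0] * hidden_dim
    for x in sequence:
        h = cell.step(x, h)
    return h


def decode(cell, readout, h, length, output_dim):
    """Unroll the decoder from h, feeding each reconstructed vector back as the next input."""
    outputs, y = [], [0.0] * output_dim
    for _ in range(length):
        h = cell.step(y, h)
        y = matvec(readout, h)
        outputs.append(y)
    return outputs


def reconstruction_loss(original, reconstructed):
    """Mean squared difference between the input sequence and its reconstruction."""
    diffs = [(a - b) ** 2 for x, y in zip(original, reconstructed) for a, b in zip(x, y)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
D, H, T = 3, 4, 5  # feature dimension, hidden size, sequence length
encoder, decoder = GRUCell(D, H, rng), GRUCell(D, H, rng)
readout = rand_matrix(D, H, rng)
sequence = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
reconstruction = decode(decoder, readout, encode(encoder, sequence, H), T, D)
loss = reconstruction_loss(sequence, reconstruction)
```

Training would adjust the weights to reduce this loss; the sketch only shows how the input sequence, its reconstruction, and the loss relate.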
Regarding dependent claim 3, Jurgovsky in view of Guo and in view of Chen teaches the method of claim 1, wherein organization of the plurality of dimensionality-reduced behavior vectors is determined using an unsupervised learning algorithm (Guo, pgs. 105-106, “In this section, we demonstrate how our GGM-VAE model can be used to detect anomalies with anomaly detection algorithm in detail. Algorithm 2 describes the GGM-VAE based anomaly detection algorithm in an unsupervised learning fashion.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Jurgovsky with the above teachings of Guo for the same rationale stated at Claim 1. 
Regarding dependent claim 4, Jurgovsky in view of Guo and in view of Chen teaches the method of claim 1, wherein the plurality of agents are grouped using a filter system (Jurgovsky, pg. 238, right-column, “Our study is based on a dataset of credit-card transactions, recorded from March to May 2015. Each transaction in the dataset has a boolean label assigned, indicating whether the transaction was in fact a fraudulent act.”).
Regarding dependent claim 5, Jurgovsky in view of Guo and in view of Chen teaches the method of claim 4, wherein the filter system groups the plurality of agents by industry type (Jurgovsky, pg. 238, right-column, “Based on this dataset, we create datasets in the following way: We group all transactions by card holder ID and sort the transactions of each card holder by time. As a result, we obtain a temporally ordered sequence of transactions for each card holder…we denote such sequence as the account of a card holder and the whole set of all accounts as the sequence dataset. We further split the sequence dataset into two mutually exclusive sets: One sequence dataset contains only e-commerce transactions (ECOM) and the other only face-to-face transactions (F2F).”).
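For illustration only, the dataset construction quoted from Jurgovsky (group transactions by card holder ID, sort each account by time, and split into mutually exclusive ECOM and F2F sequence datasets) may be sketched in Python as follows. This is a minimal sketch, not Jurgovsky's code: each transaction is assumed to be a dict with hypothetical keys `card_id`, `time` and `channel` (either "ECOM" or "F2F"); other fields, such as the boolean fraud label, are carried along unchanged. Routing by channel before grouping yields the same two sequence datasets as grouping first and splitting afterwards, because sorting restores temporal order within each account.

```python
from collections import defaultdict


def build_sequence_datasets(transactions):
    """Return {'ECOM': {card_id: [tx, ...]}, 'F2F': {card_id: [tx, ...]}}."""
    datasets = {"ECOM": defaultdict(list), "F2F": defaultdict(list)}
    for tx in transactions:
        # Route each transaction to its channel, then to its card holder's account.
        datasets[tx["channel"]][tx["card_id"]].append(tx)
    for accounts in datasets.values():
        for account in accounts.values():
            account.sort(key=lambda tx: tx["time"])  # temporal order within each account
    return {channel: dict(accounts) for channel, accounts in datasets.items()}


transactions = [
    {"card_id": "A", "time": 3, "channel": "ECOM", "is_fraud": False},
    {"card_id": "A", "time": 1, "channel": "ECOM", "is_fraud": False},
    {"card_id": "A", "time": 2, "channel": "F2F", "is_fraud": True},
    {"card_id": "B", "time": 5, "channel": "ECOM", "is_fraud": False},
]
datasets = build_sequence_datasets(transactions)
print([tx["time"] for tx in datasets["ECOM"]["A"]])  # -> [1, 3]
```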
Regarding dependent claim 6, Jurgovsky in view of Guo and in view of Chen teaches the method of claim 1, wherein encoding the one or more discrete features for the given agent comprises encoding the one or more discrete features via at least one of: fixed embeddings lookup, embeddings initialization and evolution during model training, target encoding, one-hot-encoding, or feature hashing (Jurgovsky, pg. 239, right-column, “In case of neural networks, we wanted to avoid having very high-dimensional one-hot encoded feature vectors. Therefore, we employed a label encoding mechanism which is quite popular in the domain of natural language processing and neural networks… and is applicable to arbitrary other categorical variables apart from words… For a categorical variable with its set of values C, we assigned each value a random d-dimensional weight vector v, that was drawn from a multivariate uniform distribution $v \sim U([-0.05, 0.05]^d)$, with $d = \log_2(|C|)$. The feature values and their corresponding vectors (vector embeddings of the feature values) are stored in a dictionary. To encode a particular value of the categorical variable, we look up the value of the feature in the dictionary and retrieve its vector. The embedding vectors are part of the model’s parameters and can be adjusted jointly during parameter estimation.” Jurgovsky teaches that the feature values and their corresponding vectors (vector embeddings of the feature values) are stored in a dictionary, and that to encode a particular value of the categorical variable, the value of the feature is looked up in the dictionary and its vector is retrieved (i.e. fixed embeddings lookup). Jurgovsky further teaches that, for a categorical variable with its set of values C, each value is assigned a random d-dimensional weight vector v drawn from a multivariate uniform distribution $v \sim U([-0.05, 0.05]^d)$, with $d = \log_2(|C|)$, and that the embedding vectors are part of the model’s parameters and can be adjusted jointly during parameter estimation (i.e. embeddings initialization and evolution during model training)).¹
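For illustration only, the embedding scheme quoted from Jurgovsky (a dictionary from each categorical value to a random vector drawn from U([-0.05, 0.05]^d), retrieved by lookup and adjusted during training) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Jurgovsky's implementation: log₂(|C|) is rounded up so that d is a positive integer, the gradient passed to `update` is supplied by a hypothetical caller, and the class and method names are hypothetical.

```python
import math
import random


class CategoricalEmbedding:
    """Dictionary mapping each value of one categorical variable to a trainable vector."""

    def __init__(self, values, seed=0):
        values = sorted(set(values))
        # Assumption: log2(|C|) is rounded up so that d is a positive integer.
        self.dim = max(1, math.ceil(math.log2(len(values))))
        rng = random.Random(seed)
        # Initialization: each value gets a random vector drawn from U([-0.05, 0.05]^d).
        self.table = {v: [rng.uniform(-0.05, 0.05) for _ in range(self.dim)] for v in values}

    def lookup(self, value):
        """Encode a value by retrieving its vector from the dictionary."""
        return self.table[value]

    def update(self, value, grad, lr):
        """Evolution during training: one gradient-descent step on the looked-up vector."""
        self.table[value] = [w - lr * g for w, g in zip(self.table[value], grad)]


countries = CategoricalEmbedding(["US", "DE", "FR", "JP", "BR"], seed=1)
print(countries.dim)  # -> 3, since log2(5) is about 2.32
```

With five values, d = 3 instead of the five components a one-hot encoding would need, which reflects the quoted goal of avoiding very high-dimensional one-hot feature vectors.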
Referring to independent claims 8 and 15, they are rejected on the same basis as independent claim 1 since they are analogous claims.
Referring to dependent claims 9-13 and 16-20, they are rejected on the same basis as dependent claims 2-6 since they are analogous claims.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-8PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

        1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.