DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination - 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. The applicant’s submission for RCE filed on 16 April 2021 has been entered. 

Remarks
This action is in response to the applicant’s RCE filed 16 April 2021, which is in response to the USPTO office action mailed 12 May 2020. Claims 1, 7, 8, 14 and 15 are amended. Claims 1-20 are currently pending.

Response to Arguments
With respect to the 35 USC §103 rejection of claims 1, 6-8, 13-15 and 20, the applicant’s arguments are moot in view of new grounds of rejection, as necessitated by the applicant's amendments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-11, 13-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zarar et al., US 20180314937 A1 (hereinafter “Zarar”) in view of Fokoue-Nkoutche et al., US 20190303535 A1 (hereinafter “Fokoue-Nkoutche”) in further view of Hariharan et al., US 20170116524 A1 (hereinafter “Hariharan”) in further view of Shen et al., “Deep Asymmetric Pairwise Hashing” (hereinafter “Shen”).

Claim 1: Zarar teaches a computer-implemented method executed on a processor for employing deep learning for time series representation and retrieval, the method comprising:
retrieving multivariate time series segments from a plurality of sensors (Zarar, [0023] note sensor unit includes an inertial measurement unit ("IMU"), which reports on its own motion. In some embodiments, the IMU reports such information as specific force, angular rate, and magnetic field, using, for example, accelerometers, gyroscopes, and/or magnetometers);
storing the multivariate time series segments in a multivariate time series database constructed by a sliding window over a raw time series of data (Zarar, [0039] note a sliding time window is used to apply the DCLSTM, in multiple iterations, to all of the sensor data samples that are in the window in each position of its transit through the time index range of the data, [0042] note storing the obtained version of the accessed time-series data);
applying an input long short-term memory (LSTM) network employing an input encoder to encode temporal features extracted from the multivariate time series segments and a deterministic attention model to extract real value features and corresponding hash codes (Zarar, [0017] note the facility uses one or more recurrent neural networks as the temporal component, such as one or more long short-term memory neural networks or one or more simple recurrent networks, [Fig. 7] note LSTM units 703 and 705, [0034] note original time-series sensor data 701 is first processed by convolutional neural-network layers 702, then by encoding long short-term memory units 703 to obtain a representation 704 of the sensor data in latent space. The facility processes this representation through encoding LSTM units 705);
executing similarity measurements by an objective function (Zarar, [0036] note to train the parameters of DSTSAE, the facility uses a weighted-variant of the standard VAE loss function… α and (1- α) are weights for the generative loss (or reconstruction quality characterized by the expectation term E[·]) and latent loss (or distribution similarity characterized by the Kullback-Leibler divergence DKL[·]), respectively).
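For illustration only, the claimed construction of a multivariate time series database by a sliding window over a raw time series (cf. Zarar, [0039]) may be sketched as follows. The function name, the time-major layout series[t][channel], the window and step values, and the sample readings are hypothetical assumptions for this example; neither the claims nor Zarar specify them.

```python
from typing import List, Sequence


def sliding_window_segments(series: Sequence[Sequence[float]], window: int,
                            step: int = 1) -> List[List[List[float]]]:
    """Cut a multivariate series (series[t][channel]) into overlapping segments
    of `window` time steps, advancing the window start by `step` each time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    segments = []
    for start in range(0, len(series) - window + 1, step):
        # Copy rows so stored segments do not alias the raw series.
        segments.append([list(row) for row in series[start:start + window]])
    return segments


# Hypothetical 3-channel sensor readings (e.g. accelerometer x/y/z), 6 time steps.
raw = [[0.1, 0.0, 9.8], [0.2, 0.1, 9.7], [0.3, 0.1, 9.8],
       [0.2, 0.0, 9.9], [0.1, -0.1, 9.8], [0.0, 0.0, 9.8]]
db = sliding_window_segments(raw, window=4, step=1)
print(len(db))  # 3 segments, starting at t = 0, 1, 2
```

A step smaller than the window yields overlapping segments; a step equal to the window yields disjoint ones.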
Zarar does not explicitly teach attention-based; the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
However, Fokoue-Nkoutche teaches attention-based (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^M$ and the cell states from the previous time step $h_{(t-1)} \in \mathbb{R}^H$; $c_{(t-1)} \in \mathbb{R}^H$ and produces a hidden state $h_t \in \mathbb{R}^H$, [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar with the two-way attention-based LSTM RNN of Fokoue-Nkoutche according to known methods (i.e. using a two-way attention mechanism to calculate how pairs of data interact). Motivation for doing so is that neural networks with attention mechanisms have been effectively applied to vision tasks such as image captioning and natural language processing tasks such as machine translation, where the output components selectively choose information from the input based on the attention weight (Fokoue-Nkoutche, [0055]).
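For illustration only, the input attention score recited in claim 2, $e_t^k = v_e^\top \tanh(W_e[h_{t-1}; s_{t-1}] + U_e x^k)$, may be sketched as follows for a single time step t. The softmax that turns the scores into weights over the n sensor series is an assumption (a common choice for this form of input attention); it is not recited in claim 2 as quoted and is not drawn from Fokoue-Nkoutche's two-way attention. The dimensions and random parameters are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def input_attention_scores(h_prev: Vector, s_prev: Vector, xs: List[Vector],
                           W_e: Matrix, U_e: Matrix, v_e: Vector) -> Vector:
    """e^k = v_e^T tanh(W_e [h_prev; s_prev] + U_e x^k) for every series k.

    h_prev, s_prev: previous hidden and cell state, length m each
    xs: n series, each of window length T
    W_e: T x 2m, U_e: T x T, v_e: length T
    """
    state_term = matvec(W_e, h_prev + s_prev)  # list concatenation gives [h; s]
    scores = []
    for x_k in xs:
        pre = [a + b for a, b in zip(state_term, matvec(U_e, x_k))]
        scores.append(sum(v * math.tanh(p) for v, p in zip(v_e, pre)))
    return scores


def softmax(z: Vector) -> Vector:
    """Normalize scores into weights (assumed step, not recited in claim 2)."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, m, n = 4, 3, 2  # window length, state size, number of series (hypothetical)
W_e, U_e = random_matrix(T, 2 * m), random_matrix(T, T)
v_e = random_matrix(1, T)[0]
h_prev, s_prev = [0.0] * m, [0.0] * m
xs = random_matrix(n, T)
weights = softmax(input_attention_scores(h_prev, s_prev, xs, W_e, U_e, v_e))
print(weights)  # n weights that sum to 1
```

The state term $W_e[h_{t-1}; s_{t-1}]$ is computed once per time step and shared by every series k; only $U_e x^k$ differs between series.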
Zarar and Fokoue-Nkoutche do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
However, Hariharan teaches given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface (Hariharan, [0026] note time series data can be obtained from many devices… devices can include sensors or other components for capturing data over time to generate time series data associated with or related to the device, [0056] note FIG. 8 is an exemplary user interface 800 for providing results of feature engineering time series or machine data and performing a user custom query… from this graph 805, a user can select a region of interest 810. Based on this selection, a query can be performed for data within or matching the selected region 810. The matching data can be presented along with ERP context information 820 via a matching data graph 815).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the sensor data processing of Zarar and Fokoue-Nkoutche with the interface for viewing and querying sensor data of Hariharan according to known methods (i.e. providing an interface for a user to view and query sensor data). Motivation for doing so is that data can be manipulated and analyzed easily to, for example, cluster the data for anomaly detection (Hariharan, [0033]).
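For illustration only, the claimed step of obtaining a relevant time series segment given a query may be sketched as Hamming-distance ranking over stored binary codes. This ranking is an assumption for the example and is not Hariharan's region-matching query; the codes are taken as ±1 vectors, the segment identifiers are hypothetical, and the display step is not shown. For ±1 codes of length L, the Hamming distance equals (L - inner product)/2, which is why an inner product can stand in for Hamming distance (cf. Shen, Section 3.3).

```python
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: Sequence[int], index: Dict[str, Sequence[int]],
             top_k: int = 1) -> List[str]:
    """Ids of the top_k stored segments closest to the query in Hamming distance.
    Python's sort is stable, so ties keep insertion order."""
    ranked = sorted(index, key=lambda seg_id: hamming(query_code, index[seg_id]))
    return ranked[:top_k]


# Hypothetical 4-bit codes (entries in {+1, -1}) for three stored segments.
index = {"seg_0": [1, 1, -1, -1], "seg_1": [1, -1, -1, 1], "seg_2": [-1, -1, 1, 1]}
query = [1, 1, -1, -1]
print(retrieve(query, index, top_k=2))  # ['seg_0', 'seg_1']
```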
Zarar, Fokoue-Nkoutche and Hariharan do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes.
However, Shen teaches this (Shen, [Section 1] note Hashing methods are proposed to map images to compact binary codes that approximately preserve the data structure or semantic affinity in original space, [Section 3.2] note deep hash functions using deep neural networks has more powerful learning capability than hand-crafted features extracted in advance and thus is able to learn feature representations for semantic similarity search, [Section 3.3] note In this work, inner product is utilized to be a good surrogate of the Hamming distance to quantify the pairwise similarity… the pairwise logistic function defined by equation (4); i.e. $p(s_{ij} \mid b_{ij}, h_{ij}) = \sigma(\Theta_{ij})$, when $s_{ij} = 1$, where $\sigma(x)$ is the sigmoid function, [Section 3.3] note a good hashing method should produce binary codes with the properties: (1) independence, i.e., different bits in the binary codes are independent to each other; (2) balance, i.e. each bit has a 50% chance of being 1 or -1).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar, Fokoue-Nkoutche and Hariharan with the deep hash functions of Shen according to known methods (i.e. calculating a pairwise logistic function). Motivation for doing so is that deep hash functions using deep neural networks have more powerful learning capability than hand-crafted features extracted in advance and thus are able to learn feature representations for semantic similarity search (Shen, [Section 3.2]).
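For illustration only, a negative log-likelihood built on the recited likelihood $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$ may be sketched as follows. The sketch assumes that a dissimilar pair has likelihood $1 - \sigma(\Omega_{ij})$, that $\Omega_{ij}$ is the plain inner product of the two codes, and that codes may be real-valued relaxations during training; these are assumptions, not statements of how the applicant or Shen implements the loss. Per pair, the loss reduces to $\log(1 + e^{\Omega_{ij}}) - S_{ij}\,\Omega_{ij}$.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_nll(query_codes: List[Sequence[float]],
                 db_codes: List[Sequence[float]],
                 pairs: List[Tuple[int, int, int]]) -> float:
    """Sum of -log p(S_ij | B) over labeled pairs (i, j, S_ij), S_ij in {0, 1}.

    Omega_ij is the inner product of query code i and database code j;
    p(S_ij = 1) = sigmoid(Omega_ij), p(S_ij = 0) = 1 - sigmoid(Omega_ij) (assumed).
    """
    loss = 0.0
    for i, j, s_ij in pairs:
        omega = sum(a * b for a, b in zip(query_codes[i], db_codes[j]))
        loss += softplus(omega) - s_ij * omega
    return loss


queries = [[1, 1, -1, -1]]
database = [[1, 1, -1, -1], [-1, -1, 1, 1]]
consistent = pairwise_nll(queries, database, [(0, 0, 1), (0, 1, 0)])
inconsistent = pairwise_nll(queries, database, [(0, 0, 0), (0, 1, 1)])
print(consistent < inconsistent)  # True: labels that match the codes give the lower loss
```

Minimizing this quantity pushes the inner product of similar pairs up and that of dissimilar pairs down, which is the behavior the claim language attributes to the pairwise loss.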

Claim 2: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the method of claim 1, wherein the input attention-based encoder is given by: $e_t^k = v_e^\top \tanh(W_e[h_{t-1}; s_{t-1}] + U_e x^k)$, where $h_{t-1}$ is a previous hidden state, $s_{t-1}$ is a cell state, $v_e \in \mathbb{R}^T$, $W_e \in \mathbb{R}^{T \times 2m}$ and $U_e \in \mathbb{R}^{T \times T}$ are parameters to learn, and $x^k$ is a time series of length T, where T is a length of a window size (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^M$ and the cell states from the previous time step $h_{(t-1)}$
                                     
                                
                            
                            ∈
                            
                                
                                    R
                                
                                
                                    H
                                
                            
                        
                    ;                         
                            
                                
                                    c
                                
                                
                                    (
                                    t
                                    -
                                    1
                                    )
                                
                            
                            ∈
                            
                                
                                    R
                                
                                
                                    H
                                
                            
                        
                     and produces a hidden state                         
                            
                                
                                    h
                                
                                
                                    t
                                
                            
                            ∈
                            
                                
                                    R
                                
                                
                                    H
                                
                            
                        
                    , [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).

Claim 3: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the method of claim 2, wherein the deterministic attention model includes an attention weight given by: $\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}$ (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$).
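For illustration only, the softmax normalization recited in claim 3 can be sketched in Python. The sketch assumes the raw scores $e_t^k$ (one per sensor k at a time step t) have already been computed; their form is set out in claim 2 and is not reproduced here. The function name and the sample scores are hypothetical.

```python
import math


def input_attention_weights(scores):
    """Softmax-normalize raw scores e_t^1..e_t^n into weights alpha_t^k.

    `scores` holds one raw score per driving series (sensor) k at a single
    time step t. How those scores are computed is outside this sketch.
    """
    # Subtracting the largest score leaves every ratio
    # exp(e_t^k) / sum_i exp(e_t^i) unchanged but keeps math.exp from overflowing.
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical raw scores for n = 3 sensors at one time step.
alpha = input_attention_weights([2.0, 0.5, -1.0])
```

The largest weight falls on the sensor with the largest raw score, which is the sense in which the weights indicate the relative importance of each sensor.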

Claim 4: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the method of claim 3, wherein the input attention-based encoder to ensure that all attention weights sum to 1 (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$; i.e., the examiner interprets the softmax-normalized weights as summing to 1).
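The examiner's reading of claim 4 can be checked against the weight recited in claim 3. The denominator does not depend on k, so summing the n weights cancels it; and because every exponential is positive, each weight also lies strictly between 0 and 1.

```latex
\sum_{k=1}^{n} \alpha_t^k
  = \sum_{k=1}^{n} \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}
  = \frac{\sum_{k=1}^{n} \exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}
  = 1
```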

Claim 6: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the method of claim 1, further comprising obtaining the hash codes by employing a tanh() function and a sign() function (Shen, [Section 3.2] note we utilize the hyperbolic tangent (tanh) function as the activation function. For the topmost layer, we define the binary code as: $b_i = \mathrm{sign}(u_i)$ (1), $h_i = \mathrm{sign}(z_j)$ (2)).
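As a minimal sketch of the tanh()/sign() combination in claim 6, assuming tanh is applied to a top-layer output to give a continuous code in (-1, 1) and sign() then binarizes it, consider the following. The function name and inputs are hypothetical, and mapping sign(0) to +1 is a convention chosen only for this sketch.

```python
import math


def hash_code(pre_activations):
    """Return (real-valued features u, binary hash code b) for one sample.

    Assumption: tanh is the top-layer activation, as in the quoted Shen
    passage, and sign() binarizes its output. sign(0) -> +1 is a
    convention adopted for this sketch only.
    """
    u = [math.tanh(a) for a in pre_activations]   # real-valued features in (-1, 1)
    b = [1 if v >= 0 else -1 for v in u]          # binary hash code in {-1, +1}
    return u, b


# Hypothetical top-layer pre-activations for a 4-bit code.
u, b = hash_code([0.8, -2.3, 0.1, -0.05])
```

Because tanh preserves sign, sign(tanh(a)) equals sign(a); the tanh output serves as a differentiable stand-in for the binary code during training, while sign() yields the final code.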

Claim 7: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the method of claim 1, wherein the input attention based LSTM network assists in indicating which sensor of the plurality of sensors is more important for the produced multivariate time series segment representation (Fokoue-Nkoutche, [0055] note Neural networks with attention mechanism have been effectively applied to vision tasks such as image captioning and natural language processing tasks such as machine translation, where the output components selectively choose information from the input based on the attention weights).

Claim 8: Zarar teaches a system for employing deep learning for time series representation and retrieval, the system comprising: a memory; and a processor in communication with the memory, wherein the processor runs program code to:
retrieve multivariate time series segments from a plurality of sensors (Zarar, [0023] note sensor unit includes an inertial measurement unit ("IMU"), which reports on its own motion. In some embodiments, the IMU reports such information as specific force, angular rate, and magnetic field, using, for example, accelerometers, gyroscopes, and/or magnetometers);
store the multivariate time series segments in a multivariate time series database constructed by a sliding window over a raw time series of data (Zarar, [0023] note sensor unit includes an inertial measurement unit ("IMU"), which reports on its own motion. In some embodiments, the IMU reports such information as specific force, angular rate, and magnetic field, using, for example, accelerometers, gyroscopes, and/or magnetometers);
apply an input attention based recurrent neural network employing an input attention-based encoder and a deterministic attention model to extract real value features and corresponding hash codes (Zarar, [0017] note the artifact-reduction neural network includes a spatial component for encoding and decoding spatial aspects of the sensor signal, which is discrete from a temporal component for encoding and decoding time-based aspects of the sensor signal, [Fig. 7], [0034] note original time-series sensor data 701 is first processed by convolutional neural-network layers 702, then by encoding long short-term memory units 703 to obtain a representation 704 of the sensor data in latent space); and
execute similarity measurements by an objective function (Zarar, [0036] note to train the parameters of DSTSAE, the facility uses a weighted-variant of the standard VAE loss function… α and (1- α) are weights for the generative loss (or reconstruction quality characterized by the expectation term E[·]) and latent loss (or distribution similarity characterized by the Kullback-Leibler divergence DKL[·]), respectively).
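For illustration, a database "constructed by a sliding window over a raw time series" can be pictured with the minimal Python sketch below. The function name, the stride parameter, and the sample readings are hypothetical and are not taken from the application or from Zarar.

```python
def sliding_window_segments(series, window, stride=1):
    """Cut a raw multivariate time series into fixed-length segments.

    `series` is a list of time steps, each a list of n sensor readings.
    Each returned segment holds `window` consecutive time steps (the
    window length T). `stride` is a hypothetical parameter of this sketch.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[s:s + window]
            for s in range(0, len(series) - window + 1, stride)]


# Hypothetical readings from two sensors over five time steps.
raw = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7], [0.5, 0.6]]
database = sliding_window_segments(raw, window=3)
```

With a stride of 1, consecutive segments overlap in all but one time step; a larger stride trades coverage for a smaller database.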
Zarar does not explicitly teach that the recurrent neural network is attention based; the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
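For illustration only, the relationship recited above, where the sigmoid of the inner product of two hash codes gives the probability that the pair is similar, can be sketched in Python. The function names and sample codes are hypothetical. The negative log-likelihood below is an assumption of this sketch: it treats a dissimilar pair as having likelihood one minus the sigmoid, which is not stated in the claim.

```python
import math


def inner_product(b_i, b_j):
    """Omega_ij: inner product of the hash codes of query i and sample j."""
    return sum(x * y for x, y in zip(b_i, b_j))


def pair_likelihood(b_i, b_j):
    """p(S_ij = 1 | B) = sigma(Omega_ij), the logistic sigmoid of Omega_ij."""
    return 1.0 / (1.0 + math.exp(-inner_product(b_i, b_j)))


def pairwise_nll(b_i, b_j, s_ij):
    """Negative log-likelihood of one labeled pair (s_ij = 1 similar, 0 dissimilar).

    Assumption: a dissimilar pair has likelihood 1 - sigma(Omega_ij), which
    gives log(1 + exp(Omega_ij)) - s_ij * Omega_ij for both cases.
    """
    omega = inner_product(b_i, b_j)
    return math.log1p(math.exp(omega)) - s_ij * omega


# Hypothetical 4-bit codes for a query, a similar sample and a dissimilar sample.
q = [1, -1, 1, 1]
similar = [1, -1, 1, -1]
dissimilar = [-1, 1, -1, -1]
```

The loss is small when a pair labeled similar has agreeing codes and grows when the codes disagree, which is how such a loss pushes similar pairs toward similar hash codes and dissimilar pairs toward dissimilar ones.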
However, Fokoue-Nkoutche teaches an attention-based recurrent neural network (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^M$ and the cell states from the previous time step $h_{(t-1)} \in \mathbb{R}^H$; $c_{(t-1)} \in \mathbb{R}^H$ and produces a hidden state $h_t \in \mathbb{R}^H$, [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar with the two-way attention-based LSTM RNN of Fokoue-Nkoutche according to known methods (i.e. using a two-way attention mechanism to calculate how pairs of data interact). Motivation for doing so is that neural networks with attention mechanism have been effectively applied to vision tasks such as image captioning and natural language processing tasks such as machine translation, where the output components selectively choose information from the input based on the attention weights (Fokoue-Nkoutche, [0055]).
Zarar and Fokoue-Nkoutche do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
However, Hariharan teaches given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface (Hariharan, [0026] note time series data can be obtained from many devices… devices can include sensors or other components for capturing data over time to generate time series data associated with or related to the device, [0056] note FIG. 8 is an exemplary user interface 800 for providing results of feature engineering time series or machine data and performing a user custom query… from this graph 805, a user can select a region of interest 810. Based on this selection, a query can be performed for data within or matching the selected region 810. The matching data can be presented along with ERP context information 820 via a matching data graph 815).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the sensor data processing of Zarar and Fokoue-Nkoutche with the interface for viewing and querying sensor data of Hariharan according to known methods (i.e. providing an interface for a user to view and query sensor data). Motivation for doing so is that data can be manipulated and analyzed easily to, for example, cluster the data for anomaly detection (Hariharan, [0033]).
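Hariharan's region-of-interest query can be illustrated with a minimal sketch: given a query segment, slide a window of the same length over a multivariate series (rows are time steps, columns are sensors) and return the window closest to the query. The Euclidean matching criterion, the stride, and the helper names (`sliding_segments`, `segment_distance`, `most_relevant_segment`) are hypothetical assumptions for illustration; the cited passages do not specify how matching is performed.

```python
import math
from typing import List, Sequence, Tuple

Segment = List[List[float]]  # rows = time steps, columns = sensors


def sliding_segments(series: Sequence[Sequence[float]], window: int,
                     stride: int = 1) -> List[Tuple[int, Segment]]:
    """Cut a multivariate series (time steps x sensors) into fixed-length windows."""
    return [(start, [list(row) for row in series[start:start + window]])
            for start in range(0, len(series) - window + 1, stride)]


def segment_distance(a: Segment, b: Segment) -> float:
    """Euclidean distance between two segments of identical shape (assumed criterion)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def most_relevant_segment(series: Sequence[Sequence[float]], query: Segment,
                          stride: int = 1) -> Tuple[int, float]:
    """Return (start index, distance) of the window closest to the query."""
    candidates = sliding_segments(series, len(query), stride)
    start, segment = min(candidates, key=lambda c: segment_distance(c[1], query))
    return start, segment_distance(segment, query)


# Two hypothetical sensors sampled at six time steps.
series = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0], [5.0, 2.0]]
query = [[2.0, 1.0], [3.0, 1.0]]
print(most_relevant_segment(series, query))  # (2, 0.0)
```

The returned start index is what a user interface would use to highlight the matching window in a plot of the series.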
Zarar, Fokoue-Nkoutche and Hariharan do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query $i$ and a sample $j$, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes.
However, Shen teaches this (Shen, [Section 1] note Hashing methods are proposed to map images to compact binary codes that approximately preserve the data structure or semantic affinity in original space, [Section 3.2] note deep hash functions using deep neural networks has more powerful learning capability than hand-crafted features extracted in advance and thus is able to learn feature representations for semantic similarity search, [Section 3.3] note In this work, inner product is utilized to be a good surrogate of the Hamming distance to quantify the pairwise similarity… the pairwise logistic function defined by equation (4); i.e. $p(s_{ij} \mid b_{ij}, h_{ij}) = \sigma(\Theta_{ij})$ when $s_{ij} = 1$, where $\sigma(x)$ is the sigmoid function, [Section 3.3] note a good hashing method should produce binary codes with the properties: (1) independence, i.e., different bits in the binary codes are independent to each other; (2) balance, i.e. each bit has a 50% chance of being 1 or -1).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar, Fokoue-Nkoutche and Hariharan with the deep hash functions of Shen according to known methods (i.e. calculating a pairwise logistic function). Motivation for doing so is that deep hash functions using deep neural networks have more powerful learning capability than hand-crafted features extracted in advance and thus are able to learn feature representations for semantic similarity search (Shen, [Section 3.2]).
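As a sketch of the recited likelihood, the following computes $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$ and the corresponding negative log-likelihood under stated assumptions: hash codes take values in $\{-1, +1\}$, $\Omega_{ij}$ is the plain inner product named in the claim, and a dissimilar pair is assigned probability $1 - \sigma(\Omega_{ij})$ (the standard logistic complement, assumed here rather than recited). It also checks the identity behind Shen's remark that the inner product is a surrogate for the Hamming distance: for codes of length $K$, Hamming distance $= (K - \Omega_{ij})/2$. All function names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Code = Sequence[int]  # entries in {-1, +1}


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """log(1 + e^x) without overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inner(a: Code, b: Code) -> int:
    """Omega_ij: inner product of two hash codes."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a: Code, b: Code) -> int:
    """Number of differing bits."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pair_probability(code_i: Code, code_j: Code, similar: bool) -> float:
    """sigma(Omega_ij) for a similar pair; 1 - sigma(Omega_ij) otherwise (assumed)."""
    p = sigmoid(inner(code_i, code_j))
    return p if similar else 1.0 - p


def pairwise_nll(query_codes, sample_codes,
                 pairs: Iterable[Tuple[int, int, int]]) -> float:
    """Negative log-likelihood over (i, j, s_ij) triples with s_ij in {0, 1}.

    Uses -log p = softplus(Omega_ij) - s_ij * Omega_ij, which covers both cases.
    """
    loss = 0.0
    for i, j, s in pairs:
        omega = inner(query_codes[i], sample_codes[j])
        loss += softplus(omega) - s * omega
    return loss


b = [1, -1, 1, 1]
h = [1, -1, 1, -1]
print(inner(b, h), hamming(b, h))  # 2 1
```

Minimizing this loss pushes the inner product up for similar pairs and down for dissimilar ones, which is how similar pairs end up with similar codes.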
		
Claim 9: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the system of claim 8, wherein the input attention-based encoder is given by: $e_t^k = v_e^\top \tanh\left(W_e [h_{t-1}; s_{t-1}] + U_e x^k\right)$, where $h_{t-1}$ is a previous hidden state, $s_{t-1}$ is a cell state, $v_e \in \mathbb{R}^{T}$, $W_e \in \mathbb{R}^{T \times 2m}$ and $U_e \in \mathbb{R}^{T \times T}$ are parameters to learn, and $x^k$ is a time series of length $T$, where $T$ is a length of a window size (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^{M}$ and the cell states from the previous time step $h_{(t-1)} \in \mathbb{R}^{H}; c_{(t-1)} \in \mathbb{R}^{H}$ and produces a hidden state $h_t \in \mathbb{R}^{H}$, [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).
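The attention score recited in claim 9 can be sketched in plain Python, assuming $h_{t-1}, s_{t-1} \in \mathbb{R}^m$, each driving series $x^k \in \mathbb{R}^T$, and the parameter shapes stated in the claim ($v_e \in \mathbb{R}^T$, $W_e \in \mathbb{R}^{T \times 2m}$, $U_e \in \mathbb{R}^{T \times T}$). The helper names and toy parameter values are hypothetical; in practice the parameters would be learned rather than supplied by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def input_attention_scores(h_prev: Sequence[float], s_prev: Sequence[float],
                           X: Sequence[Sequence[float]], v_e: Sequence[float],
                           W_e: Matrix, U_e: Matrix) -> Vector:
    """e_t^k = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x^k) for each series k.

    h_prev, s_prev: length m; each x^k in X: length T;
    v_e: length T; W_e: T x 2m; U_e: T x T.
    """
    # W_e [h_{t-1}; s_{t-1}] does not depend on k, so compute it once.
    state_term = matvec(W_e, list(h_prev) + list(s_prev))
    scores = []
    for x_k in X:
        pre = [a + b for a, b in zip(state_term, matvec(U_e, x_k))]
        scores.append(sum(v * math.tanh(p) for v, p in zip(v_e, pre)))
    return scores


# Toy sizes: m = 1 hidden unit, T = 2 time steps, n = 2 driving series.
scores = input_attention_scores(
    h_prev=[0.0], s_prev=[0.0],
    X=[[0.0, 0.0], [1.0, 1.0]],
    v_e=[1.0, 1.0],
    W_e=[[0.5, 0.5], [0.5, 0.5]],
    U_e=[[1.0, 0.0], [0.0, 1.0]],
)
print(scores)  # first score is 0.0; second equals 2 * tanh(1)
```

One score is produced per driving series $k$, which is what the softmax in the following claim normalizes.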

Claim 10: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the system of claim 9, wherein the deterministic attention model includes an attention weight given by: $\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}$ (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$).
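The attention weight recited in claim 10 is a softmax over the $n$ scores $e_t^k$. A minimal sketch follows; subtracting the maximum score before exponentiating is a common numerical-stability choice (an implementation assumption, not recited) that leaves every ratio unchanged because the common factor cancels. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def attention_weights(scores: Sequence[float]) -> List[float]:
    """alpha_t^k = exp(e_t^k) / sum_{i=1}^{n} exp(e_t^i), over the n series."""
    shift = max(scores)  # avoids overflow; exp(-shift) cancels in the ratio
    exps = [math.exp(e - shift) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


print(attention_weights([0.0, 0.0]))     # [0.5, 0.5]
print(attention_weights([1000.0, 0.0]))  # [1.0, 0.0] rather than an overflow
```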

Claim 11: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the system of claim 10, wherein the input attention-based encoder to ensure that all attention weights sum to 1 (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$; i.e. the examiner interprets the “normalized” function as summing to 1).
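The examiner's interpretation follows directly from the attention weight recited in claim 10: the denominator does not depend on $k$, so summing over the $n$ series cancels it, and each weight is strictly positive because the exponential is.

$$
\sum_{k=1}^{n} \alpha_t^k
= \sum_{k=1}^{n} \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}
= \frac{\sum_{k=1}^{n} \exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}
= 1,
\qquad 0 < \alpha_t^k < 1 \ \text{for}\ n \ge 2.
$$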

Claim 13: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the system of claim 8, wherein the hash codes are obtained by employing a tanh() function and a sign() function (Shen, [Section 3.2] note we utilize the hyperbolic tangent (tanh) function as the activation function. For the topmost layer, we define the binary code as: $b_i = \mathrm{sign}(u_i^1)$, $h_i = \mathrm{sign}(z_j^2)$).
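For illustration only, the tanh-then-sign binarization mapped to claim 13 can be sketched in Python as below. This is a minimal sketch, not Shen's implementation. The input vectors are hypothetical topmost-layer outputs, and mapping sign(0) to +1 is an assumption. Because tanh is odd and strictly increasing, sign(tanh(x)) equals sign(x). The tanh stage presumably matters during training, where it acts as a differentiable stand-in for the non-differentiable sign() step.

```python
import math


def tanh_activations(pre_activations):
    """Apply tanh elementwise, squashing each value into (-1, 1)."""
    return [math.tanh(v) for v in pre_activations]


def sign_codes(activations):
    """Binarize activations into {-1, +1} hash bits.

    Assumption: sign(0) is mapped to +1 so that every bit is defined.
    """
    return [1 if a >= 0 else -1 for a in activations]


# Hypothetical topmost-layer outputs of the two network streams.
u = [0.8, -1.3, 0.05, -0.2]
z = [-0.4, 2.1, -0.7, 0.0]

b_code = sign_codes(tanh_activations(u))
h_code = sign_codes(tanh_activations(z))
print(b_code)  # [1, -1, 1, -1]
print(h_code)  # [-1, 1, -1, 1]
```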

Claim 14: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the system of claim 8, wherein the input attention based LSTM network assists in indicating which sensor of the plurality of sensors is more important for the produced multivariate time series segment representation (Fokoue-Nkoutche, [0055] note Neural networks with attention mechanism have been effectively applied to vision tasks such as image captioning and natural language processing tasks such as machine translation, where the output components selectively choose information from the input based on the attention weights).
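For illustration only, the following sketch shows how an input attention layer can yield one weight per sensor, indicating which sensor contributes more to the segment representation. It assumes the score form recited in claim 16, $e_t^k = v_e^\top \tanh(W_e[h_{t-1}; s_{t-1}] + U_e x^k)$, followed by a softmax so that the weights sum to 1. The softmax step, the list-based helpers, and all parameter values are hypothetical and are not taken from Fokoue-Nkoutche.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(series, h_prev, s_prev, v_e, W_e, U_e):
    """Return one attention weight per sensor (driving series).

    series: n sensor series, each of length T (the window size).
    h_prev, s_prev: previous encoder hidden and cell states, each of length m.
    v_e (length T), W_e (T x 2m), U_e (T x T): learned parameters.
    """
    state_term = matvec(W_e, h_prev + s_prev)  # W_e [h_{t-1}; s_{t-1}] via list concatenation
    scores = []
    for x_k in series:
        series_term = matvec(U_e, x_k)  # U_e x^k
        hidden = [math.tanh(a + b) for a, b in zip(state_term, series_term)]
        scores.append(sum(v * y for v, y in zip(v_e, hidden)))  # e_t^k
    return softmax(scores)


# Hypothetical toy setup: n = 3 sensors, window T = 4, hidden size m = 2.
T, m = 4, 2
series = [
    [0.1, 0.2, 0.1, 0.0],
    [1.0, 1.2, 0.9, 1.1],
    [-0.5, -0.4, -0.6, -0.5],
]
v_e = [0.5] * T
W_e = [[0.1] * (2 * m) for _ in range(T)]
U_e = [[0.2 if r == c else 0.0 for c in range(T)] for r in range(T)]
weights = input_attention(series, [0.0] * m, [0.0] * m, v_e, W_e, U_e)
print([round(w, 3) for w in weights])
```

With these placeholder parameters the second sensor, whose readings are largest, receives the largest weight.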

Claim 15: Zarar teaches a non-transitory computer-readable storage medium comprising a computer-readable program for employing deep learning for time series representation and retrieval, wherein the computer-readable program when executed on a computer causes the computer to perform the steps of:
retrieving multivariate time series segments from a plurality of sensors (Zarar, [0023] note sensor unit includes an inertial measurement unit ("IMU"), which reports on its own motion. In some embodiments, the IMU reports such information as specific force, angular rate, and magnetic field, using, for example, accelerometers, gyroscopes, and/or magnetometers);
storing the multivariate time series segments in a multivariate time series database constructed by a sliding window over a raw time series of data (Zarar, [0039] note a sliding time window is used to apply the DCLSTM, in multiple iterations, to all of the sensor data samples that are in the window in each position of its transit through the time index range of the data, [0042] note storing the obtained version of the accessed time-series data);
applying an input based recurrent neural network employing an input based encoder and a deterministic attention mode to extract real value features and corresponding hash codes (Zarar, [0017] note the artifact-reduction neural network includes a spatial component for encoding and decoding spatial aspects of the sensor signal, which is discrete from a temporal component for encoding and decoding time-based aspects of the sensor signal, [Fig. 7], [0034] note original time-series sensor data 701 is first processed by convolutional neural-network layers 702, then by encoding long short-term memory units 703 to obtain a representation 704 of the sensor data in latent space); and
executing similarity measurements by an objective function (Zarar, [0036] note to train the parameters of DSTSAE, the facility uses a weighted-variant of the standard VAE loss function… α and (1- α) are weights for the generative loss (or reconstruction quality characterized by the expectation term E[·]) and latent loss (or distribution similarity characterized by the Kullback-Leibler divergence DKL[·]), respectively).
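For illustration only, constructing the segment database with a sliding window over a raw multivariate series (as mapped to Zarar [0039]) might look like the sketch below. The window length, stride, sensor readings, and the `(start_index, segment)` record layout are hypothetical choices made for the example.

```python
def sliding_window_segments(raw_series, window, stride=1):
    """Cut a raw multivariate time series into fixed-length segments.

    raw_series: n sensor channels, each a list of readings of equal length.
    window: segment length T; stride: step between consecutive window starts.
    Returns (start_index, segment) pairs, where segment holds the n channels
    restricted to the half-open interval [start, start + window).
    """
    length = len(raw_series[0])
    if any(len(channel) != length for channel in raw_series):
        raise ValueError("all channels must have the same length")
    database = []
    for start in range(0, length - window + 1, stride):
        segment = [channel[start:start + window] for channel in raw_series]
        database.append((start, segment))
    return database


raw = [
    [1, 2, 3, 4, 5, 6],        # hypothetical sensor 1
    [10, 20, 30, 40, 50, 60],  # hypothetical sensor 2
]
db = sliding_window_segments(raw, window=4, stride=2)
for start, segment in db:
    print(start, segment)
# 0 [[1, 2, 3, 4], [10, 20, 30, 40]]
# 2 [[3, 4, 5, 6], [30, 40, 50, 60]]
```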
Zarar does not explicitly teach attention based; the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
However, Fokoue-Nkoutche teaches attention based (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^M$ and the cell states from the previous time step $h_{(t-1)} \in \mathbb{R}^H$; $c_{(t-1)} \in \mathbb{R}^H$ and produces a hidden state $h_t \in \mathbb{R}^H$, [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar with the two-way attention based LSTM RNN of Fokoue-Nkoutche according to known methods (i.e. using a two-way attention mechanism to calculate how pairs of data interact). Motivation for doing so is that neural networks with attention mechanism have been effectively applied to vision tasks such as image captioning and natural language processing tasks such as machine translation, where the output components selectively choose information from the input based on the attention weight (Fokoue-Nkoutche, [0055]).
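For illustration only, the LSTM step described in Fokoue-Nkoutche [0051] (input embedding $x_t$, previous states $h_{(t-1)}$ and $c_{(t-1)}$, output $h_t$) can be sketched with the standard LSTM gate equations below. This is an assumption: the sketch uses the common input/forget/output-gate formulation with tanh on the candidate and on the cell output, which may differ in detail from the reference's equations (3), (5) and (6). The constant weights are placeholders.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, U, b, x, h):
    """Compute W x + U h + b with list-based matrices (one row per hidden unit)."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h)) + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One standard LSTM step: (x_t, h_{t-1}, c_{t-1}) -> (h_t, c_t)."""
    i = [sigmoid(v) for v in affine(*params["input"], x_t, h_prev)]   # input gate
    f = [sigmoid(v) for v in affine(*params["forget"], x_t, h_prev)]  # forget gate
    o = [sigmoid(v) for v in affine(*params["output"], x_t, h_prev)]  # output gate
    g = [math.tanh(v) for v in affine(*params["cell"], x_t, h_prev)]  # candidate cell state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def constant_params(M, H, value):
    """Hypothetical placeholder weights: W (H x M) and U (H x H) filled with `value`, zero biases."""
    gate = ([[value] * M for _ in range(H)], [[value] * H for _ in range(H)], [0.0] * H)
    return {name: gate for name in ("input", "forget", "output", "cell")}


params = constant_params(M=2, H=2, value=0.1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x_t in ([1.0, 0.5], [0.2, -0.3], [0.0, 1.0]):  # hypothetical token embeddings
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```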
Zarar and Fokoue-Nkoutche do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes; given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface.
However, Hariharan teaches given the query, obtaining a relevant time series segment from the multivariate time series segments retrieved from the plurality of sensors; and generating an output including a visual representation of the relevant time series segment on a user interface (Hariharan, [0026] note time series data can be obtained from many devices… devices can include sensors or other components for capturing data over time to generate time series data associated with or related to the device, [0056] note FIG. 8 is an exemplary user interface 800 for providing results of feature engineering time series or machine data and performing a user custom query… from this graph 805, a user can select a region of interest 810. Based on this selection, a query can be performed for data within or matching the selected region 810. The matching data can be presented along with ERP context information 820 via a matching data graph 815).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the sensor data processing of Zarar and Fokoue-Nkoutche with the interface for viewing and querying sensor data of Hariharan according to known methods (i.e. providing an interface for a user to view and query sensor data). Motivation for doing so is that data can be manipulated and analyzed easily to, for example, cluster the data for anomaly detection (Hariharan, [0033]).
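For illustration only, the step of obtaining a relevant time series segment given a query can be sketched as a nearest-neighbor lookup over stored binary hash codes. Note the substitution: Hariharan, as quoted, matches data within a user-selected region, whereas this sketch ranks stored segments by Hamming distance between {-1, +1} codes. The segment identifiers and codes are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Count positions where two equal-length {-1, +1} codes disagree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def retrieve(query_code, database, top_k=1):
    """Return the top_k (segment_id, distance) pairs closest to the query code.

    database: dict mapping a hypothetical segment id to its binary hash code.
    Ties are broken by segment id so the ranking is deterministic.
    """
    ranked = sorted(
        ((seg_id, hamming_distance(query_code, code)) for seg_id, code in database.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:top_k]


codes = {
    "segment_0": [1, -1, 1, -1],
    "segment_2": [1, 1, 1, -1],
    "segment_4": [-1, 1, -1, 1],
}
print(retrieve([1, 1, 1, 1], codes, top_k=2))  # [('segment_2', 1), ('segment_0', 2)]
```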
Zarar, Fokoue-Nkoutche and Hariharan do not explicitly teach the objective function being a pairwise loss, the pairwise loss ensuring that similar pairs produce similar hash codes and that dissimilar pairs produce dissimilar hash codes, wherein the pairwise loss is given by: $p(S_{ij} \mid B) = \sigma(\Omega_{ij})$, where $\Omega_{ij}$ is an inner product of hash codes of a query i and a sample j, $\sigma$ is a logistic sigmoid function, $S_{ij}$ is a similar pair, and $B$ is a parameter to learn, and wherein the pairwise loss is used to encode index information by generating discriminative binary codes.
However, Shen teaches this (Shen, [Section 1] note Hashing methods are proposed to map images to compact binary codes that approximately preserve the data structure or semantic affinity in original space, [Section 3.2] note deep hash functions using deep neural networks has more powerful learning capability than hand-crafted features extracted in advance and thus is able to learn feature representations for semantic similarity search, [Section 3.3] note In this work, inner product is utilized to be a good surrogate of the Hamming distance to quantify the pairwise similarity… the pairwise logistic function defined by equation (4); i.e. $p(s_{ij} \mid b_{ij}, h_{ij}) = \sigma(\Theta_{ij})$, when $s_{ij} = 1$, where $\sigma(x)$ is the sigmoid function, [Section 3.3] note a good hashing method should produce binary codes with the properties: (1) independence, i.e., different bits in the binary codes are independent to each other; (2) balance, i.e. each bit has a 50% chance of being 1 or -1).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar, Fokoue-Nkoutche and Hariharan with the deep hash functions of Shen according to known methods (i.e. calculating pairwise logistic function). Motivation for doing so is that deep hash functions using deep neural networks has more powerful learning capability than hand-crafted features extracted in advance and thus is able to learn feature representations for semantic similarity search (Shen, [Section 3.2]).
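For illustration only, the pairwise loss mapped to Shen can be sketched as a negative log-likelihood. The recited likelihood covers similar pairs, $p(S_{ij} = 1 \mid B) = \sigma(\Omega_{ij})$; treating dissimilar pairs as $1 - \sigma(\Omega_{ij})$ is an assumption, as are the example codes and labels. The helper `hamming_from_inner` reflects the identity $d_H(b_i, b_j) = (L - \langle b_i, b_j \rangle)/2$ for codes in $\{-1, +1\}^L$, which is why an inner product can stand in for the Hamming distance.

```python
import math


def inner_product(code_a, code_b):
    """Inner product of two equal-length hash codes."""
    return sum(a * b for a, b in zip(code_a, code_b))


def hamming_from_inner(code_a, code_b):
    """Hamming distance of {-1, +1} codes via d_H = (L - <a, b>) / 2."""
    return (len(code_a) - inner_product(code_a, code_b)) // 2


def log1p_exp(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_nll(codes, labels):
    """Negative log-likelihood of pairwise labels given hash codes B.

    codes: list of hash codes (the rows of B).
    labels: dict mapping (i, j) to S_ij, 1 for a similar pair and 0 otherwise.
    Per pair, -log p = log(1 + e^Omega) - S_ij * Omega with Omega = <b_i, b_j>,
    i.e. -log sigma(Omega) if S_ij = 1 and -log(1 - sigma(Omega)) if S_ij = 0.
    """
    total = 0.0
    for (i, j), s_ij in labels.items():
        omega = inner_product(codes[i], codes[j])
        total += log1p_exp(omega) - s_ij * omega
    return total


labels = {(0, 1): 1, (0, 2): 0}  # hypothetical: 0 and 1 similar, 0 and 2 dissimilar
agreeing = [[1, 1, -1, 1], [1, 1, -1, 1], [-1, -1, 1, -1]]
conflicting = [[1, 1, -1, 1], [-1, -1, 1, -1], [1, 1, -1, 1]]
print(pairwise_nll(agreeing, labels) < pairwise_nll(conflicting, labels))  # True
print(hamming_from_inner([1, -1, 1, -1], [1, 1, 1, -1]))  # 1
```

Codes that agree with the labels give a small loss, while codes that contradict them give a large one, which is the sense in which the loss pushes similar pairs toward similar codes and dissimilar pairs apart.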

Claim 16: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the computer-readable storage medium of claim 15, wherein the input attention-based encoder is given by: $e_t^k = v_e^\top \tanh(W_e[h_{t-1}; s_{t-1}] + U_e x^k)$, where $h_{t-1}$ is a previous hidden state, $s_{t-1}$ is a cell state, $v_e \in \mathbb{R}^T$, $W_e \in \mathbb{R}^{T \times 2m}$ and $U_e \in \mathbb{R}^{T \times T}$ are parameters to learn, and $x^k$ is a time series of length T, where T is a length of a window size (Fokoue-Nkoutche, [0048] note Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) and graph-based convolutional neural networks are used… two-way attention mechanism (shown as $\alpha_{pi}$ and $\alpha_{di}$) is used to calculate how the pair interact and thus enable the interpretability. Finally, the attention-based vector representations are used by a classifier, a simple sigmoid function, to make a prediction, [0051] note at each time step t, the LSTM unit takes the t-th input token embedding $x_t \in \mathbb{R}^M$ and the cell states from the previous time step $h_{(t-1)} \in \mathbb{R}^H$;
                            
                                
                                    c
                                
                                
                                    (
                                    t
                                    -
                                    1
                                    )
                                
                            
                            ∈
                            
                                
                                    R
                                
                                
                                    H
                                
                            
                        
                     and produces a hidden state                         
                            
                                
                                    h
                                
                                
                                    t
                                
                            
                            ∈
                            
                                
                                    R
                                
                                
                                    H
                                
                            
                        
                    , [0051] note hyperbolic tangent functions used in equations (3), (5) and (6)).
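For orientation only, one update of a textbook LSTM cell can be sketched in plain Python using the inputs and outputs recited in the cited passage of Fokoue-Nkoutche ([0051]): the input embedding x_t, the previous states h(t-1) and c(t-1), and the new hidden state h_t, with tanh activations. The gate layout, the parameter names W, U and b, the zero biases and the toy sizes M = 3 and H = 2 are assumptions of this sketch. It is not a reproduction of Fokoue-Nkoutche's equations (3), (5) and (6).

```python
import math
import random

# Assumed textbook LSTM formulation; not taken from Fokoue-Nkoutche's equations.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector (list).
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: (x_t in R^M, h_prev in R^H, c_prev in R^H) -> (h_t, c_t).

    params maps each gate name to a hypothetical (W, U, b) triple with
    W of shape H x M, U of shape H x H and b of length H.
    """
    pre = {}
    for gate, (W, U, b) in params.items():
        wx = matvec(W, x_t)
        uh = matvec(U, h_prev)
        pre[gate] = [wx[k] + uh[k] + b[k] for k in range(len(b))]
    i = [sigmoid(z) for z in pre["input"]]    # input gate
    f = [sigmoid(z) for z in pre["forget"]]   # forget gate
    o = [sigmoid(z) for z in pre["output"]]   # output gate
    g = [math.tanh(z) for z in pre["cell"]]   # tanh candidate cell state
    c_t = [f[k] * c_prev[k] + i[k] * g[k] for k in range(len(c_prev))]
    h_t = [o[k] * math.tanh(c_t[k]) for k in range(len(c_t))]  # tanh on cell state
    return h_t, c_t

def random_params(M, H, seed=0):
    # Hypothetical small random weights with zero biases, for illustration only.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(H, M), mat(H, H), [0.0] * H)
            for gate in ("input", "forget", "output", "cell")}

# Toy run: M = 3 input features, H = 2 hidden units, window length T = 4.
M, H = 3, 2
params = random_params(M, H)
h, c = [0.0] * H, [0.0] * H
for x_t in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5], [-0.2, 0.1, 0.0]):
    h, c = lstm_step(x_t, h, c, params)
print(len(h), len(c))
```

Because the output gate lies in (0, 1) and tanh is applied to the cell state, every component of h_t stays strictly between -1 and 1.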

Claim 17: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the computer-readable storage medium of claim 15, wherein the deterministic attention model includes an attention weight given by: $\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}$ (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$).
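For illustration, the attention weight recited in claim 17, $\alpha_t^k = \exp(e_t^k) / \sum_{i=1}^{n} \exp(e_t^i)$, is a softmax over the n scores at one time step t. The minimal sketch below computes it. The function name and the example scores are hypothetical, and subtracting the maximum score is a common numerical-stability step that leaves the ratio unchanged. The same normalization underlies the examiner's interpretation for claim 18 that the weights sum to 1.

```python
import math

def attention_weights(scores):
    """Return alpha_t^k = exp(e_t^k) / sum_i exp(e_t^i) for one time step t.

    scores: hypothetical unnormalized scores e_t^1, ..., e_t^n.
    Subtracting max(scores) before exponentiating does not change the
    ratios but avoids overflow for large scores.
    """
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [v / total for v in exps]

alphas = attention_weights([2.0, 1.0, 0.1])
print([round(a, 3) for a in alphas])
print(round(sum(alphas), 10))
```

Larger scores receive larger weights, and because every weight is divided by the same total, the weights always sum to 1.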

Claim 18: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the computer-readable storage medium of claim 16, wherein the input attention-based encoder to ensure that all attention weights sum to 1 (Fokoue-Nkoutche, [0060] note Finally, $\alpha_p$ and $\alpha_d$ are exponentially normalized by a softmax function… where the softmax function is defined as: $[\mathrm{softmax}(v)]_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$; i.e., the examiner interprets the “normalized” softmax outputs as summing to 1).

Claim 20: Zarar, Fokoue-Nkoutche, Hariharan and Shen teach the non-transitory computer-readable storage medium of claim 15, wherein the hash codes are obtained by employing a tanh() function and a sign() function (Shen, [Section 3.2] note we utilize the hyperbolic tangent (tanh) function as the activation function. For the topmost layer, we define the binary code as: $b_i = \mathrm{sign}(u_i) \quad (1), \qquad h_i = \mathrm{sign}(z_j) \quad (2)$).
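As an illustration of the tanh()/sign() limitation of claim 20, a top-layer output can be passed through tanh as a continuous relaxation and then quantized with sign to a {-1, +1} code. The minimal sketch below does this and also computes the Hamming distance between two codes. The activation values are hypothetical, and mapping sign(0) to +1 is an assumed tie-breaking rule, not taken from Shen.

```python
import math

def binary_code(activations):
    """Map top-layer outputs to a {-1, +1} code: tanh, then sign.

    Assumption: sign(0) is mapped to +1.
    """
    relaxed = [math.tanh(a) for a in activations]   # continuous values in (-1, 1)
    return [1 if r >= 0 else -1 for r in relaxed]    # sign() quantization

def hamming(b1, b2):
    # Number of positions where two equal-length codes differ.
    return sum(1 for x, y in zip(b1, b2) if x != y)

u_i = [0.8, -1.2, 0.05, -0.3]   # hypothetical top-layer output for item i
z_j = [0.4, 0.9, -0.7, -0.1]    # hypothetical top-layer output for item j
b_i = binary_code(u_i)
h_j = binary_code(z_j)
print(b_i, h_j, hamming(b_i, h_j))
```

Because tanh preserves sign, applying sign after tanh gives the same code as applying sign directly; tanh matters during training, where it supplies a differentiable stand-in for the discrete code. For K-bit codes in {-1, +1}, the Hamming distance also equals (K - b_i · h_j) / 2.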

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zarar, Fokoue-Nkoutche, Hariharan and Shen in further view of Aizawa et al., US 20190362233 A1 (hereinafter “Aizawa”).

Claim 5: Zarar, Fokoue-Nkoutche, Hariharan and Shen do not explicitly teach the method of claim 1, wherein the triplet loss ensures that a triplet of anchor, positive, and negative, and that a hamming distance between the anchor and the positive is less than a hamming distance between the anchor and negative.
However, Aizawa teaches this (Aizawa, [0026] note the neural network is trained using triplets that each include an anchor image of an item, a positive image of a similar item, and a negative image of a dissimilar item, [0076] note to generate similarity rankings, the similarity ranking engine may follow this process: for each n data points, it calculates (e.g., using a computer processor, graphics processor, or cloud-based processor) the distance between the image and all n-1 images… the similarity engine may compute this distance using Euclidean distance, cosine similarity, Manhattan distance, Hamming distance, or any other suitable distance metric).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar, Fokoue-Nkoutche, Hariharan and Shen with the neural network optimized using a metric learning objective function (e.g., triplet loss, n-pair loss, or another suitable metric learning objective) of Aizawa according to known methods (i.e., optimizing the neural network using a metric learning objective function such as triplet loss). Motivation for doing so is that training a neural network with triplets of similar objects instead of triplets of identical objects relaxes constraints on the size and content of the data set, making training easier (Aizawa, [Abstract]).
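For illustration, the triplet condition of claims 5, 12 and 19 (the Hamming distance from the anchor to the positive is less than the Hamming distance from the anchor to the negative) can be expressed with the common triplet margin (hinge) loss. The quoted passages of Aizawa do not state a loss formula, so the hinge form and the margin value of 1 below are assumptions, and the example codes are hypothetical.

```python
def hamming(b1, b2):
    # Number of positions where two equal-length binary codes differ.
    return sum(1 for x, y in zip(b1, b2) if x != y)

def triplet_hinge_loss(anchor, positive, negative, margin=1):
    """Assumed triplet margin loss on Hamming distances.

    The loss is zero only when d(anchor, positive) + margin <= d(anchor, negative),
    i.e. the positive is closer to the anchor than the negative by at least margin.
    """
    d_ap = hamming(anchor, positive)
    d_an = hamming(anchor, negative)
    return max(0, d_ap - d_an + margin)

anchor   = [1, -1, 1, -1, 1, 1]
positive = [1, -1, 1, 1, 1, 1]     # differs from anchor in 1 position
negative = [-1, 1, 1, 1, -1, 1]    # differs from anchor in 4 positions
print(hamming(anchor, positive), hamming(anchor, negative))
print(triplet_hinge_loss(anchor, positive, negative))
print(triplet_hinge_loss(anchor, negative, positive))
```

A correctly ordered triplet yields zero loss, while swapping the positive and negative is penalized. Because Hamming distance on discrete codes is not differentiable, training in practice typically applies such a loss to a continuous relaxation of the codes.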

Claim 12: Zarar, Fokoue-Nkoutche, Hariharan, Shen and Aizawa teach the system of claim 8, wherein the triplet loss ensures that a triplet of anchor, positive, and negative, and that a hamming distance between the anchor and the positive is less than a hamming distance between the anchor and negative (Aizawa, [0026] note the neural network is trained using triplets that each include an anchor image of an item, a positive image of a similar item, and a negative image of a dissimilar item, [0076] note to generate similarity rankings, the similarity ranking engine may follow this process: for each n data points, it calculates (e.g., using a computer processor, graphics processor, or cloud-based processor) the distance between the image and all n-1 images… the similarity engine may compute this distance using Euclidean distance, cosine similarity, Manhattan distance, Hamming distance, or any other suitable distance metric).

Claim 19: Zarar, Fokoue-Nkoutche, Hariharan and Shen do not explicitly teach the non-transitory computer-readable storage medium of claim 15, wherein the triplet loss ensures that a triplet of anchor, positive, and negative, and that a hamming distance between the anchor and the positive is less than a hamming distance between the anchor and negative.
However, Aizawa teaches this (Aizawa, [0026] note the neural network is trained using triplets that each include an anchor image of an item, a positive image of a similar item, and a negative image of a dissimilar item, [0076] note to generate similarity rankings, the similarity ranking engine may follow this process: for each n data points, it calculates (e.g., using a computer processor, graphics processor, or cloud-based processor) the distance between the image and all n-1 images… the similarity engine may compute this distance using Euclidean distance, cosine similarity, Manhattan distance, Hamming distance, or any other suitable distance metric).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the neural network of Zarar, Fokoue-Nkoutche, Hariharan and Shen with the neural network optimized using a metric learning objective function (e.g., triplet loss, n-pair loss, or another suitable metric learning objective) of Aizawa according to known methods (i.e., optimizing the neural network using a metric learning objective function such as triplet loss). Motivation for doing so is that training a neural network with triplets of similar objects instead of triplets of identical objects relaxes constraints on the size and content of the data set, making training easier (Aizawa, [Abstract]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kaiser et al., US 20170192956 A1 – LSTM neural network including an attention-based decoder.
Xu et al., “Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval” – discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner can be reached on (571)270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2165