DETAILED ACTION
This is the first office action regarding application number 16/233,779, filed December 27, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because of the following:
Figure 1: reference character “108” has been used to designate both Service Provider Device Client Application 108 and Requester Device Client Application 108.
Figure 7: reference character “724” has been used to designate Processor Instructions 724, Main Memory Instructions 724, and Storage Unit Machine-Storage Medium Instructions 724.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
a. 	Cover page, missing inventor’s name: Theodore Russell Sumers. Appropriate correction is required.
b. 	Paragraphs [0026], [0027]: referring to the corresponding objection listed in the Drawings section, the specification needs to distinguish between the Service Provider Device Client Application 108 and Requester Device Client Application 108. Appropriate correction is required.
c.	Paragraphs [0070]-[0073], [0077], [0081]: referring to the corresponding objection listed in the Drawings section, the specification needs to distinguish between the Processor Instructions 724, Main Memory Instructions 724, and Storage Unit Machine-Storage Medium Instructions 724. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 6 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding Claim 6, the claim recites “The system of claim 1, wherein the event comprises co-presence of a driver and rider, fraud, dangerous driving, detection of an accident, phone handling issue, or a trip state.” Paragraph [0015] states: “During runtime, the trained models are used to detect events based on sensor data received from one or more user devices. The detected events can include, for example, co-presence of a driver and a rider, fraud, dangerous driving, an accident, phone handling issues, or a trip state.” However, the specification does not describe what constitutes a “phone handling issue”. Paragraph [0049] further states that “noisy inputs” from handling a phone are considered a weak label, but it does not describe what constitutes a “noisy input from phone handling” (“In one embodiment, the triplets are assembled, by the assembly module 304, to train the embedding using a weak label. In one example, the weak label is co-presence of the driver and rider based on driver and rider sensor data. Other weak labels can include, for example, noisy inputs from phone handling or mounting classifier and activities (e.g., walking, driving, idling a vehicle).”). Paragraph [0058] further states that a driver touching their smartphone may be a “phone handling issue” that warrants an event detection, and that heuristics can be used to generate an embedding representation for a “phone handling state” event. However, the specification fails to disclose any heuristics for determining a “phone handling issue” and does not provide any further description to support the terms “phone handling issue” and “phone handling state” (“In yet another example, phone handling (e.g., by a driver) can be an issue. Using heuristics or other classifiers as weak labels, the networked system 102 can generate a best possible representation of a sensor embedding for a "phone handling state."”).
The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, and such that it can be determined whether the claimed invention meets all the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163. Because the specification provides no support for this limitation, this claim limitation fails to comply with the written description requirement. For the purposes of examination, the claim limitation “phone handling issue” will not be given any patentable weight in searching for prior art.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  
Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Clayton et al., U.S. PGPUB 2017/0206464, filed 1/14/2016, published 7/20/2017 [hereafter referred to as Clayton] in view of Higgins et al., β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, April 24, 2017, published as a conference paper at ICLR 2017 [hereafter referred to as Higgins].
Regarding Claim 1, Clayton teaches
A system comprising: 
one or more hardware processors ([Clayton paragraph [0099]: a system with one or more processors, with RAM, ROM, flash memory, magnetic/optical disks, other storage media containing a series of executable computer instructions for the methods and procedures (“It will be appreciated that all of the disclosed methods and procedures described herein can be implemented using one or more computer programs, modules, or components. These modules or components may be provided as a series of computer instructions on any conventional computer readable medium or machine readable medium, including volatile or non-volatile memory, such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware, and/or may be implemented in whole or in part in hardware components such as ASICs, FPGAs, DSPs or any other similar devices. The instructions may be configured to be executed by one or more processors, which when executing the series of computer instructions, performs or facilitates the performance of all or part of the disclosed methods and procedures.”).]); and 
a memory storing instructions that, when executed by the one or more hardware processors, causes the one or more hardware processors to perform operations ([Clayton paragraph [0099]: a system with one or more processors, with RAM, ROM, flash memory, magnetic/optical disks, other storage (“memory”) media containing a series of executable computer instructions for the methods and procedures (“storing instructions”, “when executed by the one or more hardware processors, causes the one or more hardware processors to perform operations”) (“It will be appreciated that all of the disclosed methods and procedures described herein can be implemented using one or more computer programs, modules, or components. These modules or components may be provided as a series of computer instructions on any conventional computer readable medium or machine readable medium, including volatile or non-volatile memory, such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware, and/or may be implemented in whole or in part in hardware components such as ASICs, FPGAs, DSPs or any other similar devices. The instructions may be configured to be executed by one or more processors, which when executing the series of computer instructions, performs or facilitates the performance of all or part of the disclosed methods and procedures.”).])
comprising: 
accessing sensor data from a plurality of user devices ([Clayton Figure 1, element 104; paragraph [0026]: a data collection device in an edge device gathering sensor data (“accessing sensor data”) (“A data collection device 104 may be a sensor, detector, or any device suitable for real time collection of data representative of real world characteristics (e.g., distances, ultrasound levels, speed, acceleration, items in a shopping cart, hand movements, shapes, temperature, angles, voice recognition, word recognition, torque, slip levels). The data collection device 104 may receive a continuous data stream or collect data on a periodic basis (e.g., every millisecond, second, minute), which may generally depend on the type of data being collected and the variability of the data stream. A time series of data type X may be referred to herein as x, where x=<x1, x2, x3, ... xn>.”).] [Clayton Figure 1, element 100; paragraph [0024]: edge devices include personal computing devices (“user devices”) (“Edge devices 100 may include a wide variety of devices including … personal computing devices (e.g., mobile phone, tablet, laptop), …”).] [Clayton Figure 2, elements 104a, 104b, 104c; paragraph [0032]: a plurality of data collection devices providing time series data to a time series data adaptation module containing a variational inference machine and sequential data forecast machine pair (VIM-SDFM) (“accessing sensor data from a plurality of user devices”) (“FIG. 2 is a high-level flow diagram illustrating time series data adaptation, according to an example embodiment of the present disclosure. A plurality of data collection devices 104a, 104b, 104c provide time series data to the time series data adaptation module 110. … The time series data adaptation module 110 includes a variational inference machine 202 and a sequential data forecast machine 204. 
… For example, the time series data adaptation module 110 may receive as inputs single-modal time series data, multi-modal time series data, single-modal distributed representations, and/or multi-modal distributed representations.”).]); 
assembling one or more triplets using the sensor data, the assembling including applying a weak label ([Clayton Figure 9, elements 902, 904, 906; Figure 10, elements $z_1^t$, $z_2^t$, $z_3^t$, $Z^t$; paragraphs [0057]-[0058]: a sensor fusion machine containing a multi-modal variational inference machine (MMVIM) and a multi-modal sequential data forecast machine (MMSDFM) that receives latent distributions (generated by autoencoding) from a series of VIM-SDFM pairs that receive sensor data, producing a final latent distribution $Z^t$ from the intermediate latent distributions $z_1^t$, $z_2^t$, $z_3^t$ generated by a triplet of variational inference machine-sequential data forecast machine pairs 902, 904, 906, each of which preprocesses sensor data from a different source and performs autoencoding (“assembling one or more triplets using the sensor data”) (“FIG. 10 is a flow diagram illustrating time series data adaptation including sensor fusion, according to an example embodiment of the present disclosure. In this example embodiment, the sensor fusion machine 908 includes a multi-modal variational inference machine (MMVIM) 910 and a multi-modal recurrent neural network (MMRNN) 912, both of which receive multiple modalities of data as inputs (i.e., data from multiple different domains), to provide a multi-modal sensor fusion output. … The sensor fusion machine 908 may receive the time dependency infused latent distributions $z_1^t$, $z_2^t$, and $z_3^t$, for example, from the VIM-SDFM pairs 902, 904, 906. … The output layer of the MMVIM 910 outputs a multi-modal time dependency infused latent distribution $Z^t$.”).] [Clayton Figure 9, elements 110, 902, 904, 906, 910, 912; paragraphs [0055]-[0057]: a plurality of VIM-SDFM pairs measuring and detecting independent but related real-world sensor properties that have shared impacts, with the related real-world sensor properties with shared impacts (e.g., speed of vehicle, distances related to other objects) interpreted as representing a “weak label” (“the assembling including applying a weak label”) (“FIG. 9 illustrates a high-level block diagram of a time series data adaptation module 110, which is configured to perform sensor fusion, according to an example embodiment of the present disclosure. A plurality of sequential data forecast machine pairs (VIM-SDFM pairs) 902, 904, 906, each respectively include a variational inference machine 202 and a sequential data forecast machine 204. … The sensor fusion machine 908 includes a multi-modal variational inference machine 910 and a multi-modal sequential data forecast machine 912. The sensor fusion machine 908 fuses data of different data types to form more robust, information-rich data types that are better suited for producing reliable, accurate, precise and/or quick recognition results in machine learning analysis or the like. … A system that uses multi-modal sensor fusion includes a plurality of different sensors measuring and/or detecting different real world properties, for example, speed of moving vehicle, distances to other objects, road grade, road cant, tire pressure, brake pad wear, rain detection, etc. All of these factors may have an impact on a real world result based on a physical actuation, such as stopping a car, but each modality may also be different and independent of one another.”).]); 
autoencoding the one or more triplets … to generate a … embedding ([Clayton Figure 3; paragraph [0034]: a variational inference machine taking an input x and performing encoding (“autoencoding”) to output a latent distribution z (“embedding”) based on the input x (“FIG. 3 is high-level block diagram of a variational inference machine 202, according to an example embodiment of the present disclosure. The variational inference machine 202 includes an input layer 302, a plurality of hidden layers 304, and an output layer 306. In an example embodiment, the input layer 302 has 8 nodes and receives the input variable x, which is an 8-tuple, where each element of the 8-tuple represents an attribute. … The hidden layers 304 may be structured to encode the data input into the input layer 302 as a latent distribution z, which is output from the output layer 306. A latent distribution includes one or more distributions, which provide a probabilistic distributed representation of the input variable of the variational inference machine. A probabilistic distributed representation is a set of latent variables defined by probability distributions as opposed to latent variables defined by discrete values. As illustrated in FIG. 3, the output layer has 6 nodes, and the latent distribution z includes a set of three distributions, where each distribution may be defined by a mean $\mu_z(x)$ and a standard deviation $\sigma_z(x)$ (e.g., for Gaussian distributions).”).] [Clayton Figure 9, elements 902, 904, 906; Figure 10, elements $z_1^t$, $z_2^t$, $z_3^t$, $Z^t$; paragraphs [0057]-[0058]: a sensor fusion machine containing a multi-modal variational inference machine (MMVIM) and a multi-modal sequential data forecast machine (MMSDFM) that receives latent distributions (generated by autoencoding) from a series of VIM-SDFM pairs that receive sensor data, producing a final latent distribution $Z^t$ (“autoencoding the one or more triplets … to generate a … embedding”) from the intermediate latent distributions $z_1^t$, $z_2^t$, $z_3^t$ generated by a triplet of variational inference machine-sequential data forecast machine pairs 902, 904, 906 (“FIG. 10 is a flow diagram illustrating time series data adaptation including sensor fusion, according to an example embodiment of the present disclosure. In this example embodiment, the sensor fusion machine 908 includes a multi-modal variational inference machine (MMVIM) 910 and a multi-modal recurrent neural network (MMRNN) 912, both of which receive multiple modalities of data as inputs (i.e., data from multiple different domains), to provide a multi-modal sensor fusion output. … The sensor fusion machine 908 may receive the time dependency infused latent distributions $z_1^t$, $z_2^t$, and $z_3^t$, for example, from the VIM-SDFM pairs 902, 904, 906. … The output layer of the MMVIM 910 outputs a multi-modal time dependency infused latent distribution $Z^t$.”).]); and 
training an inference model … the inference model being used at runtime to detect whether an event associated with the inference model is present ([Clayton paragraph [0028]: a machine learning model is trained based on the distributed latent representation from different sensors (“Machine learning models may separately use one of the different time series as its input, and/or a machine learning model may use more than one of the different time series as its input, which is a multi-modal input of time series data. A machine learning model is trained for a predefined type of input, using one or more particular types of data, which may include time series data from one or more sensors and/or adapted data that is representative of time series data from one or more sensors. The predefined type of input for the machine learning model may be a time series from a sensor, multiple time series from different sensors, a distributed representation of a time series from a sensor, or a distributed representation of multiple time series from different sensors. A distributed representation is a version of time series data that typically has reduced dimensionality but generally preserves the most important information, and in some cases may be nearly lossless.”).] [Clayton Figure 2, elements 108, 206; Figure 10; paragraph [0058]: output $Z^t$ from the multi-modal variational inference machine directed to a machine learning model (MLM) for training (“training an inference model”) (“The multi-modal time dependency infused latent distribution $Z^t$, may be provided as an input to a machine learning model 206 (e.g., a collision avoidance model).”).] [Clayton Figure 1, element 108; paragraph [0027]: machine learning module using the machine learning model to output a forecast, prediction, or anomaly detection (“the inference model being used at runtime to detect whether an event associated with the inference model is present”) (“A machine learning module 108 may execute a machine learning model using time series data collected by one or more data collection devices 104 and stored in memory 106. The machine learning module 108 may receive collected time series data as inputs and/or may receive adapted data that is representative of collected time series data (e.g., sensor fusion data) as inputs, which may also be stored in memory 106. The machine learning module 108 executes the machine learning model using the collected and/or adapted data to make a forecast, a prediction, a classification, a clustering, an anomaly detection, and/or a recognition, which is then output as a result.”).]).
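For illustration only, the encoding that Clayton paragraph [0034] describes (an 8-node input layer receiving an 8-tuple x, hidden layers, and a 6-node output layer whose values define three Gaussian distributions, each by a mean $\mu_z(x)$ and a standard deviation $\sigma_z(x)$) can be sketched in Python as below. Only the layer sizes come from Clayton. The single hidden layer, its width, the tanh activation, the random weights, the exponential used to keep standard deviations positive, the reparameterized sampling step, and the names `encode` and `sample_latent` are hypothetical assumptions; this is neither Clayton's implementation nor the claimed invention.

```python
import math
import random

# Layer sizes from Clayton [0034]: 8 input nodes and 6 output nodes
# (3 means + 3 standard deviations). HIDDEN is a hypothetical choice.
N_IN, HIDDEN, N_LATENT = 8, 16, 3


def init_layer(n_in, n_out, rng):
    """Return a hypothetical randomly initialized (weights, biases) pair."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply a fully connected layer, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def encode(x, params):
    """Map an 8-tuple x to (mu_z(x), sigma_z(x)) for three Gaussian latents."""
    hidden = dense(x, params["hidden"], math.tanh)
    out = dense(hidden, params["out"])  # 6 output nodes
    mu = out[:N_LATENT]
    # Assumption: exponentiate the second half so each standard deviation is positive.
    sigma = [math.exp(v) for v in out[N_LATENT:]]
    return mu, sigma


def sample_latent(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) per dimension as z = mu + sigma * eps (assumed step)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


rng = random.Random(0)
params = {"hidden": init_layer(N_IN, HIDDEN, rng),
          "out": init_layer(HIDDEN, 2 * N_LATENT, rng)}
x = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 2.0]  # hypothetical 8-attribute input
mu, sigma = encode(x, params)
z = sample_latent(mu, sigma, rng)
print(len(mu), len(sigma), len(z))  # prints: 3 3 3
```

The sketch returns a probabilistic representation (a mean and a standard deviation per latent variable) rather than discrete latent values, which mirrors the distinction Clayton draws between a probabilistic distributed representation and discrete latent variables.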
However, Clayton does not teach
autoencoding … based on a covariate to generate a disentangled embedding … 
training … using the disentangled embedding …  
Higgins teaches
autoencoding … based on a covariate to generate a disentangled embedding ([Higgins p.6 Figure 5, element y=scale; p.6 5th paragraph – p.7 6th paragraph, Section 3 Disentanglement Metric: referring to the algorithm listed in Steps 1-3 on pp.6-7 of Higgins, given a dataset D={X,V, W} (“one or more triplets”), set a target generative factor k = y as fixed, with “y=scale” being a known factor shown in Figure 5 (“based on a covariate”); sample latent distributions $v_{1,l}$ and $v_{2,l}$ (keeping k = y); infer $z_{1,l} = \mu(x_{1,l})$ using the encoder $q(z|x) \sim N(\mu(x), \sigma(x))$ (“autoencoding … to generate a disentangled embedding”); compute pairwise distances $z_{diff}^{l} = |z_{1,l} - z_{2,l}|$; and input the average pairwise distance $z_{diff}^{b}$ into a classifier to predict $P(y|z_{diff}^{b})$ (“Figure 5: Schematic of the proposed disentanglement metric: over a batch of L samples, each pair of images has a fixed value for one target generative factor y (here y = scale) and differs on all others. A linear classifier is then trained to identify the target factor using the average pairwise difference $z_{diff}^{b}$ in the latent space over L samples.”).]) …
training … using the disentangled embedding ([Higgins p.6 Figure 5, element y=scale; p.6 5th paragraph – p.7 6th paragraph, Section 3 Disentanglement Metric: referring to the algorithm listed in Steps 1-3 on pp.6-7 of Higgins, given a dataset D={X,V, W}, set a target generative factor k = y as fixed, with “y=scale” being a known factor shown in Figure 5; sample latent distributions $v_{1,l}$ and $v_{2,l}$ (keeping k = y); infer $z_{1,l} = \mu(x_{1,l})$ using the encoder $q(z|x) \sim N(\mu(x), \sigma(x))$; compute pairwise distances $z_{diff}^{l} = |z_{1,l} - z_{2,l}|$; and input the average pairwise distance $z_{diff}^{b}$ into a classifier to predict $P(y|z_{diff}^{b})$ (“training … using the disentangled embedding”) (“Figure 5: Schematic of the proposed disentanglement metric: over a batch of L samples, each pair of images has a fixed value for one target generative factor y (here y = scale) and differs on all others. A linear classifier is then trained to identify the target factor using the average pairwise difference $z_{diff}^{b}$ in the latent space over L samples.”).]) …  
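For illustration only, the Higgins disentanglement metric relied on above can be sketched as follows. This is a toy, non-authoritative sketch: the simulator `toy_simulate`, the two encoders, and the factor count are hypothetical stand-ins, and the argmin rule in `predict_factor` is a substitute for the linear classifier that Higgins actually trains on $z_{diff}^{b}$. The sketch assumes that when the latents line up one-to-one with the generative factors, the fixed factor k is the coordinate of $z_{diff}^{b}$ closest to zero.

```python
import random
from typing import Callable, List

Vector = List[float]


def z_diff_batch(
    encoder_mean: Callable[[Vector], Vector],
    simulate: Callable[[Vector], Vector],
    num_factors: int,
    k: int,
    num_pairs: int,
    rng: random.Random,
) -> Vector:
    """Return z_diff^b: the mean of |z_{1,l} - z_{2,l}| over pairs sharing factor k."""
    total = [0.0] * num_factors
    for _ in range(num_pairs):
        v1 = [rng.random() for _ in range(num_factors)]
        v2 = [rng.random() for _ in range(num_factors)]
        v2[k] = v1[k]  # enforce [v_{1,l}]_k == [v_{2,l}]_k
        z1 = encoder_mean(simulate(v1))  # z_{1,l} = mu(x_{1,l})
        z2 = encoder_mean(simulate(v2))  # z_{2,l} = mu(x_{2,l})
        total = [t + abs(a - b) for t, a, b in zip(total, z1, z2)]  # accumulate z_diff^l
    return [t / num_pairs for t in total]


def predict_factor(z_diff_b: Vector) -> int:
    """Stand-in classifier: the fixed factor should show the smallest difference."""
    return min(range(len(z_diff_b)), key=lambda i: z_diff_b[i])


def disentanglement_score(encoder_mean, simulate, num_factors, num_batches, num_pairs, seed=0):
    """Fraction of batches whose fixed factor k is recovered from z_diff^b."""
    rng = random.Random(seed)
    correct = 0
    for i in range(num_batches):
        k = i % num_factors  # cycle through the target factors
        z_diff_b = z_diff_batch(encoder_mean, simulate, num_factors, k, num_pairs, rng)
        correct += int(predict_factor(z_diff_b) == k)
    return correct / num_batches


# Hypothetical toy world: an "image" is the reversed, doubled factor vector.
def toy_simulate(v: Vector) -> Vector:
    return [2.0 * x for x in reversed(v)]


def aligned_encoder(x: Vector) -> Vector:
    """Undoes toy_simulate exactly, so each latent tracks one factor."""
    return [xi / 2.0 for xi in reversed(x)]


def entangled_encoder(x: Vector) -> Vector:
    """Every latent carries the same mixture of all inputs."""
    return [sum(x)] * len(x)
```

Under these assumptions, `disentanglement_score(aligned_encoder, toy_simulate, 3, 30, 8)` returns 1.0, while the entangled encoder scores 1/3, which is chance level for three factors.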
Both Clayton and Higgins are analogous art as both teach using autoencoders to generate latent distributions/embeddings.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the autoencoder of Clayton and enhance it with the autoencoder of Higgins in order to generate a disentangled embedding for training an inference model to perform predictions. The motivation to combine is taught in Higgins, as learning disentangled representations enhances the performance of artificial intelligence based systems in situations requiring knowledge transfer, zero-shot inference, or novelty detection ([Higgins p.1 Section 1. Introduction: “According to Lake et al. (2016), disentangled representations could boost the performance of state-of-the-art AI approaches in situations where they still struggle but where humans excel. Such scenarios include those which require knowledge transfer, where faster learning is achieved by reusing learnt representations for numerous tasks; zero-shot inference, where reasoning about new data is enabled by recombining previously learnt factors; or novelty detection.”]).
Regarding Claim 8, 
A method comprising: 
accessing, by a networked system, sensor data from a plurality of user devices (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); 
assembling, by a processor of the networked system, one or more triplets using the sensor data, the assembling including applying a weak label (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); 
autoencoding the one or more triplets based on a covariate to generate a disentangled embedding (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); and 
training an inference model using the disentangled embedding, the inference model being used at runtime to detect whether an event associated with the inference model is present (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale).  
Regarding Claim 15, 
A machine-storage medium storing instructions that when executed by one or more hardware processors of a machine, cause the machine to perform operations comprising: 
accessing sensor data from a plurality of user devices (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); 
assembling one or more triplets using the sensor data, the assembling including applying a weak label (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); 
autoencoding the one or more triplets based on a covariate to generate a disentangled embedding (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); and 
training an inference model using the disentangled embedding, the inference model being used at runtime to detect whether an event associated with the inference model is present (This claim limitation is similar in scope to the corresponding claim limitation in Claim 1, and hence is rejected under similar rationale).  
Claims 2-3, 9-10, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Clayton et al., U.S. PGPUB 2017/0206464, filed 1/14/2016, published 7/20/2017 [hereafter referred to as Clayton] in view of Higgins et al., β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, April 24 2017, published as a conference paper at ICLR 2017 [hereafter referred to as Higgins] as applied to Claim 1, Claim 8, and Claim 15; in further view of Arditi, Gil, U.S. PGPUB 2019/0197430, Personalized Ride Experience Based on Real-Time Signals, filed 12/21/2017 [hereafter referred to as Arditi]; and in even further view of Rajkumar, Nareshkumar, U.S. PGPUB 2020/0311616, Evaluating Robot Learning, filed 12/27/2017 [hereafter referred to as Rajkumar].
Regarding Claim 2, Clayton in view of Higgins as applied to Claim 1 teaches
The system of claim 1, 
wherein the operations further comprise, during runtime: 
autoencoding runtime sensor data from the real world to generate a runtime embedding ([Clayton Figure 3, paragraph [0034]: a variational inference machine taking an input X and performing encoding (“autoencoding”) to output a latent distribution Z (“embedding”) based on input X (“FIG. 3 is high-level block diagram of a variational inference machine 202, according to an example embodiment of the present disclosure. The variational inference machine 202 includes an input layer 302, a plurality of hidden layers 304, and an output layer 306. In an example embodiment, the input layer 302 has 8 nodes and receives the input variable x, which is an 8-tuple, where each element of the 8-tuple represents an attribute. … The hidden layers 304 may be structured to encode the data input into the input layer 302 as a latent distribution z, which is output from the output layer 306. A latent distribution includes one or more distributions, which provide a probabilistic distributed representation of the input variable of the variational inference machine. A probabilistic distributed representation is a set of latent variables defined by probability distributions as opposed to latent variables defined by discrete values. As illustrated in FIG. 3, the output layer has 6 nodes, and the latent distribution z includes a set of three distributions, where each distribution may be defined by a mean $\mu_z(x)$ and a standard deviation $\sigma_z(x)$ (e.g., for Gaussian distributions).”).] [Clayton paragraph [0043]: variational inference machine functioning in a time critical environment (“autoencoding runtime sensor data from the real world to generate a runtime embedding”) (“In an example embodiment, immediately upon being output from the variational inference machine 202, the time dependency infused latent distribution z, is provided directly to a machine learning model 206. In time critical real-time systems, it may be important to provide the time dependency infused latent distribution z, that is representative of the time series data x, as quickly as possible. In another example embodiment, the time dependency infused latent distribution z, is further adapted or converted in form before being input into a machine learning model 206.”).]) …  
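The encoder arrangement quoted from Clayton paragraph [0034] (an 8-node input layer and a 6-node output layer read as three Gaussian distributions, each with a mean $\mu_z(x)$ and a standard deviation $\sigma_z(x)$) can be sketched as below. This is an illustrative assumption only: a single linear layer stands in for Clayton's hidden layers 304, exponentiating three of the outputs to keep each standard deviation positive is a common convention rather than anything Clayton specifies, and `encode` and `sample_latent` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def encode(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Map an 8-tuple x to three (mean, std) pairs through a 6-node output layer."""
    if len(x) != 8 or len(weights) != 6 or len(bias) != 6:
        raise ValueError("expected an 8-tuple input and a 6-node output layer")
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    mu = out[:3]  # output nodes 0-2: means mu_z(x)
    sigma = [math.exp(o) for o in out[3:]]  # nodes 3-5: log-std, exponentiated to stay positive
    return mu, sigma


def sample_latent(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one latent vector z from the three independent Gaussians N(mu_i, sigma_i^2)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
```

With all-zero weights and bias, `encode` returns means `[0.0, 0.0, 0.0]` and standard deviations `[1.0, 1.0, 1.0]`, i.e., three standard normal distributions; at runtime the same call would be applied to each incoming 8-tuple of sensor attributes.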
However, Clayton in view of Higgins does not teach
… the runtime sensor data comprising sensor data from at least one of a device of a driver or a device of a rider; 
comparing the runtime embedding to one or more embeddings of the inference model, a similarity in the comparing indicating the event associated with the inference model occurring in the real world; and 
outputting a result of the comparing.  
Arditi teaches
… the runtime sensor data comprising sensor data from at least one of a device of a driver or a device of a rider ([Arditi paragraph [0025]: a system receiving data from an application installed on mobile computing devices, with such data being GPS data on each requestor computing device (“the runtime sensor data comprising sensor data from at least one of … a device of a rider”) (“In particular embodiments, the transportation management system 130 may, in response to a ride request, identify available providers that are registered with the transportation management system 130 through an application installed on each of their respective mobile computing devices 150 or through an associated transportation management vehicle device 160. For example, the transportation management system 130 may locate candidate ride providers 180 who are available (e.g., based on a status indicator provided through each ride provider's 180 computing device 150) and in the general vicinity of the requested pick-up location (e.g., based on GPS data provided by the provider computing device 150 and the requestor computing device 120).”).]); 
… the comparing indicating the event ([Arditi Figure 11, elements 1119, 1140, 1130; paragraphs [0080]-[0081]: a system performing a comparison between predicted and expected labels, and generating a score indicating an event (“the comparing indicating the event”) if it is above a certain threshold or criteria, with the detected event/result being a vehicle-related emergency, which is interpreted as a detection of an accident (“The prediction 1130 may be compared 1140 with the event-type label 1119 of the training data sample 1110 using, for example, a loss function that measures/quantifies the difference. Based on the comparison 1140 results, the training algorithm may adjust the model's 1120 parameters/configurations (e.g., weights) accordingly to minimize the differences between the generated predictions 1130 and the corresponding labels 1119. … In particular embodiments, appropriate weights that were learned during the training process may be applied to the features. At step 1080, the machine-learning model, based on the features of the received data, may generate a score representing a likelihood or confidence that the received data is associated with a particular event type (e.g., improper behavior, confrontation, health emergency, etc.).”).]) …
outputting a result of the comparing ([Arditi Figure 10, elements 1090, 1095; paragraph [0082]: a system performing a comparison between predicted and expected labels, and triggering an alert to a user device within the vehicle (“outputting a result of the comparing”), with the detected event/result being a vehicle-related emergency (interpreted as a detection of an accident) (“At 1090, the transportation management system may determine whether the score is sufficiently high relative to a threshold or criteria to warrant certain action. … On the other hand, if the score is sufficiently high, then at step 1095 the system may generate an appropriate alert and/or determine an appropriate action/response. In particular embodiments, the system may send alerts to appropriate recipients based on the detected event types. … An alert may additionally or alternatively be sent to one of the devices within the vehicle in which the event is occurring. For example, the alert may warn the passengers that events in the vehicle are being recorded, inform the passengers that an appropriate third-party has been contacted ( e.g., police, ambulance, etc.), or provide guidance on what to do ( e.g., remain calm in the vehicle, exit the vehicle, etc.).”).]).  
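Steps 1080 through 1095 of Arditi, as quoted above, amount to scoring the received data for an event type and acting only when the score clears a threshold or criteria. A minimal sketch, assuming a logistic score over weighted features (Arditi states only that learned weights are applied to the features) and using hypothetical feature names, threshold values, and recipient identifiers:

```python
import math
from typing import Dict, List, Optional


def event_score(features: Dict[str, float], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Likelihood-style score in (0, 1) that the data reflects an event type (cf. step 1080)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing (an assumption, not Arditi's)


def maybe_alert(
    score: float, threshold: float, event_type: str, recipients: List[str]
) -> Optional[List[str]]:
    """Return one alert per recipient if the score is high enough (cf. steps 1090-1095)."""
    if score < threshold:
        return None  # score not sufficiently high: no action taken
    return [f"{r}: possible {event_type} detected (score={score:.2f})" for r in recipients]
```

For instance, `maybe_alert(0.9, 0.8, "health emergency", ["rider_device"])` returns `["rider_device: possible health emergency detected (score=0.90)"]`, while any score below 0.8 returns `None`.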
Both Clayton in view of Higgins and Arditi are analogous art as both teach sensor devices collecting real-time sensor data to perform event detection.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the sensor devices of Clayton in view of Higgins and incorporate them into the user devices of Arditi in order to perform event detection. The motivation to combine is taught in Arditi, as processing run-time sensor data locally on user devices will avoid transmission of large quantities of data, and will help better detect, validate, and react to events in real-time, with minimal delay, thus improving the ride experience for riders ([Arditi paragraph [0022]: “Current information about the particular ride requestor may be used to improve the overall ride experience for the ride requestor and/or provide additional services. … As yet another example, interior sensor data may be used to detect and respond to emergencies appropriately ( e.g., urgent health conditions, etc.). The various embodiments described herein, therefore, enable vehicles dispatched for transporting ride requestors to be dynamically configured to suit the preferences of the ride requestors and provide additional services such as emergency detection. Such enhancements enable the dispatched vehicles to provide ride requestors with improved ride experience and safety.”] [Arditi paragraph [0084]: “…while the steps in FIG. 10 may be performed by the transportation management system, any combination of those steps may be performed by any other computing system, including, e.g., the ride requestor's computing device, the ride provider's computing device, the transportation management vehicle device, and/or the vehicle. For instance, from step 1060 to 1090, the ride provider's mobile device, the transportation management vehicle device, or the vehicle may process the sensor data to detect any event of interest. 
In particular embodiments where machine-learning models are used for making such determination, the transportation management system may transmit a trained machine-learning model to the computing system in the vehicle to allow event-detection to be made locally. This may be desirable since sensor data may be overly large to transmit to the remote transportation management system in a timely fashion. If the machine-learning model takes as input other non-local data (e.g., the requestor's profile data, ride-sharing data, etc.), such information may be made available to the computing device executing the machine-learning model ( e.g., the computing device may obtain the data from the transportation management system, the ride provider's device, or the ride requestor's device).”]).
Furthermore, Clayton in view of Higgins, in further view of Arditi does not teach
… comparing the runtime embedding to one or more embeddings of the inference model, a similarity … associated with the inference model occurring in the real world …
Rajkumar teaches
comparing the runtime embedding to one or more embeddings of the inference model, a similarity … associated with the inference model occurring in the real world ([Rajkumar Figure 9, elements 504, 914, 920, 918; paragraph [0208]: training a machine learning model from a training data set to generate embeddings based on feature data representing a physical object (“occurring in the real world”), and storing its learned embeddings into a database along with a classification label/prediction output (“In some implementations, the server system 112 provides feature data from each of the datasets 406 to the machine learning model 506. For example, the server system 112 provides the feature data 914 of the "glasses" object to the machine learning model 506. As illustrated in the training module 902, a layer of the machine learning model 506 can produce an output that includes an embedding 920 and a corresponding classification label 922. … The server system 112 stores the feature data 914, the produced embedding 920, and the classification label 918 denoting "glasses" in the database 114.”).] [Rajkumar Figure 9, elements 506, 928, 932; paragraph [0213]: a comparison module performing comparisons between a generated embedding produced by a machine learning model in the evaluation/testing phase (“runtime embedding”) and embeddings stored in a database (“comparing the runtime embedding to one or more embeddings of the inference model”), with a match indicating that the machine learning model has properly recognized/detected the physical object matching the embedding (“a similarity … associated with the inference model occurring in the real world”) (“In some implementations, in order to evaluate the machine learning model 506, the evaluation model 904 performs tests under comparison module 932 to compare the embedding 928 produced by the machine learning model 506 to one or more embeddings of a similar object type stored in the database 114. 
If the results match, the evaluation model 904 may determine the machine learning model 506 can properly recognize the particular object and proceed to test the next object from the database 114.”).]) …
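The comparison module 932 of Rajkumar, as quoted above, checks whether an embedding produced by the model matches stored embeddings of a similar object type. The quoted passages do not fix a similarity measure, so the sketch below assumes cosine similarity with a hypothetical 0.9 cutoff; `cosine_similarity`, `match_embedding`, and the example vectors are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_embedding(
    runtime: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (label, similarity) of the closest stored embedding, or None below threshold."""
    best_label: Optional[str] = None
    best_sim = -1.0
    for label, embedding in stored:
        sim = cosine_similarity(runtime, embedding)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is None or best_sim < threshold:
        return None
    return best_label, best_sim
```

With `stored = [("glasses", [1.0, 0.0, 0.0]), ("cup", [0.0, 1.0, 0.0])]`, the runtime embedding `[0.9, 0.1, 0.0]` matches "glasses" (similarity about 0.99), whereas `[1.0, 1.0, 1.0]` (similarity about 0.58 to each entry) matches nothing.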
Both Clayton in view of Higgins, in further view of Arditi and Rajkumar are analogous art as both teach the generation of run-time sensor data embeddings for a machine-learning model to perform predictions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the sensor data embeddings of Clayton in view of Higgins, in further view of Arditi and apply the embedding comparison module of Rajkumar in order to perform a comparison between run-time embeddings and training embeddings. The motivation to combine is taught in Rajkumar, as embeddings are compressed representations of learnings for a machine-learning model, which can be shared across other similar systems (e.g., in the case of Rajkumar, across a robot fleet using the same machine-learning model), thus making the learnings from one system portable to similar systems, and making learning more efficient and code deployment faster across systems ([Rajkumar paragraph [0009]: “… the embedding may represent information derived from an output layer of a neural network model or from a hidden layer of the neural network model. When a set of input information, such as sensor data describing an object, is provided to the machine learning model, the processing of the machine learning model may encode the information in a form that is used directly as an embedding, or is further processed to generate the embedding. As a result, the embedding may be a compressed representation of an object or other observation, where the specific values of the embedding may depend on the structure and training state of the machine learning model used to generate the embedding.”] [Rajkumar paragraph [0074]: “The robot fleet does not require that every robot in the fleet to be trained individually to identify new objects. Rather, the robot fleet employs a distributed learning technique, where one robot can learn to identify an object and produce an embedding that corresponds to the object. That robot can share the embedding with the other robots over the communication network in a manner that allows all robots in the fleet to identify the newly identified object. 
Providing the embedding and corresponding classification information allows robots that receive the data to instantly learn to classify the object. The technique can use machine learning models to produce an embedding describing the characteristics of the identified object. This may be beneficial because user need only train one robot to identify an object rather than each robot. In addition, sharing embeddings among robots preserves processing resources by allowing robots to learn without having to retrain the machine learning model on each robot.”]).
Regarding Claim 3, Clayton in view of Higgins as applied to Claim 1, in further view of Arditi, and in even further view of Rajkumar teaches
The system of claim 2, wherein the outputting the result comprises providing a notification to at least one of the device of the driver or the device of the rider indicating the event ([Arditi Figure 10, elements 1090, 1095; paragraph [0082]: a system performing a comparison between predicted and expected labels, and triggering an alert to a user device within the vehicle (“providing a notification to at least one of the device of the driver or the device of the rider indicating the event”), with the detected event/result being a vehicle-related emergency (interpreted as a detection of an accident) (“At 1090, the transportation management system may determine whether the score is sufficiently high relative to a threshold or criteria to warrant certain action. … On the other hand, if the score is sufficiently high, then at step 1095 the system may generate an appropriate alert and/or determine an appropriate action/response. In particular embodiments, the system may send alerts to appropriate recipients based on the detected event types. … An alert may additionally or alternatively be sent to one of the devices within the vehicle in which the event is occurring. For example, the alert may warn the passengers that events in the vehicle are being recorded, inform the passengers that an appropriate third-party has been contacted ( e.g., police, ambulance, etc.), or provide guidance on what to do ( e.g., remain calm in the vehicle, exit the vehicle, etc.).”).]).  
Regarding Claim 9, 
The method of claim 8, further comprising, during runtime: 
autoencoding runtime sensor data from the real world to generate a runtime embedding, the runtime sensor data comprising sensor data from at least one of a device of a driver or a device of a rider (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale); 
comparing the runtime embedding to one or more embeddings of the inference model, a similarity in the comparing indicating the event associated with the inference model occurring in the real world (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale); and 
outputting a result of the comparing (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale).  
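As a hedged illustration of the runtime limitations recited above (comparing a runtime embedding to embeddings of an inference model and outputting a result), the following Python sketch assumes that the runtime embedding has already been produced by an encoder, which is not shown. Cosine similarity, the 0.9 threshold, and the names `compare_runtime_embedding` and `model_embeddings` are assumptions; the claim does not specify a similarity measure.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare_runtime_embedding(runtime_embedding, model_embeddings, threshold=0.9):
    """Return (event, score): the best-matching event if similar enough, else (None, score)."""
    best_event, best_score = None, -1.0
    for event, reference in model_embeddings.items():
        score = cosine_similarity(runtime_embedding, reference)
        if score > best_score:
            best_event, best_score = event, score
    # A similarity at or above the threshold is treated as the event occurring.
    return (best_event if best_score >= threshold else None), best_score
```

For example, with hypothetical reference embeddings `{"accident": [1, 0, 0], "co_presence": [0, 1, 0]}`, the runtime embedding `[0.95, 0.05, 0.0]` matches `"accident"`, while `[0.5, 0.5, 0.7]` matches neither reference closely enough.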
Regarding Claim 10, 
The method of claim 9, wherein the outputting the result comprises providing a notification to at least one of the device of the driver or the device of the rider indicating the event (This claim limitation is similar in scope to the corresponding claim limitation in Claim 3, and hence is rejected under similar rationale).  
Regarding Claim 16, 
The machine-storage medium of claim 15, wherein the operations further comprise, during runtime: 
autoencoding runtime sensor data from the real world to generate a runtime embedding, the runtime sensor data comprising sensor data from at least one of a device of a driver or a device of a rider (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale); 
comparing the runtime embedding to one or more embeddings of the inference model, a similarity in the comparing indicating the event associated with the inference model occurring in the real world (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale); and 
outputting a result of the comparing (This claim limitation is similar in scope to the corresponding claim limitation in Claim 2, and hence is rejected under similar rationale).  
Regarding Claim 17, 
The machine-storage medium of claim 16, wherein the outputting the result comprises providing a notification to at least one of the device of the driver or the device of the rider indicating the event (This claim limitation is similar in scope to the corresponding claim limitation in Claim 3, and hence is rejected under similar rationale).  
Claims 4-7, 11-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Clayton et al., U.S. PGPUB 2017/0206464, filed 1/14/2016, published 7/20/2017 [hereafter referred to as Clayton] in view of Higgins et al., β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, April 24, 2017, published as a conference .
Regarding Claim 4, Clayton in view of Higgins as applied to Claim 1 teaches
The system of claim 1, 
wherein the covariate comprises a known fact ([Higgins p.6 Figure 5, element y=scale; p.6 5th paragraph – p.7 6th paragraph, Section 3 Disentanglement Metric: referring to the algorithm listed in Steps 1-3 on pp.6-7 of Higgins, given a dataset D = {X, V, W}, set a target generative factor k = y as fixed, with “y=scale” being a known fact shown in Figure 5 (“covariate comprises a known fact”); sample latent distributions $v_{1,l}$ and $v_{2,l}$ (keeping k = y); infer $z_{1,l} = \mu(x_{1,l})$ using the encoder $q(z|x) \sim N(\mu(x), \sigma(x))$; compute pairwise distances $z_{diff}^{l} = |z_{1,l} - z_{2,l}|$; and input the average pairwise distance $z_{diff}^{b}$ into a classifier to predict $P(y|z_{diff}^{b})$ (“Figure 5: Schematic of the proposed disentanglement metric: over a batch of L samples, each pair of images has a fixed value for one target generative factor y (here y = scale) and differs on all others. A linear classifier is then trained to identify the target factor using the average pairwise difference $z_{diff}^{b}$ in the latent space over L samples.”).])… , 
the known fact being disentangled from the triplets prior to training ([Higgins p.6 Figure 5, element y=scale; p.6 5th paragraph – p.7 6th paragraph, Section 3 Disentanglement Metric: referring to the algorithm listed in Steps 1-3 on pp.6-7 of Higgins, given a dataset D = {X, V, W} (“triplets”), set a target generative factor k = y as fixed, with “y=scale” being a known fact shown in Figure 5; sample latent distributions $v_{1,l}$ and $v_{2,l}$ (keeping k = y) (“the known fact being disentangled from the triplets prior to training”); infer $z_{1,l} = \mu(x_{1,l})$ using the encoder $q(z|x) \sim N(\mu(x), \sigma(x))$; compute pairwise distances $z_{diff}^{l} = |z_{1,l} - z_{2,l}|$; and input the average pairwise distance $z_{diff}^{b}$ into a classifier to predict $P(y|z_{diff}^{b})$ (“Figure 5: Schematic of the proposed disentanglement metric: over a batch of L samples, each pair of images has a fixed value for one target generative factor y (here y = scale) and differs on all others. A linear classifier is then trained to identify the target factor using the average pairwise difference $z_{diff}^{b}$ in the latent space over L samples.”).]).  
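For illustration only, Steps 1-3 of the Higgins disentanglement metric described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the toy "images" are an orthogonal linear mixing of three uniform generative factors, the encoder mean μ(x) simply inverts that mixing, and, in place of the linear classifier that Higgins trains, the sketch substitutes an argmin rule that selects the latent dimension with the smallest average difference (which presumes latent j tracks factor j). All function and variable names are hypothetical.

```python
import numpy as np

def z_diff_b(encoder_mean, render, k, L, num_factors, rng):
    """One batch of the metric: L image pairs that share generative factor k."""
    diffs = []
    for _ in range(L):
        v1 = rng.uniform(size=num_factors)  # ground-truth factors, first image
        v2 = rng.uniform(size=num_factors)  # ground-truth factors, second image
        v2[k] = v1[k]                       # hold the target factor fixed
        z1 = encoder_mean(render(v1))       # z_{1,l} = mu(x_{1,l})
        z2 = encoder_mean(render(v2))       # z_{2,l} = mu(x_{2,l})
        diffs.append(np.abs(z1 - z2))       # z_diff^l = |z_{1,l} - z_{2,l}|
    return np.mean(diffs, axis=0)           # z_diff^b: average over the L pairs

def metric_accuracy(encoder_mean, render, num_factors, L=64, batches=200, seed=0):
    """Fraction of batches whose fixed factor k is recovered from z_diff^b."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(batches):
        k = int(rng.integers(num_factors))
        zb = z_diff_b(encoder_mean, render, k, L, num_factors, rng)
        # Stand-in for the trained linear classifier predicting P(y | z_diff^b).
        correct += int(np.argmin(zb)) == k
    return correct / batches

# Toy setup (hypothetical): an orthogonal mixing Q renders factors into "images",
# and a perfectly disentangled encoder inverts it with Q.T.
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
render = lambda v: Q @ v
disentangled_encoder = lambda x: Q.T @ x
```

With the disentangled encoder, the dimension holding the fixed factor has a near-zero average difference in every batch, so the score is 1.0. An encoder that leaves the factors mixed (for example, `lambda x: x`) generally scores lower, which is the property the metric is designed to expose.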
However, Clayton in view of Higgins does not teach
… [a known fact] associated with the plurality of user devices providing the sensor data  …  
Arditi teaches
… [a known fact] associated with the plurality of user devices providing the sensor data ([Arditi paragraph [0025]: a system receiving data from an application installed on mobile computing devices, with such data being GPS data on each requestor computing device (“plurality of user devices providing the sensor data”) (“In particular embodiments, the transportation management system 130 may, in response to a ride request, identify available providers that are registered with the transportation management system 130 through an application installed on each of their respective mobile computing devices 150 or through an associated transportation management vehicle device 160. For example, the transportation management system 130 may locate candidate ride providers 180 who are available ( e.g., based on a status indicator provided through each ride provider's 180 computing device 150) and in the general vicinity of the requested pick-up location (e.g., based on GPS data provided by the provider computing device 150 and the requestor computing device 120).”).] [Arditi paragraph [0051]: a system receiving contextual information associated with applications on the requestor’s device, including operating system, where a covariate is interpreted as a characteristic/known fact of a data set (“[a known fact] associated with the plurality of user devices providing the sensor data”) (“In particular embodiments, the transportation application may access other types of application usage data from other applications (including the operating system) installed on the requestor's device and use them as contextual information 715.”).]) …  
Both Clayton in view of Higgins and Arditi are analogous art as both teach sensor devices collecting real-time sensor data to perform event detection.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the sensor devices of Clayton in view of Higgins and incorporate them into the user devices of Arditi in order to perform event detection. The motivation to combine is taught in Arditi, as processing run-time sensor data locally on user devices will avoid transmission of large quantities of data, and will help better detect, validate, and react to events in real-time, with minimal delay, thus improving the ride experience for riders ([Arditi paragraph [0022]: “Current information about the particular ride requestor may be used to improve the overall ride experience for the ride requestor and/or provide additional services. … As yet another example, interior sensor data may be used to detect and respond to emergencies appropriately ( e.g., urgent health conditions, etc.). The various embodiments described herein, therefore, enable vehicles dispatched for transporting ride requestors to be dynamically configured to suit the preferences of the ride requestors and provide additional services such as emergency detection. Such enhancements enable the dispatched vehicles to provide ride requestors with improved ride experience and safety.”] [Arditi paragraph [0084]: “…while the steps in FIG. 10 may be performed by the transportation management system, any combination of those steps may be performed by any other computing system, including, e.g., the ride requestor's computing device, the ride provider's computing device, the transportation management vehicle device, and/or the vehicle. For instance, from step 1060 to 1090, the ride provider's mobile device, the transportation management vehicle device, or the vehicle may process the sensor data to detect any event of interest. 
In particular embodiments where machine-learning models are used for making such determination, the transportation management system may transmit a trained machine-learning model to the computing system in the vehicle to allow event-detection to be made locally. This may be desirable since sensor data may be overly large to transmit to the remote transportation management system in a timely fashion. If the machine-learning model takes as input other non-local data (e.g., the requestor's profile data, ride-sharing data, etc.), such information may be made available to the computing device executing the machine-learning model ( e.g., the computing device may obtain the data from the transportation management system, the ride provider's device, or the ride requestor's device).”]).
Regarding Claim 5, Clayton in view of Higgins as applied to Claim 1, in further view of Arditi teaches
The system of claim 4, wherein the covariate comprises one or more of an operating system, phone model, or collection mode ([Arditi paragraph [0051]: contextual information associated with applications on the requestor’s device, including operating system, where a covariate is interpreted as a characteristic/known fact of a data set (“the covariate comprises one … of an operating system”) (“In particular embodiments, the transportation application may access other types of application usage data from other applications (including the operating system) installed on the requestor's device and use them as contextual information 715.”).]).  
Regarding Claim 6, Clayton in view of Higgins as applied to Claim 1 teaches
The system of claim 1.
However, Clayton in view of Higgins does not teach
wherein the event comprises co-presence of a driver and rider, fraud, dangerous driving, detection of an accident, 
Arditi teaches
wherein the event comprises co-presence of a driver and rider, fraud, dangerous driving, detection of an accident, ([Arditi Figure 10, elements 1090, 1095; paragraph [0082]: a system performing a comparison between predicted and expected labels, and triggering an alert to a user device within the vehicle, with the detected event/result being a vehicle-related emergency (interpreted as a detection of an accident) (“the event comprises … detection of an accident …”) (“At 1090, the transportation management system may determine whether the score is sufficiently high relative to a threshold or criteria to warrant certain action. … On the other hand, if the score is sufficiently high, then at step 1095 the system may generate an appropriate alert and/or determine an appropriate action/response. In particular embodiments, the system may send alerts to appropriate recipients based on the detected event types. … An alert may additionally or alternatively be sent to one of the devices within the vehicle in which the event is occurring. For example, the alert may warn the passengers that events in the vehicle are being recorded, inform the passengers that an appropriate third-party has been contacted ( e.g., police, ambulance, etc.), or provide guidance on what to do ( e.g., remain calm in the vehicle, exit the vehicle, etc.).”).]).  
Both Clayton in view of Higgins and Arditi are analogous art as both teach sensor devices collecting real-time sensor data to perform event detection.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the sensor devices of Clayton in view of Higgins and incorporate them into the user devices of Arditi in order to perform event detection. The motivation to combine is taught in Arditi, as processing run-time sensor data locally on user devices will avoid transmission of large quantities of data, and will help better detect, validate, and react to events in real-time, with minimal delay, thus improving the ride experience for riders ([Arditi paragraph [0022]: “Current information about the particular ride requestor may be used to improve the overall ride experience for the ride requestor and/or provide additional services. … As yet another example, interior sensor data may be used to detect and respond to emergencies appropriately ( e.g., urgent health conditions, etc.). The various embodiments described herein, therefore, enable vehicles dispatched for transporting ride requestors to be dynamically configured to suit the preferences of the ride requestors and provide additional services such as emergency detection. Such enhancements enable the dispatched vehicles to provide ride requestors with improved ride experience and safety.”] [Arditi paragraph [0084]: “…while the steps in FIG. 10 may be performed by the transportation management system, any combination of those steps may be performed by any other computing system, including, e.g., the ride requestor's computing device, the ride provider's computing device, the transportation management vehicle device, and/or the vehicle. For instance, from step 1060 to 1090, the ride provider's mobile device, the transportation management vehicle device, or the vehicle may process the sensor data to detect any event of interest. 
In particular embodiments where machine-learning models are used for making such determination, the transportation management system may transmit a trained machine-learning model to the computing system in the vehicle to allow event-detection to be made locally. This may be desirable since sensor data may be overly large to transmit to the remote transportation management system in a timely fashion. If the machine-learning model takes as input other non-local data (e.g., the requestor's profile data, ride-sharing data, etc.), such information may be made available to the computing device executing the machine-learning model ( e.g., the computing device may obtain the data from the transportation management system, the ride provider's device, or the ride requestor's device).”]).
Regarding Claim 7, Clayton in view of Higgins as applied to Claim 1 teaches
The system of claim 1, 
wherein the operations further comprise preprocessing the sensor data prior to the assembling to align the sensor data to a lower frequency ([Clayton paragraph [0040]: sequential data forecast machine containing a feature extractor 604, which can remove noise from an input x to improve the signal-to-noise ratio (“preprocessing the sensor data … to align the sensor data to a lower frequency”) (“FIG. 6 is a high-level block diagram of a sequential data forecast machine 204, according to an example embodiment of the present disclosure. A sequential data forecast machine 204 includes an input layer 602, a feature extractor 604, a plurality of hidden layers 606, and an output layer 608, which may output sequential data forecast values based on the data input into the sequential data forecast machine 204. … typically, a feature extractor 604 may improve the accuracy of the sequential data forecast machine 204. For example, a feature extractor 604 may remove noise and/or highlight desired features in the input variable x, which may generally tend to improve the signal to noise ratio.”).] [Clayton Figure 9, elements 902, 904, 906; Figure 10, elements $z_1^t$, $z_2^t$, $z_3^t$, $Z^t$; paragraphs [0057]-[0058]: a sensor fusion machine containing a multi-modal variational inference machine (MMVIM) and multi-modal sequential data forecast machine (MMSDFM) receiving latent distributions (generated from autoencoding) from a series of VIM-SDFM pairs that receive sensor data, producing a final latent distribution $Z^t$, generated by a triplet of variational inference machine-sequential data forecast machine pairs 902, 904, 906, each of which preprocesses sensor data from a different source and performs autoencoding to produce the intermediate latent distributions $z_1^t$, $z_2^t$, $z_3^t$ (“preprocessing the sensor data prior to the assembling…”) (“FIG. 10 is a flow diagram illustrating time series data adaptation including sensor fusion, according to an example embodiment of the present disclosure. In this example embodiment, the sensor fusion machine 908 includes a multi-modal variational inference machine (MMVIM) 910 and a multi-modal recurrent neural network (MMRNN) 912, both of which receive multiple modalities of data as inputs (i.e., data from multiple different domains), to provide a multi-modal sensor fusion output. … The sensor fusion machine 908 may receive the time dependency infused latent distributions $z_1^t$, $z_2^t$, and $z_3^t$, for example, from the VIM-SDFM pairs 902, 904, 906. … The output layer of the MMVIM 910 outputs a multi-modal time dependency infused latent distribution $Z^t$.”).]).  
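For illustration only, the claimed step of preprocessing sensor data to align it to a lower frequency before assembling can be sketched in Python. This is a minimal sketch under stated assumptions. It is not the method of Clayton, whose cited passage describes noise removal by a feature extractor, and it is not the applicant's method. Here each stream is averaged over fixed windows of width 1/target_hz on a shared time grid, so that streams sampled at different rates line up column by column. The function name `align_to_rate`, the window-averaging choice, and the example rates are hypothetical.

```python
import numpy as np

def align_to_rate(timestamps, values, target_hz, t_start, t_end):
    """Average samples into windows of width 1/target_hz over [t_start, t_end).

    Windows that receive no samples are returned as NaN.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    n_bins = int(round((t_end - t_start) * target_hz))
    idx = np.floor((timestamps - t_start) * target_hz).astype(int)
    keep = (idx >= 0) & (idx < n_bins)  # drop samples outside the grid
    sums = np.bincount(idx[keep], weights=values[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    out = np.full(n_bins, np.nan)
    nonzero = counts > 0
    out[nonzero] = sums[nonzero] / counts[nonzero]
    return out

# Hypothetical streams: a 50 Hz accelerometer and a 1 Hz GPS speed, both 2 s long.
t_acc = np.arange(100) / 50.0
acc = np.where(t_acc < 1.0, 1.0, 3.0)
acc_1hz = align_to_rate(t_acc, acc, 1.0, 0.0, 2.0)
gps_1hz = align_to_rate([0.0, 1.0], [10.0, 20.0], 1.0, 0.0, 2.0)
fused = np.column_stack([acc_1hz, gps_1hz])  # one row per 1 s window
```

Averaging within each window also smooths high-rate noise. Other resampling choices, such as decimation after a low-pass filter, would serve the same alignment purpose.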
Regarding Claim 11, 
The method of claim 8, wherein the covariate comprises a known fact associated with the plurality of user devices providing the sensor data, the known fact being disentangled from the triplets prior to training (This claim limitation is similar in scope to the corresponding claim limitation in Claim 4, and hence is rejected under similar rationale).  
Regarding Claim 12, 
The method of claim 11, wherein the covariate comprises one or more of an operating system, phone model, or collection mode (This claim limitation is similar in scope to the corresponding claim limitation in Claim 5, and hence is rejected under similar rationale).  
Regarding Claim 13, 
The method of claim 8, wherein the event comprises co-presence of a driver and rider, fraud, dangerous driving, detection of an accident, (This claim limitation is similar in scope to the corresponding claim limitation in Claim 6, and hence is rejected under similar rationale).  
Regarding Claim 14, 
The method of claim 8, further comprising preprocessing the sensor data prior to the assembling to align the sensor data to a lower frequency (This claim limitation is similar in scope to the corresponding claim limitation in Claim 7, and hence is rejected under similar rationale).  
Regarding Claim 18, 
The machine-storage medium of claim 15, wherein the covariate comprises a known fact associated with the plurality of user devices providing the sensor data, the known fact being disentangled from the triplets prior to training (This claim limitation is similar in scope to the corresponding claim limitation in Claim 4, and hence is rejected under similar rationale).  
Regarding Claim 19, 
The machine-storage medium of claim 15, wherein the event comprises co-presence of a driver and rider, fraud, dangerous driving, detection of an accident, (This claim limitation is similar in scope to the corresponding claim limitation in Claim 6, and hence is rejected under similar rationale).  
Regarding Claim 20, 
The machine-storage medium of claim 15, wherein the operations further comprise preprocessing the sensor data prior to the assembling to align the sensor data to a lower frequency (This claim limitation is similar in scope to the corresponding claim limitation in Claim 7, and hence is rejected under similar rationale).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cirit, Fahrettin Olcay, U.S. Patent 10,111,043, Verifying Sensor Data Using Embeddings, filed 4/24/2017, issued 10/23/2018.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WILLIAM WAI YIN KWAN/
Examiner, Art Unit 2121





/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121