Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after October 3, 2018, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 recites the limitations “determining a number of cells of an input layer…” and “determining the number of cells of an output layer…”. It is unclear what constitutes the cells. It is also unclear what constitutes “determining the number of cells of an output layer according to a number of the at least two application” as claimed.
For purposes of examination, the claim is interpreted as follows: the cells are memory cells, and the number of cells is determined based on the dimension or size of the input data and of the output data.
Claim 8 depends on claim 7 and inherits the same deficiency. Therefore, it is rejected under the same reasoning as claim 7.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 10-13, and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1,
2A Prong 1: The limitation of obtaining an application predictive model according to a plurality of groups of usage timing association records is a mental process, as it merely recites training something to obtain a result, which can be done in the human mind.
The limitation of acquiring, from the application predictive model, probability values of launching the applications, by processing the usage status information of the applications is a mental process, as it recites determining the probability that an application will be launched.
The limitation of determining an application to-be-launched at a next time point according to the probability values is also a mental process, as it merely recites forming an expectation of which application will be launched next, which can be done in the human mind.
2A Prong 2: This judicial exception is not integrated into a practical application. The limitation of acquiring usage status information of applications of at least two past time points of a next time point is a form of insignificant extra-solution activity.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations of obtaining the application predictive model by training a long short-term memory (LSTM) neural network model, a terminal (a generic component), and the application predictive model merely indicate the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)). The limitation of acquiring usage status information of applications of at least two past time points of a next time point recites mere data gathering (MPEP 2106.05(g)).

Regarding claim 10, the limitation that the application predictive model comprises an input gate $i_t$, a forget gate $f_t$, an output gate $o_t$, a candidate memory cell $\tilde{c}_t$, a final memory cell $c_t$, and an output status cell $h_t$, wherein

$$i_t = \sigma(W_i x_t + U_i h_{t-1})$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1})$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1})$$
$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t$$
$$h_t = o_t \otimes \tanh(c_t)$$

$x_t$ indicating an application used at time point t in the usage timing association records; $W_*$ and $U_*$ indicating network parameters learned, and $* \in \{i, f, o, c\}$; $i_t$ indicating an input gate at time point t, $f_t$ indicating a forget gate at time point t, and $o_t$ indicating an output gate at time point t; $c_t$ indicating a final memory cell at time point t, $c_{t-1}$ indicating a final memory cell at time point t-1, and $\tilde{c}_t$ indicating a candidate memory cell at time point t; $h_t$ indicating an output status cell at time point t, and $h_{t-1}$ indicating an output status cell at time point t-1; $\sigma$ indicating a Sigmoid function; $\otimes$ indicating element-wise product of vectors, is a mathematical concept, because it merely recites the mathematical relationships among the states of an input gate, a forget gate, an output gate, a candidate memory cell, a final memory cell, and an output status cell over time, which are the equations that explain the LSTM. The limitation of the tanh function being expressed as

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

is also a mathematical concept, as it merely recites a tanh function.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
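For illustration only, the recurrences recited in claim 10 can be sketched in plain Python as follows. This is a minimal sketch and not the applicant's implementation: the names `lstm_step`, `affine`, `matvec`, and `sigmoid`, the list-based vectors, and the dictionary layout of the weights (keyed by `i`, `f`, `o`, `c`) are assumptions introduced here. Bias terms are omitted because the recited equations contain none.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Sigmoid function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector) -> Vector:
    """Computes W x + U h, the argument shared by all four gate equations."""
    return [a + b for a, b in zip(matvec(W, x), matvec(U, h))]


def lstm_step(
    x_t: Vector,
    h_prev: Vector,
    c_prev: Vector,
    W: Dict[str, Matrix],
    U: Dict[str, Matrix],
) -> Tuple[Vector, Vector]:
    """One time step of the recited equations; returns (h_t, c_t)."""
    i_t = [sigmoid(z) for z in affine(W["i"], x_t, U["i"], h_prev)]  # input gate
    f_t = [sigmoid(z) for z in affine(W["f"], x_t, U["f"], h_prev)]  # forget gate
    o_t = [sigmoid(z) for z in affine(W["o"], x_t, U["o"], h_prev)]  # output gate
    # Candidate memory cell.
    c_tilde = [math.tanh(z) for z in affine(W["c"], x_t, U["c"], h_prev)]
    # Final memory cell: element-wise f_t * c_{t-1} + i_t * c_tilde.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Output status cell: element-wise o_t * tanh(c_t).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Under these assumptions, `x_t` would typically be a one-hot vector identifying the application used at time point t, and the step would be applied once per record in a usage timing association group.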

Regarding claim 11, the limitation wherein the probability values comprise first probability values each indicating a probability of launching one of the applications and a second probability value indicating a probability of launching no application is a mental process, as it merely recites determining the probability of launching applications, which can be done in the human mind.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
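For illustration only, the selection implied by claim 11 can be sketched as follows, assuming N applications and N + 1 probability values. The function name `select_app_to_preload` and the convention that the "no application" probability is the last entry are hypothetical choices made here; the claim does not specify an ordering or a tie-breaking rule.

```python
from typing import Optional, Sequence


def select_app_to_preload(
    app_names: Sequence[str], probs: Sequence[float]
) -> Optional[str]:
    """Returns the most probable application, or None when the
    'no application' outcome (assumed to be the last entry) is most probable."""
    if len(probs) != len(app_names) + 1:
        raise ValueError("expected one probability per app plus one for 'no app'")
    # Index of the largest probability; ties resolve to the lowest index.
    best = max(range(len(probs)), key=lambda k: probs[k])
    if best == len(app_names):
        return None  # launching no application is the most likely outcome
    return app_names[best]
```

With two applications, for example, `probs` would hold three values: one per application and one for the outcome that nothing is launched.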

Regarding claim 12,
The limitation of acquire, probability values of launching the applications, by inputting the usage status information into the model and a plurality of groups of usage timing association records merely recites entering the data into a model, which can be done with the aid of pen and paper.
The limitation of determine an application to-be-launched at a next time point according to the probability values is also a mental process, as it merely recites forming an expectation of which application will be launched next, which can be done in the human mind.
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a processor and a computer readable storage. The processor and the computer readable storage are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of acquiring usage status information of applications of at least two past time points of a next time point is a form of insignificant extra-solution activity.
The limitation of a terminal device, comprising: at least one processor; and a computer readable storage, coupled to the at least one processor and storing at least one computer executable instruction thereon which, when executed by the at least one processor, recites generic computer components.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor and a computer readable storage to calculate the probability of launching applications amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The application predictive model and the LSTM neural network model merely indicate the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)). The limitation of preloading the application to-be-launched amounts to mere instructions to perform the function on a generic component (MPEP 2106.05(f)). The limitation of the application predictive model being obtained based on a plurality of groups of usage timing association records also indicates a particular technological field or environment, because the limitation merely recites a process of training a neural network model to obtain a model. The limitation of acquiring usage status information of applications of at least two past time points of a next time point is mere data gathering (MPEP 2106.05(g)).

Regarding claim 13, the limitation of train the LSTM neural network model according to the plurality of groups of usage timing association records to obtain the application predictive model merely indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)). The limitation of at least one processor merely recites a generic computer component.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim 17 is an apparatus claim having limitations similar to those of method claim 10. Therefore, it is rejected under the same rationale as claim 10.

Claim 18 is an apparatus claim having limitations similar to those of method claim 11. Therefore, it is rejected under the same rationale as claim 11.

Regarding claim 19, 
The limitation of a non-transitory computer readable storage medium storing a computer program which, when executed by a processor, recites a generic computer component.
The limitation of obtain a plurality of groups of usage timing association records by grouping the usage timing association records is a mental process, as it recites a process of grouping data, which can be done with the aid of pen and paper.
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a computer readable storage medium and a processor. The processor and the storage medium are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of acquire a user behavior sample within a preset time period, the user behavior sample comprising usage timing association records of at least two applications is a form of insignificant extra-solution activity.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor and a computer readable storage to train a neural network model amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The limitation of train a LSTM neural network model according to the plurality of groups of usage timing association records to obtain the application predictive model merely indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)). The limitation of acquire a user behavior sample within a preset time period, the user behavior sample comprising usage timing association records of at least two applications is mere data gathering (MPEP 2106.05(g)).

Regarding claim 20, 
The non-transitory computer readable storage medium of claim 19 and the processor are generic computer components.
The limitation of acquire, probability values of launching the applications, by processing the usage status information of the applications is a mental process, as it merely recites determining probability values of launching the applications, which can be done in the human mind.
The limitation of determine an application to-be-launched at a next time point according to the probability values is also a mental process, as it recites a process of forming an expectation of which application will be launched in the future, which can be done in the human mind.
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a computer readable storage medium and a processor. The processor and the storage medium are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of acquiring usage status information of applications of at least two past time points of a next time point is a form of insignificant extra-solution activity.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor and a computer readable storage to calculate a probability of launching applications amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The application predictive model merely indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)). The limitation of preloading the application to-be-launched amounts to mere instructions to perform the function on a generic component (MPEP 2106.05(f)). The limitation of acquiring usage status information of applications of at least two past time points of a next time point is mere data gathering (MPEP 2106.05(g)).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 9, 11-13, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1).

Regarding claim 1, Merry teaches a method for preloading an application ([Merry, Abstract, line 1-3] “Systems and methods of pre-fetching data for applications in a computer system that are terminated or suspended and may be pre-launched by the computer system are disclosed”), comprising: 
obtaining an application predictive model according to a plurality of groups of usage timing records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, [Merry, 0110-0111] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate overall of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches … These weights could be static or could depend on factors such as the number of total cases in a class and whether the current situation matches the class--e.g., assign weights (such as, 0.4,0.4, and 0.2). Once weights have been assigned, it may be possible to compute a final probability by taking the weighted sum of (Positive Cases)/(Total Cases) for each class”; the adaptive predictor makes its decision based on prior data, which corresponds to the process of ‘obtaining a model’); 
acquiring usage status information of applications of a terminal of at least two past time points of a next time point ([Merry, 0132] “1. Given the current app (N) and previous app (N-1), history is examined to determine the frequency of the next app launched given the (N,N-1) pair. The probability expects to see a desired threshold number of cases of an app being launched after the (N,N-1) pair. If the desired threshold or greater of cases are found, then probabilities are weighted. If less than the desired threshold, then the following may occur: [0133] 2. The current app is examined (N) is examined and history is examined to determine if a pattern emerges. Again, the system is looking for the desired threshold number of cases and/or occurrences of the same pattern for N, N+1 are required. If not found, then the following may occur: [0134] 3. The previous app (N-1) is examined in the same way to determine if a pattern emerges for apps started within time T of app N-1. Again, the system tests for the desired threshold number of occurrences”; the time point one step before the current action is two steps before the future action. Merry teaches acquiring N (the current action) and N-1 (the previous action), which are the two past time points preceding N+1 (the future action)); 
acquiring, from the application predictive model, probability values of launching the applications, by processing the usage status information of the applications with the application predictive model ([Merry, 0107] “(4) Adaptive Predictor [0108] This predictor may identify situations in the past that are similar to the current situation by considering the current foreground app, the last foreground app and how long the current app has been in usage. Once it has identified these situations, the predictor may return the percentage of situations which resulted in the queried event occurring within the prediction window”, [Merry, 0112] “Prediction engine module may receive activity data of a given app's lifecycle (e.g., the number of times an app is activated by a user, the time of day of activation, length of time of activation, and the like). These uses of an app may form a set of "cases" of use of an app. Each case may be assessed a calculated, predicted and/or estimated probability of future and/or potential activation”); and 
determining an application to-be-launched at a next time point according to the probability values and preloading the application to-be-launched ([Merry, 0120] “Once these probabilities have been set, they may be utilized by other modules of the present system--e.g., Pre-launch Policy--as a part of the rules and/or heuristics to determine whether to pre-launch a given app or not”).
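For illustration only, the final probability described in the quoted passage of Merry [0110-0111], namely the weighted sum of (Positive Cases)/(Total Cases) over the classes, can be sketched as follows. The names `CaseClass` and `weighted_probability` are hypothetical, and skipping a class with zero total cases is an assumption made here, since the quoted passage does not say how an empty class is handled.

```python
from typing import Iterable, NamedTuple


class CaseClass(NamedTuple):
    positive_cases: int  # cases in the class where the queried app switch occurred
    total_cases: int     # all cases classified into the class
    weight: float        # class weight, e.g. 0.4, 0.4, 0.2 as in the quoted example


def weighted_probability(classes: Iterable[CaseClass]) -> float:
    """Weighted sum of positive/total ratios across classes."""
    total = 0.0
    for c in classes:
        if c.total_cases == 0:
            continue  # assumption: an empty class contributes nothing
        total += c.weight * (c.positive_cases / c.total_cases)
    return total
```

Using the example weights from the quotation, three classes with ratios 3/4, 1/2, and 0/5 would combine as 0.4 × 0.75 + 0.4 × 0.5 + 0.2 × 0.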
Merry discloses obtaining a model from usage data for making predictions, but does not specifically teach training a long short-term memory (LSTM) neural network model according to usage information.
Rodriguez teaches training a long short-term memory (LSTM) neural network model according to a usage information ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items. For example, in a training stage the suggestion server (or other component) can be trained using training data (e.g., message training data) of actual or generated messages in a messaging application context, and then at an inference stage can determine suggested items to new messages or other data it receives … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”, [Rodriguez, 0287] “... In some implementations, after the high-confidence suggested response is generated and before it is selected by user input, the associated embedded application can be pre-loaded (e.g., downloaded in the background and stored in local storage) to the user device and its code initiated. This allows an instant display of output from the embedded application on the user device after the high-confidence suggested response is selected by user input. Such pre-loading of embedded applications can be omitted for lower-confidence (e.g., below threshold) suggested responses since they are less likely to be selected”, [Rodriguez, 0243] “In some implementations, machine-learning models may be trained using sample data or training data, e.g., commands and messages actually provided by users in response to embedded application and session events and who consent to provide such data for training purposes. 
Training data is treated before use to remove user identifiers and other user-related information”, which shows that Rodriguez uses usage information, such as responses to embedded applications or events, as training data).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Merry and Rodriguez, to use the LSTM of Rodriguez to implement the machine-learning based application pre-loading method of Merry. The suggestion and/or motivation for doing so is to improve the prediction accuracy, as using LSTM-based machine learning to predict sequential data such as application usage is common practice, as shown in Rodriguez ([Rodriguez, 0473, the last sentence] “For example, nodes with memory may include long short-term memory (LSTM) nodes. LSTM nodes may use the memory to maintain “state” that permits the node to act like a finite state machine (FSM). Models with such nodes may be useful in processing sequential data, e.g., words in a sentence or a paragraph, frames in a video, speech or other audio, etc”).

Regarding claim 2, Merry teaches acquiring usage timing association records of at least two applications within a preset time period ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, teaches the usage timing association records. [Merry, 0116] “As may be seen, this example considers the data of four apps (i.e., A, B, C and D) over the course of a desired period of time. It will be appreciated that the period of time may be varied according to the desire of the present system--e.g. a day, a week, etc”, teaches the preset time); 
obtaining the plurality of groups of usage timing association records by grouping the usage timing association records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”);
Merry does not specifically teach training the LSTM neural network model according to the usage information.
Rodriguez teaches training the LSTM neural network model according to the usage information ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items. For example, in a training stage the suggestion server (or other component) can be trained using training data (e.g., message training data) of actual or generated messages in a messaging application context, and then at an inference stage can determine suggested items to new messages or other data it receives … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”, [Rodriguez, 0287] “... In some implementations, after the high-confidence suggested response is generated and before it is selected by user input, the associated embedded application can be pre-loaded (e.g., downloaded in the background and stored in local storage) to the user device and its code initiated. This allows an instant display of output from the embedded application on the user device after the high-confidence suggested response is selected by user input. Such pre-loading of embedded applications can be omitted for lower-confidence (e.g., below threshold) suggested responses since they are less likely to be selected”).

Regarding claim 9, Merry in view of Rodriguez teaches wherein obtaining the plurality of groups of usage timing association records by grouping the usage timing association records comprises ([Merry, Fig 6; 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as “cases”. As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, teaches grouping the usage timing association records of at least two applications): applying a sliding window to the usage timing association records of the at least two applications within the preset time period ([Merry, Figure 6; 0110] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches”, discloses a ‘sliding window’, as the launching status of the applications can switch its position to the next prediction window. The predictor iterates through the cases, which also corresponds to the sliding window. [Merry, 0116] “As may be seen, this example considers the data of four apps (i.e., A, B, C and D) over the course of a desired period of time. It will be appreciated that the period of time may be varied according to the desire of the present system—e.g. a day, a week, etc”, discloses that the data within the preset period (desired period of time) are being considered); and 
determining usage timing association records corresponding to the sliding window at each position as one group of usage timing association records ([Merry, 0110] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches. Each class may have a count of positive cases and a count of total cases. A case is “positive” if App X is switched to within the case”).
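
For illustration only, the grouping recited in claim 9 can be sketched as follows. This is a minimal sketch and not an implementation taken from Merry or from the application: the record layout (a sampling index paired with per-application in-use flags), the function name sliding_window_groups, and the window and step parameters are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (sampling index, per-application in-use flags).
Record = Tuple[int, Tuple[int, ...]]


def sliding_window_groups(
    records: Sequence[Record], window: int, step: int = 1
) -> List[List[Record]]:
    """Return one group of records for each position of the sliding window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    groups = []
    # The last valid start index is len(records) - window.
    for start in range(0, len(records) - window + 1, step):
        groups.append(list(records[start:start + window]))
    return groups


# Five sampled records for two hypothetical applications.
flags = [(1, 0), (0, 1), (1, 0), (0, 0), (0, 1)]
records = list(enumerate(flags))
groups = sliding_window_groups(records, window=3)
```

With five records and a window of three, the window occupies three positions, so three overlapping groups are produced.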

Regarding claim 11, Merry in view of Rodriguez teaches wherein the probability values comprise first probability values each indicating a probability of launching one of the applications ([Merry, 0111] “Once such cases have been classified and counts updated, the present system may determine which classes the current situation matches and assign weights to the classes. These weights could be static or could depend on factors such as the number of total cases in a class and whether the current situation matches the class—e.g., assign weights (such as, 0.4,0.4, and 0.2). Once weights have been assigned, it may be possible to compute a final probability by taking the weighted sum of (Positive Cases)/(Total Cases) for each class”) and a second probability value indicating a probability of launching no application ([Merry, 0102] “This predictor may return a probability of 1.0 for the top 20 most frequently activated apps and 0.0 for all others”, the predictor returns a probability value of 0.0 for less frequently activated applications, which include applications that have never been launched).
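
For illustration only, the weighted sum described in the quoted passage of Merry [0111] can be worked through as below. The weights 0.4, 0.4, and 0.2 come from the quoted passage; the positive and total case counts, and the function name weighted_probability, are hypothetical values and names chosen here, not data from Merry.

```python
from typing import Sequence, Tuple


def weighted_probability(classes: Sequence[Tuple[int, int, float]]) -> float:
    """Weighted sum of (positive cases / total cases) over the matching classes.

    Each entry is (positive_cases, total_cases, weight).
    """
    total = 0.0
    for positive, count, weight in classes:
        if count > 0:  # a class with no cases contributes nothing
            total += weight * positive / count
    return total


# Hypothetical counts for three classes, weighted 0.4, 0.4, and 0.2.
p = weighted_probability([(3, 10, 0.4), (1, 4, 0.4), (0, 5, 0.2)])
# 0.4 * 3/10 + 0.4 * 1/4 + 0.2 * 0/5 = 0.12 + 0.10 + 0.00 = 0.22
```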

Regarding claim 12, Merry teaches a terminal device, comprising: at least one processor; and a computer readable storage, coupled to the at least one processor and storing at least one computer executable instruction thereon which, when executed by the at least one processor, causes the at least one processor to ([Merry, 0034] “Computer systems 102 may further comprise controller 104 which may in turn have one or more processors (e.g., a CPU and/or GPU) and computer memory, as is known in the art. Computer system 102 may further have operating system 106 installed in memory and working to control the lifecycles of various apps that may be activated by users of the computer system”): 
acquire usage status information of applications of a terminal of at least two past time points of a next time point ([Merry, 0132] “1. Given the current app (N) and previous app (N-1), history is examined to determine the frequency of the next app launched given the (N,N-1) pair. The probability expects to see a desired threshold number of cases of an app being launched after the (N,N-1) pair. If the desired threshold or greater of cases are found, then probabilities are weighted. If less than the desired threshold, then the following may occur: [0133] 2. The current app is examined (N) is examined and history is examined to determine if a pattern emerges. Again, the system is looking for the desired threshold number of cases and/or occurrences of the same pattern for N, N+1 are required. If not found, then the following may occur: [0134] 3. The previous app (N-1) is examined in the same way to determine if a pattern emerges for apps started within time T of app N-1. Again, the system tests for the desired threshold number of occurrences”, where two steps before the future action is one step before the current action. Merry teaches acquiring N, the current action, and N-1, the previous action, which is two steps before N+1, the future action);
acquire, from an application predictive model, probability values of launching the applications, by inputting the usage status information into the application predictive model, the application predictive model being obtained based on a plurality of groups of usage timing association records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, [Merry, 0110-0111] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches … These weights could be static or could depend on factors such as the number of total cases in a class and whether the current situation matches the class—e.g., assign weights (such as, 0.4,0.4, and 0.2). Once weights have been assigned, it may be possible to compute a final probability by taking the weighted sum of (Positive Cases)/(Total Cases) for each class”, adaptive predictor makes decision based on prior data, which corresponds to ‘obtaining a model’ process);
determine an application to-be-launched at a next time point according to the probability values and preloading the application to-be-launched ([Merry, 0120] “Once these probabilities have been set, they may be utilized by other modules of the present system--e.g., Pre-launch Policy--as a part of the rules and/or heuristics to determine whether to pre-launch a given app or not”).
Merry does not specifically teach the application predictive model being obtained based on a long short-term memory (LSTM) neural network model.
Rodriguez teaches the application predictive model being obtained based on a long short-term memory (LSTM) neural network model ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”, [Rodriguez, 0287] “... In some implementations, after the high-confidence suggested response is generated and before it is selected by user input, the associated embedded application can be pre-loaded (e.g., downloaded in the background and stored in local storage) to the user device and its code initiated. This allows an instant display of output from the embedded application on the user device after the high-confidence suggested response is selected by user input. Such pre-loading of embedded applications can be omitted for lower-confidence (e.g., below threshold) suggested responses since they are less likely to be selected”, [Rodriguez, 0243] “In some implementations, machine-learning models may be trained using sample data or training data, e.g., commands and messages actually provided by users in response to embedded application and session events and who consent to provide such data for training purposes. Training data is treated before use to remove user identifiers and other user-related information”, which shows that Rodriguez uses user data, such as responses to embedded applications or events, as training data).

Regarding claim 13, Merry teaches wherein the at least one processor is further configured to: obtain the application predictive model according to the plurality of groups of usage timing association records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch” teaches the groups of usage timing association records. [Merry, 0110-0111] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches … These weights could be static or could depend on factors such as the number of total cases in a class and whether the current situation matches the class—e.g., assign weights (such as, 0.4,0.4, and 0.2). Once weights have been assigned, it may be possible to compute a final probability by taking the weighted sum of (Positive Cases)/(Total Cases) for each class”, adaptive predictor makes decision based on prior data, which corresponds to ‘obtaining’ process).
Merry does not specifically teach train the LSTM neural network model according to a usage information.
Rodriguez teaches train the LSTM neural network model according to a usage information ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items. For example, in a training stage the suggestion server (or other component) can be trained using training data (e.g., message training data) of actual or generated messages in a messaging application context, and then at an inference stage can determine suggested items to new messages or other data it receives … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”).

Claim 18 is an apparatus claim having limitations similar to those of method claim 11. Therefore, it is rejected under the same rationale as claim 11 above.

Regarding claim 19, Merry teaches a non-transitory computer readable storage medium storing a computer program which, when executed by a processor ([Merry, 0034] “Computer systems 102 may further comprise controller 104 which may in turn have one or more processors (e.g., a CPU and/or GPU) and computer memory, as is known in the art. Computer system 102 may further have operating system 106 installed in memory and working to control the lifecycles of various apps that may be activated by users of the computer system”) 
acquire a user behavior sample within a preset time period, the user behavior sample comprising usage timing association records of at least two applications ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, the Figure 6 shows a plurality of application usage data composed with a plurality of applications); 
obtain a plurality of groups of usage timing association records by grouping the usage timing association records ([Merry, Fig 6; 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as “cases”. As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, teaches grouping the usage timing association records of at least two applications. [Merry, 0110-0111] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches … These weights could be static or could depend on factors such as the number of total cases in a class and whether the current situation matches the class—e.g., assign weights (such as, 0.4,0.4, and 0.2). Once weights have been assigned, it may be possible to compute a final probability by taking the weighted sum of (Positive Cases)/(Total Cases) for each class”, adaptive predictor makes decision based on prior data, which corresponds to obtaining the model); 
Merry does disclose obtaining an application model according to the usage records, but does not explicitly disclose train a LSTM neural network model according to the usage records to obtain an application predictive model.
Rodriguez teaches train a LSTM neural network model according to the usage records to obtain an application predictive model ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items. For example, in a training stage the suggestion server (or other component) can be trained using training data (e.g., message training data) of actual or generated messages in a messaging application context, and then at an inference stage can determine suggested items to new messages or other data it receives … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”).

Regarding claim 20, Merry in view of Rodriguez teaches wherein the processor is further configured to: acquire usage status information of applications of a terminal of at least two past time points of a next time point ([Merry, 0132] “1. Given the current app (N) and previous app (N-1), history is examined to determine the frequency of the next app launched given the (N,N-1) pair. The probability expects to see a desired threshold number of cases of an app being launched after the (N,N-1) pair. If the desired threshold or greater of cases are found, then probabilities are weighted. If less than the desired threshold, then the following may occur: [0133] 2. The current app is examined (N) is examined and history is examined to determine if a pattern emerges. Again, the system is looking for the desired threshold number of cases and/or occurrences of the same pattern for N, N+1 are required. If not found, then the following may occur: [0134] 3. The previous app (N-1) is examined in the same way to determine if a pattern emerges for apps started within time T of app N-1. Again, the system tests for the desired threshold number of occurrences”, where two steps before the future action is one step before the current action. Merry teaches acquiring N, the current action, and N-1, the previous action, which is two steps before N+1, the future action); 
acquire, from the application predictive model, probability values of launching the applications, by processing the usage status information of the applications with the application predictive model ([Merry, 0107] “(4) Adaptive Predictor [0108] This predictor may identify situations in the past that are similar to the current situation by considering the current foreground app, the last foreground app and how long the current app has been in usage. Once it has identified these situations, the predictor may return the percentage of situations which resulted in the queried event occurring within the prediction window”, [Merry, 0112] “Prediction engine module may receive activity data of a given app's lifecycle (e.g., the number of times an app is activated by a user, the time of day of activation, length of time of activation, and the like). These uses of an app may form a set of "cases" of use of an app. Each case may be assessed a calculated, predicted and/or estimated probability of future and/or potential activation”); and 
determine an application to-be-launched at a next time point according to the probability values and preloading the application to-be-launched ([Merry, 0120] “Once these probabilities have been set, they may be utilized by other modules of the present system--e.g., Pre-launch Policy--as a part of the rules and/or heuristics to determine whether to pre-launch a given app or not”).
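
For illustration only, the first two steps of the fallback described in the quoted passages of Merry [0132]-[0133] can be sketched as below. This is a simplified sketch under stated assumptions, not Merry's implementation: the history is assumed to be a plain list of application identifiers, the names next_app_probabilities and _normalize are hypothetical, and the time-based third step of [0134] is omitted because it requires launch timestamps.

```python
from collections import Counter
from typing import Dict, List


def _normalize(counts: Counter) -> Dict[str, float]:
    """Convert raw follow-up counts into relative frequencies."""
    total = sum(counts.values())
    return {app: n / total for app, n in counts.items()}


def next_app_probabilities(history: List[str], threshold: int) -> Dict[str, float]:
    """Estimate next-app frequencies from the (N-1, N) pair, else from N alone."""
    if len(history) < 2:
        return {}
    prev_app, cur_app = history[-2], history[-1]

    # Step 1: apps that followed the same (previous, current) pair in the past.
    pair_counts = Counter(
        history[i + 2]
        for i in range(len(history) - 2)
        if history[i] == prev_app and history[i + 1] == cur_app
    )
    if sum(pair_counts.values()) >= threshold:
        return _normalize(pair_counts)

    # Step 2: too few cases for the pair, so fall back to the current app only.
    cur_counts = Counter(
        history[i + 1]
        for i in range(len(history) - 1)
        if history[i] == cur_app
    )
    if sum(cur_counts.values()) >= threshold:
        return _normalize(cur_counts)
    return {}


history = ["A", "B", "C", "A", "B", "C", "A", "B"]
probs = next_app_probabilities(history, threshold=2)
```

In this hypothetical history, the pair (A, B) was followed by C on both earlier occasions, so the threshold of two cases is met at the first step.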

Claims 3 and 6 are rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), and further in view of Chan (US 20040268213 A1).

Regarding claim 3, Merry in view of Rodriguez teaches wherein acquiring the usage timing association records of the at least two applications within the preset time period comprises: 
determining the usage timing association records according to usage status information of the at least two applications ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”). 
Merry in view of Rodriguez does not specifically teach sorting applications according to frequencies of use of the applications within the preset time period and determining the at least two applications according to a sorting result.
Chan teaches sorting applications according to frequencies of use of the applications within the preset time period ([Chan, 0088] “At 1608, the applications are sorted based on their classification. In another example, the method runs periodically and automatically on a workstation, or on a network. In one such example, the method measures the frequency of application use over the period and at the end of the period, applications are selected automatically 1602, if they are used for some threshold determination of time. Further, in such an example, applications can be assigned classes 1604, based on the length or frequency of use of the application during the period. In such a case, the workstation or network automatically classifies applications on a periodic basis. Then the classified applications are input into a method of reformulating the resources according to the classifications. This creates an ongoing, dynamic, and specific reformulation of resources”); 
determining the at least two applications according to a sorting result ([Chan, 0088] “At 1608, the applications are sorted based on their classification. In another example, the method runs periodically and automatically on a workstation, or on a network. In one such example, the method measures the frequency of application use over the period and at the end of the period, applications are selected automatically 1602, if they are used for some threshold determination of time. Further, in such an example, applications can be assigned classes 1604, based on the length or frequency of use of the application during the period. In such a case, the workstation or network automatically classifies applications on a periodic basis. Then the classified applications are input into a method of reformulating the resources according to the classifications. This creates an ongoing, dynamic, and specific reformulation of resources”); 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Merry, Rodriguez, and Chan, to use Chan's sorting of applications according to frequency of use, and its determination of applications based on the sorting result, in the machine-learning based application pre-loading method of Merry and Rodriguez. The suggestion and/or motivation for doing so is to improve the prediction accuracy, as usage log data contains application usage data and timestamps.

Regarding claim 6, Merry in view of Rodriguez teaches the method of claim 3, further comprising: determining a frequency of use of the application according to usage records ([Merry, 0132] “1. Given the current app (N) and previous app (N-1), history is examined to determine the frequency of the next app launched given the (N,N-1) pair”).
Merry in view of Rodriguez does not specifically teach prior to sorting the applications according to the frequencies of the use of the applications within the preset time period: for each application, filtering out usage records in which the application is used shorter than a preset period.
Chan teaches sorting the applications according to the frequencies of the use of the applications within the preset time period ([Chan, 0088] “At 1608, the applications are sorted based on their classification. In another example, the method runs periodically and automatically on a workstation, or on a network. In one such example, the method measures the frequency of application use over the period and at the end of the period, applications are selected automatically 1602, if they are used for some threshold determination of time. Further, in such an example, applications can be assigned classes 1604, based on the length or frequency of use of the application during the period. In such a case, the workstation or network automatically classifies applications on a periodic basis. Then the classified applications are input into a method of reformulating the resources according to the classifications. This creates an ongoing, dynamic, and specific reformulation of resources”); 
None of Merry, Rodriguez, or Chan explicitly teaches filtering out usage records prior to sorting the applications. However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to filter out usage records in which the application is used for shorter than a preset period before performing the sorting of Chan. The modification would have been obvious because it is common to filter out unnecessary data prior to processing it. The suggestion and/or motivation to do so is to discard meaningless data because, in general, usage records shorter than the specified preset period are meaningless, and storing and processing meaningless data wastes resources.
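
For illustration only, the order of operations discussed above (filter out short usage records first, then sort the applications by frequency of use) can be sketched as below. This is a minimal sketch, not code from Merry, Rodriguez, or Chan: the record layout (application name, duration in seconds), the function name top_apps, and the sample values are assumptions made here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical usage record: (application name, usage duration in seconds).
UsageRecord = Tuple[str, int]


def top_apps(records: Sequence[UsageRecord], min_duration: int, top_n: int) -> List[str]:
    """Drop records shorter than min_duration, then rank apps by frequency of use."""
    kept = [r for r in records if r[1] >= min_duration]
    counts = Counter(app for app, _ in kept)
    # Sort by descending count; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [app for app, _ in ranked[:top_n]]


records = [("mail", 30), ("maps", 2), ("mail", 45), ("chat", 20), ("maps", 1), ("chat", 3)]
selected = top_apps(records, min_duration=5, top_n=2)
```

In this hypothetical sample, both "maps" records and one "chat" record fall below the five-second threshold, so they do not contribute to the frequency counts used for sorting.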

Claims 4-5 are rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), in view of Chan (US 20040268213 A1), and further in view of Silvestri (US 20160189049 A1).

Regarding claim 4, Merry teaches wherein determining the usage timing association records according to the usage status information of the at least two applications comprises:
sampling a usage record of the at least two applications according to a preset sampling period and determining whether the at least two applications are in use at sampling time points in the preset sampling period ([Merry, 0058] “FIG. 3 is one embodiment of a high level flowchart (300) for the present system. The present system may start at 302. As part of an initialization, the present system may receive (at 304) data and/or metadata regarding applications that may need content from a third party source—so that the present system may identify applications as potential candidates for pre-fetching”, teaches sampling application usage data. [Merry, Fig 6; 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as “cases”. As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, teaches grouping the usage timing association records of at least two applications. To group application usage data, determining the number of records to be in a group is required); and
determining the usage timing association records by associating the usage status information of the at least two applications according to the sampling time points ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, FIG. 6 shows a plurality of groups of application usage data composed with a plurality of applications); 
Merry does not specifically teach sampling a usage log of the at least two applications according to a preset sampling period and determining whether the at least two applications are in use at sampling time points in the preset sampling period, and training the LSTM neural network model according to the usage status information of the at least two applications at the sampling time points in the plurality of groups of usage timing association records.
Rodriguez teaches training the LSTM neural network model according to the usage information ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”, [Rodriguez, 0287] “... In some implementations, after the high-confidence suggested response is generated and before it is selected by user input, the associated embedded application can be pre-loaded (e.g., downloaded in the background and stored in local storage) to the user device and its code initiated. This allows an instant display of output from the embedded application on the user device after the high-confidence suggested response is selected by user input. Such pre-loading of embedded applications can be omitted for lower-confidence (e.g., below threshold) suggested responses since they are less likely to be selected”).
Merry in view of Rodriguez and further in view of Chan does not specifically teach sampling a usage log, and training the neural network model according to the usage status information of the at least two applications at the sampling time points in the plurality of groups of usage timing association records.
Silvestri teaches sampling a usage log of the at least two applications according to a preset sampling period and determining whether the at least two applications are in use at sampling time points in the preset sampling period ([Silvestri, 0034] “In accordance with various embodiments, the server(s) 102 may have access to one or more user logs 118 (e.g., user databases) into which user information is retained for each of a plurality of users … [0036] For instance, the user's search request may contain any number of parameters, such as user or browser identity and the search terms, which may be retained in the query logs. Additional information related to the search, such as a timestamp, may also be retained in the query logs along with the search request parameters”, [Silvestri, 0063, lines 2-6] “More particularly, a distributed representation associated with an application action may indicate a sequential pattern of actions occurring prior to the action within a particular time period and/or after the action within a particular time period”, shows that the reference handles application usage information, [Silvestri, Figure 7, 702] “Ascertain whether a threshold amount of contextual information pertaining to usage of at least a portion of a plurality of applications installed on a mobile device…” also discloses the application usage information);
training the neural network model according to the usage status information of the at least two applications at the sampling time points in the plurality of timing records ([Silvestri, 0091] “Constructing the Bayesian Network. The first phase is to learn the structure of the Bayesian network. The procedure of constructing the PTAN Bayesian network from the training data may be performed as follows: 1. Based on the training data including a plurality of distributed representations (e.g., from a plurality of users), conditional mutual information between attributes may be computed for each one of a plurality of applications. The function of the conditional mutual information may be defined as follows: …”, [Silvestri, 0063, lines 2-6] “More particularly, a distributed representation associated with an application action may indicate a sequential pattern of actions occurring prior to the action within a particular time period and/or after the action within a particular time period”, shows that the distributed representation corresponds to the sampled application actions performed within a particular time period, [Silvestri, 0072, lines 10-17] “the pattern may also indicate an amount of time that has lapsed between any of the actions occurring prior to the action, an amount of time that has lapsed between any of the actions occurring after the action, and/or an amount of time that has lapsed between any of the actions occurring both prior to the action and after the action with respect to one another”, discloses the timing records).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Merry, Rodriguez, Chan, and Silvestri, to use the neural network training and usage log sampling of Silvestri to implement the machine-learning-based application pre-loading method of Merry, Rodriguez, and Chan. The suggestion and/or motivation for doing so is to improve prediction accuracy, because usage log data contains both application usage data and timestamps.
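
For illustration only, the sampling limitation discussed above may be sketched as follows. The sketch assumes a usage log made of (application, start, end) intervals and a fixed sampling period. The function name `sample_usage`, the log layout, and the 0/1 status encoding are hypothetical and are not taken from Merry, Rodriguez, Chan, Silvestri, or the application under examination.

```python
from typing import Dict, List, Tuple

# Hypothetical usage log entry: (application name, start time, end time), in seconds.
UsageLog = List[Tuple[str, int, int]]


def sample_usage(log: UsageLog, apps: List[str], start: int, end: int,
                 period: int) -> List[Dict[str, int]]:
    """Sample the log every `period` seconds in [start, end).

    Returns one record per sampling time point, mapping each application
    to 1 if it was in use at that time point and 0 otherwise.
    """
    records = []
    for t in range(start, end, period):
        status = {app: 0 for app in apps}
        for app, begin, finish in log:
            # An application counts as "in use" at t if t lies inside one of its intervals.
            if app in status and begin <= t < finish:
                status[app] = 1
        records.append(status)
    return records


log = [("A", 0, 50), ("B", 50, 120), ("A", 120, 180)]
samples = sample_usage(log, ["A", "B"], start=0, end=180, period=60)
```

Each element of `samples` associates the usage status of every application with one sampling time point, which is one possible reading of a usage timing association record.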

Regarding claim 5, Merry in view of Rodriguez, in view of Chan, and further in view of Silvestri teaches the plurality of groups of application usage timing association records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”). 
None of Merry, Rodriguez, Chan, and Silvestri explicitly teaches the limitation ‘wherein the plurality of groups of records are (m-n+1) groups of records, n indicates a number of sampling time points associated with each of the plurality of groups of records and is an integer greater than or equal to 2, and m indicates a total number of sampling time points in the preset sampling period and is an integer greater than or equal to 3, wherein the i-th group of records comprises records of the at least two applications at the i-th to the (i+n-1)-th sampling time point, and i is an integer and ranges from 1 to (m-n+1)’.
However, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to do so. The modification would have been obvious because the limitation merely details the relationship between the numbers of sampling time points used by a sliding window that iterates through the groups of records, as shown in Fig. 2 of the instant application. Merry teaches a sliding window going through a plurality of groups of records ([Merry, Figure 6] shows a diagram of the prediction window, [Merry, 0110] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches”, discloses a ‘sliding window’, as the launch status of the applications can shift to the next prediction window. The predictor iterates through the cases, which also corresponds to the sliding window. [Merry, 0116] “As may be seen, this example considers the data of four apps (i.e., A, B, C and D) over the course of a desired period of time. It will be appreciated that the period of time may be varied according to the desire of the present system, e.g. a day, a week, etc”, discloses that the data within a preset period (the desired period of time) are considered). The motivation/suggestion to do so is to analyze, along a timeline, the usage data of an application together with that of applications used in a similar time period.
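
The (m-n+1) grouping discussed above may be sketched, for illustration only, as a sliding window over the sampled records. The helper name `sliding_groups` and the sample data are hypothetical; the sketch only shows that m sampling time points and a window of n adjacent time points yield (m-n+1) groups, the i-th group covering the i-th to the (i+n-1)-th time point.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_groups(samples: Sequence[T], n: int) -> List[List[T]]:
    """Split m samples into (m - n + 1) overlapping groups of n adjacent samples.

    Group i (1-based) holds the i-th through the (i + n - 1)-th sample.
    """
    m = len(samples)
    if n < 2 or m < 3 or n > m:
        raise ValueError("expected n >= 2, m >= 3 and n <= m")
    return [list(samples[i:i + n]) for i in range(m - n + 1)]


# m = 5 sampling time points and n = 3 give 5 - 3 + 1 = 3 groups.
groups = sliding_groups(["A", "B", "A", "C", "B"], n=3)
```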

Claim 7 is rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), and further in view of Rivoir (US 6105087 A).

Regarding claim 7, Merry in view of Rodriguez teaches the application predictive model using the plurality of groups of usage timing association records of the at least two applications ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”). 
Merry in view of Rodriguez does not specifically teach determining a number of (memory) cells of an input layer of the predictive model according to vector dimensions, and determining a number of (memory) cells of an output layer of the predictive model according to vector dimensions.
Rivoir teaches determining a number of (memory) cells of an input layer of the predictive model according to vector dimensions ([Rivoir, column 6, lines 3-14] “(17) Sequencer state machine 140 preferably performs the specific tasks by means of a memory-based look-up table (matrix or array). The size of the memory (matrix) is then determined by the number of inputs i (i.e. input lines 130a . . . 130z from the comparators 110a . . . 110z, possible feed-back signals 220i, and state bus 180) and the number of outputs o (i.e., busses 160a . . . 160z and possible state outputs 200a . . . 200z for state bus 180). The matrix would be a 2^i × o array, or in other words, a storage element with 2^i × o entries, whereby i inputs are fed to a memory's address bus and o outputs are connected to a memory's data bus”); and
determining a number of (memory) cells of an output layer of the predictive model according to vector size ([Rivoir, column 6, lines 3-14] “(17) Sequencer state machine 140 preferably performs the specific tasks by means of a memory-based look-up table (matrix or array). The size of the memory (matrix) is then determined by the number of inputs i (i.e. input lines 130a . . . 130z from the comparators 110a . . . 110z, possible feed-back signals 220i, and state bus 180) and the number of outputs o (i.e., busses 160a . . . 160z and possible state outputs 200a . . . 200z for state bus 180). The matrix would be a 2^i × o array, or in other words, a storage element with 2^i × o entries, whereby i inputs are fed to a memory's address bus and o outputs are connected to a memory's data bus”).
None of Merry, Rodriguez, and Rivoir explicitly teaches determining a number of memory cells according to the number of applications. However, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use Rivoir's method of determining the number of memory cells according to vector size to determine the number of memory cells according to the number of applications. The modification would have been obvious because the number of nodes should equal the number of dimensions or features of the input vector, which is the number of applications. The suggestion and/or motivation to do so is to allocate memory cells efficiently, since allocating memory cells that will not be used wastes computational resources.
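
The sizing relationships relied on above may be sketched, for illustration only, as follows. The first function follows the 2^i × o look-up table size stated in the quoted Rivoir passage. The second function is a hypothetical sizing rule that assumes a one-hot input vector with one dimension per application and one output cell per application; neither function name appears in the cited references or in the application.

```python
from typing import Tuple


def lookup_table_entries(num_inputs: int, num_outputs: int) -> int:
    """Number of entries of a memory-based look-up table with i inputs and o outputs: 2^i x o."""
    return (2 ** num_inputs) * num_outputs


def layer_sizes(num_apps: int) -> Tuple[int, int]:
    """Hypothetical rule: size the input and output layers from the number of applications."""
    input_cells = num_apps   # assumed: one cell per dimension of a one-hot application vector
    output_cells = num_apps  # assumed: one cell (score) per candidate application
    return input_cells, output_cells
```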

Claim 8 is rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), in view of Rivoir (US 6105087 A), and further in view of DiPietro (DiPietro, 2016, “A Friendly Introduction to Cross-Entropy Loss”).

Regarding claim 8, Merry in view of Rodriguez and further in view of Rivoir teaches the application predictive model using usage status information of each application. 
Merry in view of Rodriguez and further in view of Rivoir does not specifically teach wherein the application predictive model adopts an error function, which is a cross entropy loss function expressed as: $J = \sum_{k=1}^{C} y_k \log(\hat{y}_k)$, wherein $y_k$ indicates an actual value of information, $\hat{y}_k$ indicates a predicted value of the information, C = M+1, M indicates the number of the at least two applications, and J indicates a cross entropy of the classification model.
DiPietro teaches wherein the application predictive model adopts an error function, which is a cross entropy loss function expressed as: $J = \sum_{k=1}^{C} y_k \log(\hat{y}_k)$, wherein $y_k$ indicates an actual value of information, $\hat{y}_k$ indicates a predicted value of the information, C = M+1, M indicates the number of the at least two applications, and J indicates a cross entropy of the classification model ([DiPietro, page 4, Cross Entropy] “In contrast, cross entropy is the number of bits we’ll need if we encode symbols from y using the wrong tool y’. This consists of encoding the i-th symbol using log 1/y’i bits instead of log 1/yi bits. We of course still take the expected value to the true distribution y, since it’s the distribution that truly generates the symbols: $H(y, y') = \sum_{i} y_i \log \frac{1}{y'_i} = -\sum_{i} y_i \log(y'_i)$”, in this case, $y'_i$ corresponds to the predicted value of the information. M merely limits the number of terms in the summation. $H(y, y')$ corresponds to the cross entropy of the model).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of DiPietro, Merry, Rodriguez, and Rivoir, to use the cross-entropy loss calculation of DiPietro to implement the LSTM-based application pre-loading method of Merry, Rodriguez, and Rivoir. The suggestion and/or motivation for doing so is to improve prediction accuracy, as using cross entropy loss to evaluate the machine learning process is common practice in the field of machine learning and classification ([DiPietro, page 2, Introduction, first paragraph] “When we develop a model for probabilistic classification, we aim to map the model's inputs to probabilistic predictions, and we often train our model by incrementally adjusting the model's parameters so that our predictions get closer and closer to ground-truth probabilities”).
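
For illustration only, the cross entropy relied on from DiPietro, H(y, y') = -Σ y_i log(y'_i), may be computed as sketched below. The function name `cross_entropy`, the use of the natural logarithm, and the example values (M = 3 applications, hence C = M+1 = 4 classes) are assumptions made for the sketch and are not taken from the cited references.

```python
import math
from typing import Sequence


def cross_entropy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return H(y, y') = -sum_i y_i * log(y'_i), using the natural logarithm."""
    if len(y_true) != len(y_pred):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        if y > 0.0:  # terms with y_i = 0 contribute nothing to the sum
            total -= y * math.log(p)
    return total


# Hypothetical example: M = 3 applications, so C = M + 1 = 4 classes.
y_true = [0.0, 1.0, 0.0, 0.0]         # one-hot ground truth
y_pred = [0.1, 0.7, 0.1, 0.1]         # predicted probabilities
loss = cross_entropy(y_true, y_pred)  # equals -log(0.7) for this one-hot target
```

With a one-hot target the sum reduces to the negative log of the probability assigned to the true class, so the loss decreases as that probability approaches 1.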

Claims 10 and 17 are rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), and further in view of Lan (US 20170344829 A1).

Regarding claim 10, Merry in view of Rodriguez teaches the method of claim 1.
Merry in view of Rodriguez does not specifically teach the application predictive model comprises an input gate … the tanh function being expressed as $f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.
Lan teaches the application predictive model comprises an input gate $i_t$, a forget gate $f_t$, an output gate $o_t$, a candidate memory cell $\tilde{c}_t$, a final memory cell $c_t$, and an output status cell $h_t$, wherein
$i_t = \sigma(W_i x_t + U_i h_{t-1})$
$f_t = \sigma(W_f x_t + U_f h_{t-1})$
$o_t = \sigma(W_o x_t + U_o h_{t-1})$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1})$
$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t$
$h_t = o_t \otimes \tanh(c_t)$
$x_t$ indicating an application used at time point t in the usage timing association records; $W_*$ and $U_*$ indicating network parameters learned, and $* \in \{i, f, o, c\}$; $i_t$ indicating an input gate at time point t, $f_t$ indicating a forget gate at time point t, and $o_t$ indicating an output gate at time point t; $c_t$ indicating a final memory cell at time point t, $c_{t-1}$ indicating a final memory cell at time point t-1, and $\tilde{c}_t$ indicating a candidate memory cell at time point t; $h_t$ indicating an output status cell at time point t, and $h_{t-1}$ indicating an output status cell at time point t-1; $\sigma$ indicating a Sigmoid function; $\otimes$ indicating element-wise product of vectors; the tanh function being expressed as
                f
                
                    
                        x
                    
                
                =
                
                    
                        tanh
                    
                    ⁡
                    
                        
                            
                                x
                            
                        
                    
                
                =
                
                    
                        
                            
                                e
                            
                            
                                x
                            
                        
                        -
                        
                            
                                e
                            
                            
                                -
                                x
                            
                        
                    
                    
                        
                            
                                e
                            
                            
                                x
                            
                        
                        +
                        
                            
                                e
                            
                            
                                -
                                x
                            
                        
                         
                    
                
            
([Lan, 0039-0040] “Depending on whether an additional input is $h_{t-1}$ or $h_{t-1}$, the LSTM neuron 500 may be regarded as a forward neuron or a backward neuron … (Equation 2, 3, 4, 5, 6) … where $i_t$, $f_t$, $c_t$, $o_t$ represents a matrix with elements respectively corresponding to the output responses of the input gates 510, forget gate 540, memory cells 530, and output gates 520 in the LSTM neurons, respectively … it will be noted that the recursive computations in the LSTM neurons may be represented in other forms”, [Lan, 0072-0073; Equation 12] “In some implementations, the processing of the internal units of the LSTM neurons in the horizontal direction along the time domain may not be masked so that information across various time slots in all the units may not be erased … (Equation 12) … During the training process, the output responses of the input gates, forget gates, memory cells, output gates, and final outputs of the LSTM neurons are not masked”; tanh is a commonly used mathematical notation, and therefore it is obvious that the tanh function is $f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Lan, Silvestri, and Rodriguez, to use the equations governing the LSTM operation of Lan to implement the LSTM-based application pre-loading method of Rodriguez and Silvestri. The suggestion and/or motivation for doing so is to improve the prediction accuracy, as using LSTM-based machine learning to predict sequential data such as application usage is common practice.
Claim 17 is an apparatus claim having limitations similar to those of method claim 10. Therefore, it is rejected with the same rationale.
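
For illustration only, and not as part of the grounds of rejection, one time step of an LSTM using the symbols defined above can be sketched as follows. The sketch assumes the commonly used form of the LSTM recurrence with bias terms omitted; the exact equations of the claim and of Lan are not reproduced here, and the names `lstm_step`, `matvec`, `W`, and `U` are hypothetical.

```python
import math

GATES = ("i", "f", "o", "c")  # the index set written as * in {i, f, o, c}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Closed form recited in the limitation: (e^x - e^-x) / (e^x + e^-x).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def matvec(m, v):
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM time step (assumed standard form, no bias terms).

    W[g] and U[g] are the learned weight matrices for each g in GATES.
    Returns (h_t, c_t).
    """
    pre = {
        g: [a + b for a, b in zip(matvec(W[g], x_t), matvec(U[g], h_prev))]
        for g in GATES
    }
    i_t = [sigmoid(z) for z in pre["i"]]      # input gate
    f_t = [sigmoid(z) for z in pre["f"]]      # forget gate
    o_t = [sigmoid(z) for z in pre["o"]]      # output gate
    c_tilde = [tanh(z) for z in pre["c"]]     # candidate memory cell
    # Final memory cell: element-wise f_t * c_{t-1} + i_t * c_tilde.
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    # Output status cell: element-wise o_t * tanh(c_t).
    h_t = [o * tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate memory cell to 0, so the final memory cell is half of the previous memory cell; this gives a quick sanity check of the element-wise products.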

Claims 14-16 are rejected under 35 U.S.C. 103 over Merry (US 20140373032 A1) in view of Rodriguez (US 20180367484 A1), and further in view of Silvestri (US 20160189049 A1).

Regarding claim 14, Merry teaches wherein the at least one processor configured to use the plurality of groups of usage timing association records to obtain the application predictive model is configured to ([Merry, 0034] “Computer systems 102 may further comprise controller 104 which may in turn have one or more processors (e.g., a CPU and/or GPU) and computer memory, as is known in the art. Computer system 102 may further have operating system 106 installed in memory and working to control the lifecycles of various apps that may be activated by users of the computer system”, which teaches using a processor to implement the method, [Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”, which teaches the usage timing association records):
obtaining the plurality of groups of usage timing association records by grouping the usage timing association records ([Merry, 0109] “In reference to FIG. 6, the adaptive predictor may operate on groupings of application usage periods, referred to as "cases". As shown in FIG. 6., one manner of creating cases may be affected by taking groups of (e.g., 3 or any desired number) adjacent application usage periods. It will be appreciated that it is also possible to create cases using other groupings such as current app switch, previous app switch, and any period that falls within the prediction window after the app switch”);
Merry does not specifically teach training the LSTM neural network model according to the usage information, or acquiring usage timing association records of at least two applications within a preset time period by sampling a usage log of the at least two applications according to a preset sampling period and associating usage status information of the at least two applications according to sampling time points.
Rodriguez teaches training the LSTM neural network model according to the usage information ([Rodriguez, 0055] “In some implementations, the suggestion server 156 and/or other components of the environment 100 can use machine learning, e.g., use a machine learning model that utilizes machine learning to determine suggested items. For example, in a training stage the suggestion server (or other component) can be trained using training data (e.g., message training data) of actual or generated messages in a messaging application context, and then at an inference stage can determine suggested items to new messages or other data it receives … In some implementations, the suggestion server 156 (or other component) can use one or more of a deep learning model, a logistic regression model, a Long Short Term Memory (LSTM) network, supervised or unsupervised model, etc. Some implementations can also detect image features in images or videos and determine suggested items (e.g., message responses) based on the image features”, [Rodriguez, 0287] “... In some implementations, after the high-confidence suggested response is generated and before it is selected by user input, the associated embedded application can be pre-loaded (e.g., downloaded in the background and stored in local storage) to the user device and its code initiated. This allows an instant display of output from the embedded application on the user device after the high-confidence suggested response is selected by user input. Such pre-loading of embedded applications can be omitted for lower-confidence (e.g., below threshold) suggested responses since they are less likely to be selected”, [Rodriguez, 0243] “In some implementations, machine-learning models may be trained using sample data or training data, e.g., commands and messages actually provided by users in response to embedded application and session events and who consent to provide such data for training purposes. Training data is treated before use to remove user identifiers and other user-related information”, which shows that Rodriguez uses usage information, such as responses to embedded applications or events, as training data).
Merry in view of Rodriguez does not specifically teach acquiring usage timing association records of at least two applications within a preset time period by sampling a usage log of the at least two applications according to a preset sampling period and associating usage status information of the at least two applications according to sampling time points.
Silvestri teaches acquiring usage timing records of at least two applications within a preset time period by sampling a usage log of the at least two applications according to a preset sampling period and associating usage status information of the at least two applications according to sampling time points ([Silvestri, 0091] “Constructing the Bayesian Network. The first phase is to learn the structure of the Bayesian network. The procedure of constructing the PTAN Bayesian network from the training data may be performed as follows: 1. Based on the training data including a plurality of distributed representations (e.g., from a plurality of users), conditional mutual information between attributes may be computed for each one of a plurality of applications. The function of the conditional mutual information may be defined as follows: …”, [Silvestri, 0063, line 2-6] “More particularly, a distributed representation associated with an application action may indicate a sequential pattern of actions occurring prior to the action within a particular time period and/or after the action within a particular time period”, which shows that the distributed representation corresponds to the sampled application actions performed within a particular time period, [Silvestri, 0072, line 10-17] “the pattern may also indicate an amount of time that has lapsed between any of the actions occurring prior to the action, an amount of time that has lapsed between any of the actions occurring after the action, and/or an amount of time that has lapsed between any of the actions occurring both prior to the action and after the action with respect to one another”, which discloses the timing records, [Silvestri, 0034] “In accordance with various embodiments, the server(s) 102 may have access to one or more user logs 118 (e.g., user databases) into which user information is retained for each of a plurality of users … [0036] For instance, the user's search request may contain any number of parameters, such as user or browser identity and the search terms, which may be retained in the query logs. Additional information related to the search, such as a timestamp, may also be retained in the query logs along with the search request parameters”, which teaches sampling user logs, [Silvestri, Figure 7, 702] “Ascertain whether a threshold amount of contextual information pertaining to usage of at least a portion of a plurality of applications installed on a mobile device…”, which also discloses the application usage information).
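
For illustration only, and not as part of the grounds of rejection, the limitation of sampling a usage log at a preset sampling period and associating the usage status of each application with the sampling time points can be sketched as follows. The log format (time-sorted foreground events), the 0/1 status encoding, and the name `sample_usage_log` are assumptions made for this sketch and are not taken from the claims, Merry, Rodriguez, or Silvestri.

```python
from bisect import bisect_right

def sample_usage_log(log, apps, start, end, period):
    """Sample a usage log every `period` time units over [start, end).

    log:  list of (timestamp, app) events sorted by timestamp; each event
          is assumed to mean that `app` came to the foreground at that time.
    apps: the applications whose usage status is recorded.
    Returns a list of (sample_time, status) records, where status holds a
    1 for the application in use at the sampling time point and 0 otherwise.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    times = [t for t, _ in log]
    records = []
    t = start
    while t < end:
        # Index of the most recent event at or before the sampling point.
        idx = bisect_right(times, t) - 1
        active = log[idx][1] if idx >= 0 else None
        records.append((t, tuple(1 if a == active else 0 for a in apps)))
        t += period
    return records
```

For example, a log in which application A is opened at time 0 and application B at time 70, sampled every 60 time units, yields records in which A is in use at the sampling points 0 and 60 and B is in use at the sampling point 120.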

Claim 15 is an apparatus claim having limitations similar to those of method claim 5. Therefore, it is rejected with the same rationale as claim 5 above.

Regarding claim 16, Merry in view of Rodriguez, and further in view of Silvestri teaches the terminal device of claim 14, wherein the at least one processor configured to obtain the plurality of groups of usage timing association records by grouping the usage timing association records is configured to: move forward a sliding window over the usage timing association records of the at least two applications within the preset time period ([Merry, Figure 6; 0110] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches”, which discloses the ‘sliding window’, as the launching status of the applications can shift to the next prediction window. The predictor iterates through the cases, which also corresponds to the sliding window. [Merry, 0116] “As may be seen, this example considers the data of four apps (i.e., A, B, C and D) over the course of a desired period of time. It will be appreciated that the period of time may be varied according to the desire of the present system—e.g. a day, a week, etc”, which discloses that the data within the preset period (the desired period of time) are considered); and
determine usage timing association records corresponding to the sliding window at each position as one group of usage timing association records ([Merry, 0110] “To determine the probability of “App X” being switched to in the next prediction window, the predictor may iterate over all of the cases and classify each of them based on their properties. Once the case is classified, the present system may adjust the counts for each class the case matches”, which discloses the ‘sliding window’, as the launching status of the applications can shift to the next prediction window. The predictor iterates through the cases, which also corresponds to the sliding window).
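
For illustration only, and not as part of the grounds of rejection, the claimed grouping, in which a sliding window is moved forward over the usage timing association records and the records under the window at each position form one group, can be sketched as follows. The name `sliding_window_groups` and the `window` and `step` parameters are hypothetical; a window of 3 mirrors the "groups of (e.g., 3 or any desired number) adjacent application usage periods" quoted from Merry, without asserting that Merry implements it this way.

```python
def sliding_window_groups(records, window, step=1):
    """Group records by moving a fixed-size window forward by `step`.

    Each window position yields one group of `window` consecutive records.
    Returns an empty list when there are fewer records than `window`.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        records[i:i + window]
        for i in range(0, len(records) - window + 1, step)
    ]
```

Five records with a window of three and a step of one produce three overlapping groups, one per window position.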


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Regarding application usage prediction and application pre-loading:
Leroux et al., 2013, “Mobile application usage prediction through context-based learning”
US-20190005024-A1
US-20170316324-A1
US-20170098159-A1
US-6330702-B1
US-9929926-B1
Any inquiry concerning this communication or earlier communications from the examiner
should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can
normally be reached on 7:30 AM - 5:30 PM. If attempts to reach the examiner by telephone are
unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on (571)270-3169. The fax
phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application
Information Retrieval (PAIR) system. Status information for published applications may be obtained
from either Private PAIR or Public PAIR. Status information for unpublished applications is available
through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free).

/JUN KWON/
Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127