DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A Request for Continued Examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. The applicant's submission, the Amendment filed on 26 April 2022, has been entered.

Status of the Claims
The currently pending claims in the present application are claims 1-20 of the Amendment.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 8, 11-14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pat. App. Pub. No. 2017/0278115 A1 to Sakaki et al. (“Sakaki”), in view of U.S. Pat. App. Pub. No. 2016/0162779 A1 to Marcus et al. (“Marcus”).1
Regarding independent claim 1, Sakaki teaches the following limitations:
“A system for predicting the probability that an entity will purchase a product within a future time period.” Sakaki teaches, in para. [0003], “a purchasing behavior analysis apparatus including: an acquiring unit that acquires posting information about a specific product from posting information posted to a social networking service” involving “a purchase likelihood probability calculating unit that calculates a value indicating probability that the user will be predicted to purchase the product mentioned in the posting information in the future.”
“A purchase probability predictor comprising one or more computing devices.” Sakaki teaches, in para. [0003], “a purchase likelihood probability calculating unit that calculates a value indicating probability that the user will be predicted to purchase the product mentioned in the posting information in the future.” Sakaki teaches, in para. [0026], “The server apparatus 10 is, for example, a purchasing behavior analysis apparatus.” Sakaki teaches, in para. [0028], “FIG. 2 illustrates the hardware configuration of the server apparatus 10 that functions as the purchasing behavior analysis apparatus in the purchasing behavior analysis system.” Sakaki teaches, in para. [0029], “As illustrated in FIG. 2, the server apparatus 10 includes a CPU 11.” The purchase likelihood probability calculating unit in Sakaki reads on the claimed “purchase probability predictor.” Componentry of the server in Sakaki reads on the claimed “computing devices.”
“A purchase probability prediction computer program having a plurality of sub-programs executable by said computing device or devices, wherein the sub-programs configure said computing device or devices to, receive input data in the form of entries, each entry comprising, an entity identifier that identifies an entity that is a potential purchaser of a product, a product identifier that identifies a product that the entity associated with the entity identifier might purchase based on an interest event that is indicative of the product being relevant to the entity, a time period identifier that specifies a past time period measured backward from a prescribed date of interest to an interest event date corresponding to the date the interest event associated with the entry occurred, and an intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier.” Sakaki teaches, in para. [0030], “The CPU 11 performs a predetermined process based on a control program stored in the memory 12 or the storage device 13 to control the operation of the server apparatus 10.” Sakaki teaches, in para. [0031], “FIG. 3 is a block diagram illustrating the functional structure of the server apparatus 10 implemented by the execution of the control program.” Sakaki teaches, in para. [0032], “As illustrated in FIG. 3, the server apparatus 10 according to this exemplary embodiment includes an SNS posting information acquiring unit 31, an SNS posting information storing unit 32, a text information extraction unit 33, a distributed representation conversion unit 34, an artificial neural network 35, an interest presence or absence probability calculating unit 36, a purchase desire probability calculating unit 37.” Sakaki teaches, in para. 
[0033], “The SNS posting information acquiring unit 31 acquires posting information related to a specific product that is requested to be researched from the information posted to SNS. For example, first, the SNS posting information acquiring unit 31 extracts posts about a specific product from the information posted to SNS and specifies the user who mentions the specific product from the posts. Then, the SNS posting information acquiring unit 31 acquires several to several thousands of posts before and after the post about the specific product in terms of time from a series of information posted by the user until now.” Sakaki teaches, in para. [0039], “The interest presence or absence determination layer 41 is a determination layer that receives the distributed representation converted by the distributed representation conversion unit 34 and determines whether the user is interested in the product mentioned in the posting information.” Sakaki teaches, in para. [0043], “The interest presence or absence probability calculating unit 36 calculates a value indicating the probability (degree) that the user will be interested in the product mentioned in the posting information based on the output value from the interest presence or absence determination layer 41.” The control program in Sakaki teaches the claimed “purchase probability prediction computer program.” Aspects of the control program and associated CPU involved in operation of the units in Sakaki read on the claimed “plurality of sub-programs executable by said computing device or devices.” The acquiring of posting information in Sakaki reads on the claimed “receive input data in the form of entries.” The acquiring of posting information related to specific products and the specifying of users mentioning the specific products in Sakaki read on the claimed “each entry comprising, an entity identifier that identifies an entity that is a potential purchaser of a product, a product identifier that identifies a 
product that the entity associated with the entity identifier might purchase based on an interest event that is indicative of the product being relevant to the entity.” The acquiring of a specified number of posts, indicative of time from posts to the present time, in Sakaki, reads on the claimed “a time period identifier that specifies a past time period measured backward from a prescribed date of interest to an interest event date corresponding to the date the interest event associated with the entry occurred.” The values indicating the probability (degree) that the user will be interested in the product mentioned in the posting information of Sakaki read on the claimed “an intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier.”
“Employ a supervised machine learning technique to create a separate initial prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product.” Sakaki teaches, in para. [0039], “The interest presence or absence determination layer 41 is a determination layer that receives the distributed representation converted by the distributed representation conversion unit 34 and determines whether the user is interested in the product mentioned in the posting information. The purchase desire determination layer 42 is a determination layer that receives an output value from the interest presence or absence determination layer 41 and determines whether the user wants the product. The purchase likelihood determination layer 43 is a determination layer that receives an output value from the purchase desire determination layer 42 and determines whether the user is predicted to purchase the product in the future.” Sakaki teaches, in para. [0041], “Each of the interest presence or absence determination layer 41, the purchase desire determination layer 42, and the purchase likelihood determination layer 43 performs so-called supervised learning.” Sakaki teaches, in para. 
[0045], “The purchase likelihood probability calculating unit 38 calculates a value indicating the probability (degree) that the user will be predicted to purchase the product mentioned in the posting information in the future based on the output value from the purchase likelihood determination layer 43.” Use of supervised learning in the purchase-likelihood determination layer in Sakaki reads on the claimed “employ a supervised machine learning technique to create a separate initial prediction model.” The purchase-likelihood determination layer being based on product data and user data in Sakaki reads on the claimed “for each product of interest in the input data.” Operation of the purchase likelihood probability calculating unit to calculate probability values in Sakaki reads on the claimed “estimates the probability that an entity in the input data will purchase the product.”
“Establish a list of entities, the products they are predicted to purchase and the probability of the purchases.” Sakaki teaches, in para. [0068], “For example, in purchasing behavior phase determination example 3 illustrated in FIG. 9, as the interest presence or absence probability, output values indicating that the probability of “being interested in the product” is 0.8 and the probability of “not being interested in the product” is 0.2 are obtained. As the purchase desire probability, output values indicating that the probability of “wanting the product” is 0.9 and the probability of “not wanting the product” is 0.1 are obtained. As the purchase likelihood probability, output values indicating that the probability of “purchasing the product” is 0.8 and the probability of “not purchasing the product” is 0.2 are obtained.” Sakaki teaches, in para. [0069], “Therefore, the purchasing behavior phase determining unit 39 determines that the purchasing behavior phase of the user is the purchase prediction phase from these output values.” The generating of lists for users, products, and probabilities (output values), like the list in FIG. 9 of Sakaki, reads on the claimed “establish a list of entities, the products they are predicted to purchase and the probability of the purchases.”
Marcus teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by Sakaki:
“Generate a matrix from a portion of the input data entries, said matrix generation comprising assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” Marcus teaches, in para. [0023], “A single historical data element 12 may be sent along with a single historical value of the response characteristic to be parsed and processed by a data extraction engine 14. Data extraction engine 14 extracts from single historical data element 12, a plurality of q key-value pairs 16 represented by (K.sub.i,V.sub.i) where i=0, 1, . . . q.” Marcus teaches, in para. [0024], “The extracted key-value pairs (K.sub.i,V.sub.i) may subsequently be projected onto n-axes in an n-dimensional space where each of the n feature values is represented by one of the n-axes, where n is an integer.” Marcus teaches, in para. [0032], “a temporal distance factor may be defined with reference to data created in the past with reference to another time, e.g., today.” Marcus teaches, in para. 
[0034], “the n-dimensional space may be represented by a data structure or matrix with n-columns representing the n-axes of n-data features.” Establishing the n-dimensional space matrix based on historical data elements and historical values in Marcus reads on the claimed “generate a matrix from a portion of the input data entries.” The establishing of key-value pairs and projecting them onto n-axes in n-dimensional space matrix in Marcus reads on the claimed “assigning” “pair associated with each interest event to a different location in the matrix.” The applying of the temporal distance factors in Marcus reads on the claimed “along with a time identifier indicative of how far back in time the interest event associated with each” “pair occurred from the prescribed date of interest.” As explained above, Sakaki already teaches elements (see para. [0033]) reading on the claimed “entity identifier and product identifier pair” and the claimed “entity-product identifier pair,” wherein those elements are analogous to the key-value pairs of Marcus. Additionally or alternatively, Sakaki teaches, in para. [0033], “the SNS posting information acquiring unit 31 acquires several to several thousands of posts before and after the post about the specific product in terms of time from a series of information posted by the user until now,” wherein the time data of Sakaki reads on the claimed “time identifier.”
The claimed “supervised machine learning technique” is one “using the matrix as input.” As explained above, Sakaki already teaches elements (see para. [0041]) that read on the claimed “supervised machine learning technique.” Similarly, Marcus teaches, in para. [0079], “The types of model generators (e.g., training engine 26 in FIG. 1) used in model 30 are given in the summary below: A. Support Vector Machines (SVM) (e.g., supervised learning, autonomous prediction).” As shown in FIG. 1 of Marcus, the model (see second row down from the top) is downstream from the matrices (see first row at the top), and as such, uses the matrices as inputs.
“Validate each initial prediction model by comparing predicted purchases for the product of interest associated with the initial prediction model under consideration derived using the matrix against actual purchases found in the portion of the input data entries not employed to generate the matrix, and iteratively modifying one or more control parameters until the accuracy of the predicted purchases to the actual purchases is maximized.” As explained above, Sakaki teaches purchasing likelihoods (see abstract) that read on the claimed “predicted purchases for the product of interest.” Modifying Sakaki to include the teachings of Marcus would entail applying processes shown in FIG. 1 of Marcus in the context provided by Sakaki. In such a combination, the “VALIDATING MODEL” step in FIG. 1 of Marcus reads on the claimed “validate each initial prediction model.” The “ERROR” calculation in FIG. 1 of Marcus reads on the claimed “comparing predicted purchases for the product of interest associated with the initial prediction model under consideration derived using the matrix against actual purchases found in the portion of the input data entries not employed to generate the matrix,” wherein the absolute value of the difference between predicted values “p” in Marcus read on the claimed “predicted purchases for the product of interest,” their relationship with the “MODEL” in the “TRAINING MODEL” step of FIG. 1 of Marcus reads on the claimed “associated with the initial prediction model under consideration,” the “MODEL” being based on upstream matrices shown in FIG. 1 of Marcus reads on the claimed “derived using the matrix,” the determining of the difference between the predicted values “p” and the historical values V in the “VALIDATING MODEL” step of FIG. 1 of Marcus reads on the claimed “against actual purchases,” the historical values “V” in the “VALIDATING MODEL” step being different from the historical values “V” in the “TRAINING MODEL” step of FIG. 
1 of Marcus (see para. [0049]) reads on the claimed “actual purchases found in the portion of the input data entries not employed to generate the matrix.” Switching between models to select the one with the lowest computed error, by receiving a partially new training set, or by using some different model parameters and some same model parameters, in Marcus (see paras. [0051], [0052]), reads on the claimed “iteratively modifying one or more control parameters until the accuracy of the predicted purchases to the actual purchases is maximized,” wherein seeking the lowest computed error in Marcus (see para. [0052]) is the same as “accuracy” being “maximized.” See the Response to Arguments section below for additional explanation.
“Generate a final matrix from the input data entries, said final matrix generation comprising assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” See the first bullet point concerning Marcus for an explanation of teachings of Marcus used to reject the claimed “generate a matrix” step and the claimed “assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Creating matrices during retraining phases in Marcus reads on the claimed “generate a final matrix from the input data entries.” 
“Employ the supervised machine learning technique to create a separate final prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product within the future time period using the final matrix, and using the last-modified control parameters established in creating the initial prediction model for each product as input or if no modification was made to a control parameter in creating the initial prediction model for a product employing the initially-used control parameter as the input for that control parameter.” As explained above, Sakaki already teaches elements (see paras. [0041] and [0045], and FIG. 9) that read on the claimed “employ the supervised machine learning technique,” “create a” “prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product within the future time period.” Similarly, Marcus teaches, in para. [0079], “The types of model generators (e.g., training engine 26 in FIG. 1) used in model 30 are given in the summary below: A. Support Vector Machines (SVM) (e.g., supervised learning, autonomous prediction).” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Marcus teaches in para. [0053], “prediction method 40 includes applying model 30 to a input vector (V.sub.1, . . . 
V.sub.n) derived from a received historical data element via mapping method 10 so as to predict a new predicted value P of the response characteristic.” Generating new models as a result of retraining, in Marcus, reads on the claimed “create a separate final prediction model,” and use of the same data for the retraining-based matrices to generate the new models during retraining cycles, in Marcus, reads on the claimed “using the final matrix, and using the last-modified control parameters established in creating the initial prediction model for each product as input or if no modification was made to a control parameter in creating the initial prediction model for a product employing the initially-used control parameter as the input for that control parameter.” If, hypothetically speaking, Marcus taught inputting different constants, metrics, thresholds, and other model parameters, where none of the prior parameters are re-used, Marcus would not read on the “using the last-modified control parameters” and “employing the initially-used control parameter” claim limitations. Marcus, however, does not require that all of these inputs be different. Rather, due to the “or” operator, Marcus teaches instances where, perhaps, different constants are used, but the same metrics, thresholds, and/or other model parameters are used. In such a scenario, whatever metrics, thresholds, and/or other model parameters remain the same form a group that reads on the claimed “using the last-modified control parameters” and “employing the initially-used control parameter” claim limitations. Additionally or alternatively, Marcus teaches, in para. [0083], “Step 4: The predicted values are compared to the actual values. Various measures of the quality or error of the prediction include: RMSE (Root Mean Square Error), MAE (Mean Absolute Error), StdDev (Standard Deviation), etc. 
If these error values are outside of an expected range, steps [1-4] may be rerun with internal tuning parameters adjusted or new training data.” Notice the “or” operator near the end of the cited passage. Re-running the steps with the same internal tuning parameters, but with new training data, in Marcus, reads on the claimed “using the last-modified control parameters” and “employing the initially-used control parameter” claim limitations.
“For each product, using the input data, apply the finalized prediction model associated with that product to estimate the probability that an entity will purchase the product within the future time period.” As explained above, Sakaki already teaches elements that read on the claimed “For each product, using the input data, apply the” “prediction model associated with that product to estimate the probability that an entity will purchase the product within the future time period.” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Use of downstream (retrained) models in Marcus reads on the claimed “apply the finalized prediction model.”
Marcus describes, in its abstract, “A method of machine learning for generating a predictive model of a response characteristic;” and teaches in para. [0055], “Embodiments of the invention for modelling and predicting metrics may be applied to modeling user behavior to improve many technological fields, such as for example” “shopping behavior,” similar to the claimed invention and Sakaki. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the supervised machine learning features and processes of Sakaki, to include the pairs, matrices, models, and retraining features and processes of Marcus, to find the optimal mix of features for predictive models that yield a good prediction error, as taught by Marcus (see para. [0077]).
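For illustration only, the hold-out validation and parameter selection discussed above for claim 1 can be sketched as follows. The sketch is not taken from Sakaki or Marcus. The single-threshold model, the function names `accuracy` and `select_threshold`, and the data values are hypothetical assumptions, chosen only to show that picking the control parameter with the lowest error on withheld data is the same as picking the one with the highest accuracy.

```python
def accuracy(threshold, held_out):
    """Fraction of withheld entries whose predicted purchase matches the actual one."""
    # Hypothetical model: predict a purchase when the intensity meets the threshold.
    hits = sum((intensity >= threshold) == purchased
               for intensity, purchased in held_out)
    return hits / len(held_out)


def select_threshold(candidates, held_out):
    """Try each candidate control parameter and keep the most accurate one."""
    best = max(candidates, key=lambda t: accuracy(t, held_out))
    return best, accuracy(best, held_out)


# Hypothetical (intensity, purchased) pairs that were NOT used to build the model.
held_out = [(1, False), (2, False), (3, True), (5, True)]
best_threshold, best_accuracy = select_threshold([1, 2, 3, 4, 5], held_out)
error_rate = 1 - best_accuracy  # lowest error corresponds to highest accuracy
print(best_threshold, best_accuracy, error_rate)
```

In this toy data set only one candidate separates the purchasers from the non-purchasers, so the search settles on it; with real data several candidates could tie, in which case `max` keeps the first.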
Regarding claim 2, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 1, wherein said interest event that is indicative of the product being relevant to the entity comprises one of the entity expressing an interest in the product in a communication or a third party mentioning the entity and the product in a communication.” Sakaki teaches, in para. [0083], “The example of the SNS posting information illustrated in FIG. 11 is an example of posting information in Twitter (registered trademark) and the user mentions a smart phone with the name “∘∘∘ phone” three times in the posts.” Sakaki teaches, in para. [0084], “The user posted a message indicating that “the president of a Z company gives a presentation on a new product of the ∘∘∘ phone” on Sep. 14, 2015. Therefore, it is presumed that the user is interested in the product.” The posting information related to the phone in Sakaki reads on the claimed “interest event that is indicative of the product being relevant to the entity.” The posts expressing interest in the phone in Sakaki read on the claimed “entity expressing an interest in the product.”
Regarding claim 3, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 1, wherein the intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier in an entry comprises the number of times an interest event in the input data is associated with the same entity and product.” Sakaki teaches, in para. [0068], “in purchasing behavior phase determination example 3 illustrated in FIG. 9, as the interest presence or absence probability, output values indicating that the probability of “being interested in the product” is 0.8 and the probability of “not being interested in the product” is 0.2 are obtained.” Sakaki teaches, in para. [0083], “The example of the SNS posting information illustrated in FIG. 11 is an example of posting information in Twitter (registered trademark) and the user mentions a smart phone with the name “∘∘∘ phone” three times in the posts.” The probability values in Sakaki read on the claimed “intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier in an entry.” The probability values being determined using posting information in Sakaki reads on the claimed “times an interest event in the input data is associated with the same entity and product.” Marcus teaches, in para. [0010], “the response characteristic is selected from the group consisting of a number of clicks; a number of times that a web page is shared, saved or viewed; and a number of times that a user clicks on a specific button, icon or image on a web page.” The number of clicks in Marcus reads on the claimed “number of times.” The rationales for combining the teachings of Sakaki and Marcus in the rejection of claim 1 also apply to this rejection of claim 3.
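For illustration only, an intensity value of the kind recited in claim 3 can be sketched as a count of interest events per entity-product pair. The entity and product labels and the `count_interest_events` function are hypothetical and do not come from either reference.

```python
from collections import Counter


def count_interest_events(events):
    """Count how many interest events share the same (entity, product) pair."""
    return Counter((entity, product) for entity, product in events)


# Hypothetical events: one user mentions the same phone three times.
events = [
    ("user_a", "phone"),
    ("user_a", "phone"),
    ("user_b", "tablet"),
    ("user_a", "phone"),
]
intensity = count_interest_events(events)
print(intensity[("user_a", "phone")])
```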
Regarding claim 4, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 1, wherein the sub-program for generating a matrix from a portion of the input data entries comprises: mapping each input data entry onto a timeline based on the entry’s time period identifier; splitting the timeline so that a prescribed percentage of the entries closest to the prescribed date of interest are designated as test entries and the remaining entries are designated as training entries.” Marcus teaches, in para. [0076], “For each feature (in categories such as location, job title, category, seasonality, environmental, and/or complements and combinations), the predictive model may input historical data (e.g., historical number of clicks) collected for that classification over a predetermined period of time (e.g., 12 months or other suitable period of time.” Marcus teaches, in para. [0077], “The vectors may be used to train the model with historical correlations between job classifications and performance metrics (“training phase”, e.g., training 22 method in FIG. 1). The predictive model may be used to predict future performance metrics for a new job posting (e.g., predicted number of clicks), before the job posting is ever posted. The processor may then compare the predicted future metrics (e.g., p.sup.a, p.sup.b, . . . p.sup.k in FIG. 1) with actual metrics such as the actual number of clicks (e.g., V.sub.0.sup.a, V.sub.0.sup.b, . . . V.sub.0.sup.k in FIG. 1) collected during a second (e.g., more recent) predetermined period of time (e.g., the previous full month) to verify the accuracy of the model (“verification phase”, e.g., verification 32 method in FIG. 
1).” The tracking of metrics over predetermined periods of time in Marcus, wherein the metrics are then used for subsequent retraining in Marcus, reads on the claimed “the sub-program for generating a matrix from a portion of the input data entries comprises: mapping each input data entry onto a timeline based on the entry’s time period identifier,” wherein the basis for establishing the contours and boundaries of the predetermined period of time in Marcus reads on the claimed “timeline.” The designating of verification metrics collected during the more recent predetermined period of time in Marcus, the metrics being distinguishable from other (remaining) training metrics collected during the predetermined period of time in Marcus, reads on the claimed “splitting the timeline so that a prescribed percentage of the entries closest to the prescribed date of interest are designated as test entries and the remaining entries are designated as training entries,” wherein the verification metrics in Marcus read on the claimed “test entries” and the training metrics in Marcus read on the claimed “training entries.”
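For illustration only, the claimed split of a timeline into test entries and training entries can be sketched as follows. The entry layout `(entity_id, product_id, periods_back)`, the function name `split_by_recency`, and the sample values are hypothetical assumptions and are not drawn from Marcus or Sakaki.

```python
def split_by_recency(entries, test_fraction):
    """Return (training, test); test holds the entries closest to the date of interest."""
    # periods_back is measured backward from the date of interest,
    # so a smaller value means a more recent interest event.
    ordered = sorted(entries, key=lambda entry: entry[2])
    n_test = int(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


# Hypothetical entries: (entity_id, product_id, periods_back).
entries = [
    ("e1", "p1", 12),
    ("e2", "p1", 1),
    ("e3", "p2", 7),
    ("e4", "p2", 3),
    ("e5", "p1", 9),
]
training, test = split_by_recency(entries, 0.2)
print(test, training)
```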
“For the portion of the timeline comprising training entries, stepping a time window of a prescribed size over the timeline starting at the time corresponding to the mapped entry having largest time period identifier and moving forward in time a prescribed stride amount with each successive step.” See the immediately preceding bullet point, and in particular, the reproduced passages from Marcus and the rejection rationales. The predetermined period of time in Marcus reads on the claimed “portion of the timeline comprising training entries.” Marcus teaches, in para. [0032], “(2) Temporal Distance: A similar approach may be used for relating non-numerical values of time. In some embodiments, a temporal distance factor may be defined with reference to data created in the past with reference to another time, e.g., today. For example, using a temporal granularity of 1 month and assuming that the temporal reference is November, 2015, to project data created November, 2015, (distance unit of 0) the 1/(1+D.sup.2) factor is 1. Data created in September, 2015 has 2 (temporal) distance units, or an orthogonality relationship or projection factor of 1/(1+2.sup.2)=⅕, e.g., 20%.” The use of temporal granularities in Marcus reads on the claimed “stepping a time window of a prescribed size over the timeline starting at the time corresponding to the mapped entry having largest time period identifier and moving forward in time a prescribed stride amount with each successive step,” wherein the granularity reads on the claimed “time window,” applying distance units relating to older dates (e.g., September in the example provided by Marcus) reads on the claimed “starting at the time corresponding to the mapped entry having largest time period identifier,” and applying distance units relating to more recent dates (e.g., November in the example) according to the granularity value reads on the claimed “moving forward in time a prescribed stride amount with each successive step.”
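For illustration only, the temporal distance factor quoted from para. [0032] of Marcus, 1/(1+D²) with a granularity of one month, can be worked through as follows. The function name `temporal_projection_factor` and its year and month parameters are hypothetical; only the formula and the November 2015 and September 2015 example come from the quoted passage.

```python
def temporal_projection_factor(ref_year, ref_month, year, month):
    """Return 1 / (1 + D**2), where D is the whole-month distance to the reference."""
    # D counts the months between the data's creation date and the reference date.
    distance = abs((ref_year - year) * 12 + (ref_month - month))
    return 1.0 / (1 + distance ** 2)


# Reference month: November 2015, as in the quoted example.
same_month = temporal_projection_factor(2015, 11, 2015, 11)   # D = 0
two_months_back = temporal_projection_factor(2015, 11, 2015, 9)  # D = 2
print(same_month, two_months_back)
```

With D = 0 the factor is 1, and with D = 2 it is 1/5, matching the 20% figure in the quoted passage.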
“At each step of the time window, creating an entity identifier and product identifier pair for each entry mapped onto the timeline that falls within the current time window step, and assigning each created pair to a different location in the matrix and associate a time window identifier assigned to the current time window step with the created pair whenever an entity identifier and product identifier pair corresponding to the same interest event as the created pair is not already assigned to a location in the matrix, and whenever an entity identifier and product identifier pair corresponding to the same interest event as the created pair is already assigned to a location in the matrix, associating a time window identifier assigned to the current time window step with the entity identifier and product identifier pair corresponding to the same interest event as the created pair.” Sakaki teaches, in para. [0033], “The SNS posting information acquiring unit 31 acquires posting information related to a specific product that is requested to be researched from the information posted to SNS. For example, first, the SNS posting information acquiring unit 31 extracts posts about a specific product from the information posted to SNS and specifies the user who mentions the specific product from the posts. Then, the SNS posting information acquiring unit 31 acquires several to several thousands of posts before and after the post about the specific product in terms of time from a series of information posted by the user until now.” See the cited passage of Marcus from the immediately preceding bullet point. Marcus teaches, in para. [0023], “A single historical data element 12 may be sent along with a single historical value of the response characteristic to be parsed and processed by a data extraction engine 14. Data extraction engine 14 extracts from single historical data element 12, a plurality of q key-value pairs 16 represented by (K.sub.i,V.sub.i) where i=0, 1, . . . 
q.” Marcus teaches, in para. [0024], “The extracted key-value pairs (K.sub.i,V.sub.i) may subsequently be projected onto n-axes in an n-dimensional space where each of the n feature values is represented by one of the n-axes, where n is an integer.” Marcus teaches, in para. [0034], “the n-dimensional space may be represented by a data structure or matrix with n-columns representing the n-axes of n-data features.” Each unit of temporal granularity in Marcus reads on the claimed “at each step of the time window.” Identifying position information related to specific products and specific users during the specified time in Sakaki, for each of the units of temporal granularity in Marcus, reads on the claimed “creating an entity identifier and product identifier pair for each entry mapped onto the timeline that falls within the current time window step.” The projecting of pairs onto the n-dimensional space matrices during a first cycle of the process shown in FIG. 1 of Marcus, wherein the pairs have assigned temporal granularity units, reads on the claimed “assigning each created pair to a different location in the matrix and associate a time window identifier assigned to the current time window step with the created pair whenever an entity identifier and product identifier pair corresponding to the same interest event as the created pair is not already assigned to a location in the matrix.” The projecting of pairs onto the n-dimensional space matrices during a subsequent cycle of the process shown in FIG. 
1 of Marcus, wherein the pairs have assigned temporal granularity units, reads on the claimed “whenever an entity identifier and product identifier pair corresponding to the same interest event as the created pair is already assigned to a location in the matrix, associating a time window identifier assigned to the current time window step with the entity identifier and product identifier pair corresponding to the same interest event as the created pair.” In other words, when pairs used in an initial cycle of the process in FIG. 1 of Marcus overlap with pairs used in a subsequent cycle of the process in FIG. 1 of Marcus, such a scenario reads on the claimed “already assigned” condition. One exemplary instance in which the scenario would take place in Marcus is when retraining models use partially new training sets (see para. [0051]). The rationales for combining the teachings of Sakaki and Marcus in the rejection of claim 1 also apply to this rejection of claim 4.
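For illustration only, two mechanisms discussed above for claim 4 can be sketched in Python: the temporal distance factor quoted from para. [0032] of Marcus, and one possible reading of the claimed stepping of a time window over the training portion of the timeline. All names (projection_factor, Entry, days_back, build_pair_matrix) are hypothetical and appear in neither the references nor the claims. Using a dictionary keyed by the entity-product pair to stand in for “a location in the matrix,” and treating each window as half-open, are simplifying assumptions rather than anything stated in the record.

```python
from typing import NamedTuple


def projection_factor(distance_units: int) -> float:
    """Temporal distance factor 1/(1+D^2) as quoted from Marcus, para. [0032]."""
    return 1.0 / (1.0 + distance_units ** 2)


class Entry(NamedTuple):
    entity_id: str
    product_id: str
    days_back: int  # assumed time period identifier: days before the date of interest


def build_pair_matrix(entries, window_size, stride):
    """Step a window from the oldest entry forward in time and record,
    for each (entity, product) pair, the ids of the windows it falls in."""
    matrix = {}  # (entity_id, product_id) -> list of window ids
    if not entries:
        return matrix
    oldest = max(e.days_back for e in entries)  # largest time period identifier
    newest = min(e.days_back for e in entries)
    start, window_id = oldest, 0
    while start >= newest:
        end = start - window_size  # window covers days_back in (end, start]
        for e in entries:
            if end < e.days_back <= start:
                pair = (e.entity_id, e.product_id)
                # A new pair gets a new location; an existing pair only
                # gains the current window id.
                ids = matrix.setdefault(pair, [])
                if window_id not in ids:
                    ids.append(window_id)
        start -= stride  # move forward in time by the prescribed stride
        window_id += 1
    return matrix


sample = [Entry("A", "p1", 10), Entry("A", "p1", 8), Entry("B", "p2", 5)]
pair_matrix = build_pair_matrix(sample, window_size=3, stride=2)
```

Under these assumptions, the pair for entity A and product p1 is placed once and then associated with a second window identifier when it reappears, which corresponds to the “already assigned” condition addressed above.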
Regarding claim 5, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 4, wherein the sub-program for generating a matrix from a portion of the input data entries, further comprises splitting the timeline so that 30% of the entries closest to the prescribed date of interest are designated as test entries and the remaining 70% of the entries are designated as training entries.” Marcus teaches, in para. [0076], “For each feature (in categories such as location, job title, category, seasonality, environmental, and/or complements and combinations), the predictive model may input historical data (e.g., historical number of clicks) collected for that classification over a predetermined period of time (e.g., 12 months or other suitable period of time when seasonality is a factor) into the support vector machine, neural network or any linear regression model.” Marcus teaches, in para. [0077], “The predictive model may be used to predict future performance metrics for a new job posting (e.g., predicted number of clicks), before the job posting is ever posted. The processor may then compare the predicted future metrics (e.g., p.sup.a, p.sup.b, . . . p.sup.k in FIG. 1) with actual metrics such as the actual number of clicks (e.g., V.sub.0.sup.a, V.sub.0.sup.b, . . . V.sub.0.sup.k in FIG. 1) collected during a second (e.g., more recent) predetermined period of time (e.g., the previous full month) to verify the accuracy of the model (“verification phase”, e.g., verification 32 method in FIG. 
1).” The establishing of the predetermined periods of time and the second predetermined periods of time in Marcus reads on the claimed “wherein the sub-program for generating a matrix from a portion of the input data entries, further comprises splitting the timeline.” The second predetermined period of time being a full month in Marcus, as part of the verification phase, reads on the claimed “30% of the entries closest to the prescribed date of interest are designated as test entries.” The overall predetermined period of time being 12 months or any other suitable time period in Marcus, as part of the training phase, reads on the claimed “the remaining 70% of the entries are designated as training entries.” The rationales for combining the teachings of Sakaki and Marcus in the rejection of claim 1 also apply to this rejection of claim 5.
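For illustration only, the claimed 30%/70% split of the timeline can be sketched as follows. The function name split_by_recency, the (label, days_back) tuple layout, and the rounding of the test count are assumptions made for this sketch; they are not taken from Marcus, Sakaki, or the claims.

```python
def split_by_recency(entries, test_fraction=0.30):
    """Designate the entries closest to the date of interest as test entries.

    Each entry is an assumed (label, days_back) tuple, where days_back is the
    number of days before the prescribed date of interest.
    """
    ordered = sorted(entries, key=lambda e: e[1])  # smallest days_back first
    n_test = round(len(ordered) * test_fraction)
    test_entries = ordered[:n_test]
    training_entries = ordered[n_test:]
    return training_entries, test_entries


entries = [("e%d" % d, d) for d in range(1, 11)]
training_entries, test_entries = split_by_recency(entries)
```

With ten entries, the three entries nearest the date of interest become test entries and the remaining seven become training entries.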
Regarding claim 8, while the claim is of different scope relative to claim 4, the claim nevertheless recites limitations similar to the limitations recited by claim 4. Further, while claim 8 recites a “final matrix” and claim 4 recites a “matrix,” Marcus teaches both matrices in that running a first iteration of the process shown in FIG. 1 of Marcus involves one or more matrices that read on the claimed “matrix,” and running a second iteration of the process shown in FIG. 1 of Marcus for retraining purposes involves one or more matrices that read on the claimed “final matrix.” Additionally or alternatively, whatever matrix was generated in Marcus just prior to the last cycle or iteration in Marcus reads on the claimed “final matrix.” For at least these reasons, claim 8 is rejected under 35 USC 103 as being obvious in view of the combination of Sakaki and Marcus, at least for the same reasons as claim 4.
Regarding claim 11, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 1, wherein the future time period is 200 days starting from the prescribed date of interest.” With respect to FIG. 11 of Sakaki, the exemplary date of September 14, 2015, reads on the claimed “prescribed date of interest;” and the span between September 14, 2015, and the exemplary date of January 18, 2016, reads on the claimed “future time period” “starting from the prescribed date of interest.” While the exemplary span in Sakaki is around 126 days, one of ordinary skill would recognize that the specific dates in Sakaki are only examples, and any date range could have been used, including one amounting to 200 days or any other amount.
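The span of around 126 days noted above can be checked with a short calculation. The variable names are illustrative only; the two dates are the exemplary dates from FIG. 11 of Sakaki as cited above.

```python
from datetime import date

date_of_interest = date(2015, 9, 14)  # exemplary date from FIG. 11 of Sakaki
later_date = date(2016, 1, 18)        # second exemplary date from FIG. 11

span_days = (later_date - date_of_interest).days
print(span_days)  # 126
```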
Regarding claims 12-14 and 17, while the claims are of different scope relative to claims 1, 2, 4, and 8, respectively, the claims nevertheless recite limitations similar to the limitations of claims 1, 2, 4, and 8. Further, Sakaki teaches a product category embodiment (see FIG. 15) that reads on the differences between claims 1, 2, 4, and 8 and claims 12-14 and 17, including consideration of product categories rather than (or in addition to) consideration of products themselves. The consideration of product categories in Sakaki reads on the “product category” limitations of claims 12-14 and 17. Claims 12-14 and 17 are, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki and Marcus at least for the same reasons as claims 1, 2, 4, and 8.
Regarding independent claim 20, while the claim is of different scope relative to independent claim 1, the claim nevertheless recites limitations similar to the limitations of claim 1. Claim 20 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki and Marcus at least for the same reasons as claim 1.
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, and further in view of U.S. Pat. App. Pub. No. 2016/0086222 A1 to Kurapati (“Kurapati”).
Regarding claim 6, the combination of Sakaki and Marcus teaches the following limitations:
“The system of Claim 1, wherein the sub-program for employing a supervised machine learning technique to create a separate initial prediction model for each product of interest in the input data comprises employing a logistic regression technique.” As explained above, the combination of Sakaki and Marcus teaches elements that read on the claimed “sub-program for employing a supervised machine learning technique to create a separate initial prediction model for each product of interest in the input data.” Sakaki teaches, in para. [0046], “a method using an artificial neural network or a method using logistic regression,” which reads on the claimed “employing a logistic regression technique.”
Kurapati teaches limitations below of claim 6 that do not appear to be explicitly taught in their entirety by the combination of Sakaki and Marcus:
The claimed “logistic regression technique” is one “with elastic net regularization.” Kurapati teaches, in para. [0385], “A machine learning process 7202 can be applied to the obtained data, which may be a form of predictive modeling, optionally using or including techniques, or combinations of techniques, such as logistic regression, neural nets, algorithms such as lasso algorithms, models, such as elastic-net regularized, generalized linear and non-linear models, support vector machines (SVM), ensembles of decision trees, “random forests,” and the like.” Use of the elastic-net regularized technique in Kurapati teaches the claimed “with elastic net regularization.”
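For context only, the general idea of a logistic regression with elastic net regularization can be sketched as a loss function: the mean logistic loss plus a penalty that blends the L1 and L2 norms of the weights. The function name and the parameters lam and l1_ratio are assumptions for this sketch, and the particular penalty scaling shown is one common convention; none of it is taken from Kurapati, Sakaki, Marcus, or the claims.

```python
import math


def elastic_net_logistic_loss(weights, bias, rows, labels, lam=0.1, l1_ratio=0.5):
    """Mean log-loss of a logistic model plus an elastic-net penalty."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of the positive class
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    data_term = total / len(rows)
    l1 = sum(abs(w) for w in weights)   # lasso part
    l2 = sum(w * w for w in weights)    # ridge part
    penalty = lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)
    return data_term + penalty


loss = elastic_net_logistic_loss([1.0, -1.0], 0.0, [[0.0, 0.0]], [1])
```

Setting l1_ratio to 1 reduces the penalty to a lasso penalty and setting it to 0 reduces it to a ridge penalty, which is why the elastic net is often described as a combination of the two.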
Kurapati describes machine learning processes in the form of predictive modeling (see para. [0385]), similar to the claimed invention and to the combination of Sakaki and Marcus. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the machine learning of the combination of Sakaki and Marcus, to include the elastic-net regularized technique of Kurapati, because different machine learning techniques can be used in combination for purposes of learning, and predicting, user behavior, as taught by Kurapati (see para. [0385]).
Regarding claim 15, while the claim is of different scope relative to claim 6, the claim nevertheless recites limitations similar to the limitations of claim 6. Further, Sakaki teaches a product category embodiment (see FIG. 15) that reads on the differences between claims 6 and 15, including consideration of product categories rather than (or in addition to) consideration of products themselves. The consideration of product categories in Sakaki reads on the “product category” limitation of claim 15. Claim 15 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, and Kurapati at least for the same reasons as claim 6.
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, further in view of Kurapati, and further in view of U.S. Pat. App. Pub. No. 2006/0036510 A1 to Westphal et al. (“Westphal”).
Regarding claim 7, Westphal teaches limitations below that do not appear to be explicitly taught in their entirety by the combination of Sakaki, Marcus, and Kurapati:
“The system of Claim 6, further comprising a sub-program for eliminating, during the creation of the initial prediction models, probability estimates for entities that are known to already have the product or a product from a same category of products as the product.” As explained above, the combination of Sakaki, Marcus, and Kurapati already teaches elements that read on the claimed “creation of the initial prediction models” and “probability estimates for entities,” wherein the elements use or otherwise rely on forms of data. Westphal teaches, in para. [0044], “the system and method is particularly interested in pairings that include items of differing categories and, as such, product pairings of the same category may be simply ignored during the method steps which follow.” The ignoring of data for product pairings of the same category in Westphal reads on the claimed “eliminating” data “for entities that are known to already have the product or a product from a same category of products as the product.” Applying the selective ignoring in Westphal to the data in the combination of Sakaki and Marcus teaches the limitations of the claim.
Westphal describes, in para. [0002], “directing a customer to additional purchasing opportunities,” similar to the claimed invention and to the combination of Sakaki, Marcus, and Kurapati. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the prediction modelling aspects of the combination of Sakaki, Marcus, and Kurapati, to include operations in which same category data is ignored as in Westphal, to focus on product categories that can lead to additional purchasing opportunities, as taught by Westphal (see para. [0046]).
Regarding claim 16, while the claim is of different scope relative to claim 7, the claim nevertheless recites limitations similar to the limitations of claim 7. Further, Sakaki teaches a product category embodiment (see FIG. 15) that reads on the differences between claims 7 and 16, including consideration of product categories rather than (or in addition to) consideration of products themselves. The consideration of product categories in Sakaki reads on the “product category” limitation of claim 16. Claim 16 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, Kurapati, and Westphal at least for the same reasons as claim 7.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, and further in view of U.S. Pat. App. Pub. No. 2014/0081931 A1 to Kung et al. (“Kung”).
Regarding claim 9, Kung teaches limitations below that do not appear to be explicitly taught in their entirety by the combination of Sakaki and Marcus:
“The system of Claim 1, further comprising a sub-program for eliminating the input data entries deemed likely to be inaccurate, prior to executing the sub-program for generating the matrix, said eliminating comprising: identifying outlier entries in the input data using a seasonal ESD test on the time period identifiers and intensity values of the input data; and eliminating identified outlier entries from the input data up to a prescribed percentage of the entries.” Kung teaches, in para. [0012], “validation rules for data are automatically generated through profiling. Outliers are detected by examining patterns of data and selecting values that are not repeated enough in the full data set to exceed a predetermined threshold.” Kung teaches, in para. [0048], “Another alternative to the Grubbs test is the Generalized Extreme Studentized Deviate (ESD) Test. The ESD test is essentially the Grubbs' test applied sequentially. Given the upper bound, r, the generalized ESD test essentially performs r separate tests; a test for one outlier, a test for two outliers, and so on up to r outliers.” Kung teaches, in para. 
[0067], “The ERP 600 can then also utilize the stored validation rules from the data warehouse 612 when receiving future data from data sources 602a-602c, using the rules to reject bad data (or even correct the bad data).” Elements of the system in Kung that perform the validating read on the claimed “sub-program for eliminating the input data entries deemed likely to be inaccurate.” When implemented in the framework of the combination of Sakaki and Marcus, for example where data is input into the framework, such an implementation would read on the claimed “prior to executing the sub-program for generating the matrix.” Application of the ESD Test in Kung to identify outliers, when implemented in the aforementioned framework of the combination of Sakaki and Marcus, reads on the claimed “identifying outlier entries in the input data using a seasonal ESD test on the time period identifiers and intensity values of the input data.” The rejecting of outliers in Kung reads on the claimed “eliminating identified outlier entries from the input data up to a prescribed percentage of the entries.”
Kung describes aspects of information flows used in business systems (see para. [0002]), similar to the claimed invention and to the combination of Sakaki and Marcus. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the processes of the combination of Sakaki and Marcus, to include the ESD testing aspects of Kung, when the combination receives input data, to ensure that bad data is rejected or corrected, as taught by Kung (see para. [0067]).
Regarding claim 18, while the claim is of different scope relative to claim 9, the claim nevertheless recites limitations similar to the limitations of claim 9. Claim 18 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, and Kung at least for the same reasons as claim 9.
Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, and further in view of Westphal.
Regarding claim 10, while the claim is of different scope relative to claim 7, the claim nevertheless recites limitations similar to the limitations recited by claim 7. Further, while claim 10 recites aspects not found in claim 7, including “after executing the sub-program for applying the finalized prediction model associated with each product to the input data to estimate the probability that an entity will purchase the product within the future time period,” Marcus teaches such timing in that steps performed in a subsequent (retraining) iteration of the processes of FIG. 1 of Marcus occur after earlier iterations, thus reading on the claimed “after executing” limitations. For at least these reasons, claim 10 is rejected under 35 USC 103 as being obvious in view of the combination of Sakaki, Marcus, and Westphal at least for the same reasons as claim 7.
Regarding claim 19, while the claim is of different scope relative to claim 10, the claim nevertheless recites limitations similar to the limitations of claim 10. Further, Sakaki teaches a product category embodiment (see FIG. 15) that reads on the differences between claims 10 and 19, including consideration of product categories rather than (or in addition to) consideration of products themselves. The consideration of product categories in Sakaki reads on the “product category” limitation of claim 19. Claim 19 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, and Westphal at least for the same reasons as claim 10.
Claims 1-5, 8, 11-14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, and further in view of U.S. Pat. App. Pub. No. 2017/0220939 A1 to Bansal et al. (“Bansal”).2
Regarding independent claim 1, Sakaki teaches the following limitations:
“A system for predicting the probability that an entity will purchase a product within a future time period.” Sakaki teaches, in para. [0003], “a purchasing behavior analysis apparatus including: an acquiring unit that acquires posting information about a specific product from posting information posted to a social networking service” involving “a purchase likelihood probability calculating unit that calculates a value indicating probability that the user will be predicted to purchase the product mentioned in the posting information in the future.”
“A purchase probability predictor comprising one or more computing devices.” Sakaki teaches, in para. [0003], “a purchase likelihood probability calculating unit that calculates a value indicating probability that the user will be predicted to purchase the product mentioned in the posting information in the future.” Sakaki teaches, in para. [0026], “The server apparatus 10 is, for example, a purchasing behavior analysis apparatus.” Sakaki teaches, in para. [0028], “FIG. 2 illustrates the hardware configuration of the server apparatus 10 that functions as the purchasing behavior analysis apparatus in the purchasing behavior analysis system.” Sakaki teaches, in para. [0029], “As illustrated in FIG. 2, the server apparatus 10 includes a CPU 11.” The purchase likelihood probability calculating unit in Sakaki reads on the claimed “purchase probability predictor.” Componentry of the server in Sakaki reads on the claimed “computing devices.”
“A purchase probability prediction computer program having a plurality of sub-programs executable by said computing device or devices, wherein the sub-programs configure said computing device or devices to, receive input data in the form of entries, each entry comprising, an entity identifier that identifies an entity that is a potential purchaser of a product, a product identifier that identifies a product that the entity associated with the entity identifier might purchase based on an interest event that is indicative of the product being relevant to the entity, a time period identifier that specifies a past time period measured backward from a prescribed date of interest to an interest event date corresponding to the date the interest event associated with the entry occurred, and an intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier.” Sakaki teaches, in para. [0030], “The CPU 11 performs a predetermined process based on a control program stored in the memory 12 or the storage device 13 to control the operation of the server apparatus 10.” Sakaki teaches, in para. [0031], “FIG. 3 is a block diagram illustrating the functional structure of the server apparatus 10 implemented by the execution of the control program.” Sakaki teaches, in para. [0032], “As illustrated in FIG. 3, the server apparatus 10 according to this exemplary embodiment includes an SNS posting information acquiring unit 31, an SNS posting information storing unit 32, a text information extraction unit 33, a distributed representation conversion unit 34, an artificial neural network 35, an interest presence or absence probability calculating unit 36, a purchase desire probability calculating unit 37.” Sakaki teaches, in para. 
[0033], “The SNS posting information acquiring unit 31 acquires posting information related to a specific product that is requested to be researched from the information posted to SNS. For example, first, the SNS posting information acquiring unit 31 extracts posts about a specific product from the information posted to SNS and specifies the user who mentions the specific product from the posts. Then, the SNS posting information acquiring unit 31 acquires several to several thousands of posts before and after the post about the specific product in terms of time from a series of information posted by the user until now.” Sakaki teaches, in para. [0039], “The interest presence or absence determination layer 41 is a determination layer that receives the distributed representation converted by the distributed representation conversion unit 34 and determines whether the user is interested in the product mentioned in the posting information.” Sakaki teaches, in para. [0043], “The interest presence or absence probability calculating unit 36 calculates a value indicating the probability (degree) that the user will be interested in the product mentioned in the posting information based on the output value from the interest presence or absence determination layer 41.” The control program in Sakaki teaches the claimed “purchase probability prediction computer program.” Aspects of the control program and associated CPU involved in operation of the units in Sakaki read on the claimed “plurality of sub-programs executable by said computing device or devices.” The acquiring of posting information in Sakaki reads on the claimed “receive input data in the form of entries.” The acquiring of posting information related to specific products and the specifying of users mentioning the specific products in Sakaki read on the claimed “each entry comprising, an entity identifier that identifies an entity that is a potential purchaser of a product, a product identifier that identifies a 
product that the entity associated with the entity identifier might purchase based on an interest event that is indicative of the product being relevant to the entity.” The acquiring of a specified number of posts, indicative of time from posts to the present time, in Sakaki, reads on the claimed “a time period identifier that specifies a past time period measured backward from a prescribed date of interest to an interest event date corresponding to the date the interest event associated with the entry occurred.” The values indicating the probability (degree) that the user will be interested in the product mentioned in the posting information of Sakaki read on the claimed “an intensity value indicative of the degree to which the product associated with the product identifier is deemed relevant to the entity associated with the entity identifier.”
“Employ a supervised machine learning technique to create a separate initial prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product.” Sakaki teaches, in para. [0039], “The interest presence or absence determination layer 41 is a determination layer that receives the distributed representation converted by the distributed representation conversion unit 34 and determines whether the user is interested in the product mentioned in the posting information. The purchase desire determination layer 42 is a determination layer that receives an output value from the interest presence or absence determination layer 41 and determines whether the user wants the product. The purchase likelihood determination layer 43 is a determination layer that receives an output value from the purchase desire determination layer 42 and determines whether the user is predicted to purchase the product in the future.” Sakaki teaches, in para. [0041], “Each of the interest presence or absence determination layer 41, the purchase desire determination layer 42, and the purchase likelihood determination layer 43 performs so-called supervised learning.” Sakaki teaches, in para. 
[0045], “The purchase likelihood probability calculating unit 38 calculates a value indicating the probability (degree) that the user will be predicted to purchase the product mentioned in the posting information in the future based on the output value from the purchase likelihood determination layer 43.” Use of supervised learning in the purchase-likelihood determination layer in Sakaki reads on the claimed “employ a supervised machine learning technique to create a separate initial prediction model.” The purchase-likelihood determination layer being based on product data and user data in Sakaki reads on the claimed “for each product of interest in the input data.” Operation of the purchase likelihood probability calculating unit to calculate probability values in Sakaki reads on the claimed “estimates the probability that an entity in the input data will purchase the product.”
“Establish a list of entities, the products they are predicted to purchase and the probability of the purchases.” Sakaki teaches, in para. [0068], “For example, in purchasing behavior phase determination example 3 illustrated in FIG. 9, as the interest presence or absence probability, output values indicating that the probability of “being interested in the product” is 0.8 and the probability of “not being interested in the product” is 0.2 are obtained. As the purchase desire probability, output values indicating that the probability of “wanting the product” is 0.9 and the probability of “not wanting the product” is 0.1 are obtained. As the purchase likelihood probability, output values indicating that the probability of “purchasing the product” is 0.8 and the probability of “not purchasing the product” is 0.2 are obtained.” Sakaki teaches, in para. [0069], “Therefore, the purchasing behavior phase determining unit 39 determines that the purchasing behavior phase of the user is the purchase prediction phase from these output values.” The generating of lists for users, products, and probabilities (output values), like the list in FIG. 9 of Sakaki, reads on the claimed “establish a list of entities, the products they are predicted to purchase and the probability of the purchases.”
Marcus teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by Sakaki:
“Generate a matrix from a portion of the input data entries, said matrix generation comprising assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” Marcus teaches, in para. [0023], “A single historical data element 12 may be sent along with a single historical value of the response characteristic to be parsed and processed by a data extraction engine 14. Data extraction engine 14 extracts from single historical data element 12, a plurality of q key-value pairs 16 represented by (K.sub.i,V.sub.i) where i=0, 1, . . . q.” Marcus teaches, in para. [0024], “The extracted key-value pairs (K.sub.i,V.sub.i) may subsequently be projected onto n-axes in an n-dimensional space where each of the n feature values is represented by one of the n-axes, where n is an integer.” Marcus teaches, in para. [0032], “a temporal distance factor may be defined with reference to data created in the past with reference to another time, e.g., today.” Marcus teaches, in para. [0034], “the n-dimensional space may be represented by a data structure or matrix with n-columns representing the n-axes of n-data features.” Establishing the n-dimensional space matrix based on historical data elements and historical values in Marcus reads on the claimed “generate a matrix from a portion of the input data entries.” The establishing of key-value pairs and projecting them onto n-axes in n-dimensional space matrix in Marcus reads on the claimed “assigning” “pair associated with each interest event to a different location in the matrix.” The applying of the temporal distance factors in Marcus reads on the claimed “along with a time identifier indicative of how far back in time the interest event associated with each” “pair occurred from the prescribed date of interest.” As explained above, Sakaki already teaches elements (see para. [0033]) reading on the claimed “entity identifier and product identifier pair” and the claimed “entity-product identifier pair,” wherein those elements are analogous to the key-value pairs of Marcus. Additionally or alternatively, Sakaki teaches, in para. [0033], “the SNS posting information acquiring unit 31 acquires several to several thousands of posts before and after the post about the specific product in terms of time from a series of information posted by the user until now,” wherein the time data of Sakaki reads on the claimed “time identifier.”
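For illustration only, the claimed matrix generation step discussed above can be sketched as follows. The sketch assumes that each interest event is a tuple of an entity identifier, a product identifier, and an event date, and that the “time identifier” is a count of days before the prescribed date of interest. The function name build_matrix, the row layout, and the sample data are hypothetical; they are taken neither from the claims nor from Sakaki or Marcus.

```python
from datetime import date


def build_matrix(events, date_of_interest):
    """Assign each (entity, product) pair its own row and record how many
    days before date_of_interest each interest event occurred."""
    row_of = {}  # (entity_id, product_id) -> row index in the matrix
    matrix = []  # each row: [entity_id, product_id, [days_back, ...]]
    for entity_id, product_id, event_date in events:
        key = (entity_id, product_id)
        if key not in row_of:
            # First event for this pair: give the pair a new location.
            row_of[key] = len(matrix)
            matrix.append([entity_id, product_id, []])
        # Time identifier: distance back in time from the date of interest.
        days_back = (date_of_interest - event_date).days
        matrix[row_of[key]][2].append(days_back)
    return matrix


# Hypothetical interest events (entity, product, date of the event).
events = [
    ("u1", "p1", date(2022, 4, 1)),
    ("u2", "p1", date(2022, 4, 20)),
    ("u1", "p1", date(2022, 4, 25)),
]
matrix = build_matrix(events, date(2022, 4, 26))
```

In this sketch, each distinct entity-product pair occupies exactly one row, and repeated interest events for the same pair accumulate as additional day counts in that row.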
The claimed “supervised machine learning technique” is one “using the matrix as input.” As explained above, Sakaki already teaches elements (see para. [0041]) that read on the claimed “supervised machine learning technique.” Similarly, Marcus teaches, in para. [0079], “The types of model generators (e.g., training engine 26 in FIG. 1) used in model 30 are given in the summary below: A. Support Vector Machines (SVM) (e.g., supervised learning, autonomous prediction).” As shown in FIG. 1 of Marcus, the model (see second row down from the top) is downstream from the matrices (see first row at the top), and as such, uses the matrices as inputs.
“Validate each initial prediction model by comparing predicted purchases for the product of interest associated with the initial prediction model under consideration derived using the matrix against actual purchases found in the portion of the input data entries not employed to generate the matrix, and iteratively modifying one or more control parameters until the accuracy of the predicted purchases to the actual purchases is maximized.” As explained above, Sakaki teaches purchasing likelihoods (see abstract) that read on the claimed “predicted purchases for the product of interest.” Modifying Sakaki to include the teachings of Marcus would entail applying processes shown in FIG. 1 of Marcus in the context provided by Sakaki. In such a combination, the “VALIDATING MODEL” step in FIG. 1 of Marcus reads on the claimed “validate each initial prediction model.” The “ERROR” calculation in FIG. 1 of Marcus reads on the claimed “comparing predicted purchases for the product of interest associated with the initial prediction model under consideration derived using the matrix against actual purchases found in the portion of the input data entries not employed to generate the matrix,” wherein the absolute value of the difference between predicted values “p” in Marcus read on the claimed “predicted purchases for the product of interest,” their relationship with the “MODEL” in the “TRAINING MODEL” step of FIG. 1 of Marcus reads on the claimed “associated with the initial prediction model under consideration,” the “MODEL” being based on upstream matrices shown in FIG. 1 of Marcus reads on the claimed “derived using the matrix,” the determining of the difference between the predicted values “p” and the historical values “V” in the “VALIDATING MODEL” step of FIG. 1 of Marcus reads on the claimed “against actual purchases,” the historical values “V” in the “VALIDATING MODEL” step being different from the historical values “V” in the “TRAINING MODEL” step of FIG. 1 of Marcus (see para. [0049]) reads on the claimed “actual purchases found in the portion of the input data entries not employed to generate the matrix.” Switching between models to select the one with the lowest computed error, by receiving a partially new training set, or by using some different model parameters and some same model parameters, in Marcus (see paras. [0051], [0052]), reads on the claimed “iteratively modifying one or more control parameters until the accuracy of the predicted purchases to the actual purchases is maximized,” wherein seeking the lowest computed error in Marcus (see para. [0052]) is the same as “accuracy” being “maximized.” See the Response to Arguments section below for additional explanation.
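For illustration only, the validation step discussed above (comparing predictions against held-out actual purchases and iterating over a control parameter until accuracy is highest) can be sketched as follows. The single-threshold “model,” the function names, the candidate values, and the held-out data are all hypothetical simplifications chosen to keep the sketch short; they do not represent the model of the claims, of Sakaki, or of Marcus.

```python
def predict(days_back_list, threshold):
    # Toy stand-in for a prediction model: predict a purchase when the most
    # recent interest event is within `threshold` days of the date of interest.
    return min(days_back_list) <= threshold


def accuracy(holdout, threshold):
    # Fraction of held-out rows where the prediction matches the actual outcome.
    hits = sum(predict(days, threshold) == bought for days, bought in holdout)
    return hits / len(holdout)


def tune_threshold(holdout, candidates):
    # Iteratively try each candidate control parameter and keep the one
    # that gives the highest accuracy on the held-out data.
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = accuracy(holdout, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical held-out rows: (days-back values, whether a purchase occurred).
holdout = [([25, 1], True), ([6], True), ([40], False), ([12], False)]
best_threshold, best_accuracy = tune_threshold(holdout, [1, 5, 10, 30])
```

Selecting the parameter with the highest held-out accuracy is equivalent to selecting the one with the lowest error rate on the same data, which is the sense in which the lowest computed error corresponds to maximized accuracy.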
“Generate a final matrix from the input data entries, said final matrix generation comprising assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” See the first bullet point concerning Marcus for an explanation of teachings of Marcus used to reject the claimed “generate a matrix” step and the claimed “assigning an entity identifier and product identifier pair associated with each interest event to a different location in the matrix, along with a time identifier indicative of how far back in time the interest event associated with each entity-product identifier pair occurred from the prescribed date of interest.” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Creating matrices during retraining phases in Marcus reads on the claimed “generate a final matrix from the input data entries.” 
“Employ the supervised machine learning technique to create a separate final prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product within the future time period using the final matrix.” As explained above, Sakaki already teaches elements (see paras. [0041] and [0045], and FIG. 9) that read on the claimed “employ the supervised machine learning technique,” “create a” “prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product within the future time period.” Similarly, Marcus teaches, in para. [0079], “The types of model generators (e.g., training engine 26 in FIG. 1) used in model 30 are given in the summary below: A. Support Vector Machines (SVM) (e.g., supervised learning, autonomous prediction).” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Marcus teaches in para. [0053], “prediction method 40 includes applying model 30 to a input vector (V.sub.1, . . . V.sub.n) derived from a received historical data element via mapping method 10 so as to predict a new predicted value P of the response characteristic.” Generating new models as a result of retraining, and then using them to make predictions, in Marcus, when applied in the context of making predictions in Sakaki, reads on the claimed “create a separate final prediction model for each product of interest in the input data that estimates the probability that an entity in the input data will purchase the product within the future time period using the final matrix.”
“For each product, using the input data, apply the finalized prediction model associated with that product to estimate the probability that an entity will purchase the product within the future time period.” As explained above, Sakaki already teaches elements that read on the claimed “For each product, using the input data, apply the” “prediction model associated with that product to estimate the probability that an entity will purchase the product within the future time period.” Marcus teaches in para. [0051], “Retraining model 30 includes receiving a new (or partially new) training set of historical data elements which are used to repeat training method 22 as shown in FIG. 1, inputting different constants, metrics, thresholds, or other model parameters.” Use of downstream (retrained) models in Marcus reads on the claimed “apply the finalized prediction model.”
Marcus describes, in its abstract, “A method of machine learning for generating a predictive model of a response characteristic;” and teaches in para. [0055], “Embodiments of the invention for modelling and predicting metrics may be applied to modeling user behavior to improve many technological fields, such as for example” “shopping behavior,” similar to the claimed invention and Sakaki. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the supervised machine learning features and processes of Sakaki, to include the pairs, matrices, models, and retraining features and processes of Marcus, to find the optimal mix of features for predictive models that yield a good prediction error, as taught by Marcus (see para. [0077]).
Bansal teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by the combination of Sakaki and Marcus:
“Using the last-modified control parameters established in creating the initial prediction model for each product as input or if no modification was made to a control parameter in creating the initial prediction model for a product employing the initially-used control parameter as the input for that control parameter.” As explained in the alternative grounds of rejection above, the combination of Sakaki and Marcus teaches elements that read on the claimed “control parameters” and “initial prediction model.” To the extent that the applicant continues to disagree with the examiner’s interpretation of the combination of Sakaki and Marcus in the alternative grounds of rejection above, the examiner cites Bansal. Bansal teaches, in para. [0035], “A model tuner 220 tunes a prediction model of the identified horizon by performing at least two things with respect to that given horizon. First, as represented by the action of the model fitter 221, the model tuner 220 fits an initial prediction model to the parameter using the training data 211 thereby using machine learning to select the initial prediction model as represented by arrow 223.” Bansal teaches, in para. [0036], “Also, as represented by external data tuner 222, the model tuner 220 tunes the initial prediction model by adjusting an effect of external data 203 on the prediction to generate a final prediction model for the given horizon using the validation data 212.” Bansal teaches, in para. [0039], “In some embodiments, the model tuner 220 keeps the same initial prediction model for each of the prediction models for each horizon of a single multi-horizon prediction.” Bansal teaches, in para. [0040], “Furthermore, the external data tuner 222 may use the same validation data to tune towards the final prediction model for each of the horizons in the multi-horizon prediction results. However, the effect of the external data may change for each horizon. For instance, external data that has higher shorter term relevance may have a greater effect on the final prediction model for shorter term horizons than for longer term horizons. In alternative embodiments, the external data tuner 222 may use different validation data to tune towards the final prediction model for at least one of the horizons in the multi-horizon prediction results.” Bansal teaches, in para. [0048], “The external data 203 might be data originating remotely from the computing system 200. For instance, the external data 203 may be data provided and updated by a web service, such as processed web data.” The establishing of initial prediction models, and then tuning the initial prediction models towards final prediction models, by tuning external data or using different validation data, reads on the “using the last-modified control parameters” and “employing the initially-used control parameter” limitations. See FIG. 2 of Bansal, which shows that the tunable external data is not a parameter of the established initial prediction model.
Bansal describes making predictions about future values of parameters (see para. [0002]), similar to the claimed invention and to the combination of Sakaki and Marcus. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the modelling processes of the combination of Sakaki and Marcus, to include the tuning using external data and/or validation data, as in Bansal, to allow for more accurate predictions given a particular time horizon, as taught by Bansal (see para. [0036]).
Regarding claims 2-5, 8, and 11, the combination of Sakaki, Marcus, and Bansal teaches all of the claimed limitations. The explanations for how and where the limitations of claims 2-5, 8, and 11 are taught, with respect to the combination of Sakaki and Marcus that forms the basis for the alternative grounds of rejection up above, also apply to the rejection of claims 2-5, 8, and 11 based on the combination of Sakaki, Marcus, and Bansal.
Regarding claims 12-14 and 17, while the claims are of different scope relative to claims 1, 2, 4, and 8, respectively, the claims nevertheless recite limitations similar to the limitations of claims 1, 2, 4, and 8. Further, Sakaki teaches a product category embodiment (see FIG. 15) that reads on the differences between claims 1, 2, 4, and 8 and claims 12-14 and 17, including consideration of product categories rather than (or in addition to) consideration of products themselves. The consideration of product categories in Sakaki reads on the “product category” limitations of claims 12-14 and 17. Claims 12-14 and 17 are, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, and Bansal at least for the same reasons as claims 1, 2, 4, and 8.
Regarding independent claim 20, while the claim is of different scope relative to independent claim 1, the claim nevertheless recites limitations similar to the limitations of claim 1. Claim 20 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Sakaki, Marcus, and Bansal at least for the same reasons as claim 1.
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, further in view of Bansal, and further in view of Kurapati.
Regarding claims 6 and 15, the combination of Sakaki, Marcus, Bansal, and Kurapati teaches all of the claimed limitations. The explanations for how and where the limitations of claims 6 and 15 are taught, with respect to the combination of Sakaki, Marcus, and Kurapati that forms the basis for the alternative grounds of rejection up above, also apply to the rejection of claims 6 and 15 based on the combination of Sakaki, Marcus, Bansal, and Kurapati.
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, further in view of Bansal, further in view of Kurapati, and further in view of Westphal.
Regarding claims 7 and 16, the combination of Sakaki, Marcus, Bansal, Kurapati, and Westphal teaches all of the claimed limitations. The explanations for how and where the limitations of claims 7 and 16 are taught, with respect to the combination of Sakaki, Marcus, Kurapati, and Westphal that forms the basis for the alternative grounds of rejection up above, also apply to the rejection of claims 7 and 16 based on the combination of Sakaki, Marcus, Bansal, Kurapati, and Westphal.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, further in view of Bansal, and further in view of Kung.
Regarding claims 9 and 18, the combination of Sakaki, Marcus, Bansal, and Kung teaches all of the claimed limitations. The explanations for how and where the limitations of claims 9 and 18 are taught, with respect to the combination of Sakaki, Marcus, and Kung that forms the basis for the alternative grounds of rejection up above, also apply to the rejection of claims 9 and 18 based on the combination of Sakaki, Marcus, Bansal, and Kung.
Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sakaki, in view of Marcus, further in view of Bansal, and further in view of Westphal.
Regarding claims 10 and 19, the combination of Sakaki, Marcus, Bansal, and Westphal teaches all of the claimed limitations. The explanations for how and where the limitations of claims 10 and 19 are taught, with respect to the combination of Sakaki, Marcus, and Westphal that forms the basis for the alternative grounds of rejection up above, also apply to the rejection of claims 10 and 19 based on the combination of Sakaki, Marcus, Bansal, and Westphal.

Response to Arguments
On pp. 12-21 of the Amendment, the applicant argues for reconsideration and withdrawal of the previous claim rejections under 35 USC 103. The focal point of the applicant’s arguments is on the “using the last-modified control parameters established in creating the initial prediction model for each product as input or if no modification was made to a control parameter in creating the initial prediction model for a product employing the initially-used control parameter as the input for that control parameter” limitations of the claims. (See Amendment at p. 12.) The applicant refers to the limitation as the final control parameters feature. (See id. at p. 15.) The applicant argues, “What is clear from the Examiner’s characterization of Marcus is that at least one control parameter is changed in creating its final model” and “This teaching is contrary to the applicant’s claimed final control parameters feature where a final prediction model is created using the final control parameters established in creating the initial model.” (See id. at p. 17.) The applicant also asserts, “None of the control parameters are changed in creating the claimed final prediction model.” (See id.) While the examiner generally agrees with the applicant’s summary of the examiner’s interpretation of Marcus, the examiner disagrees with the applicant’s conclusion that Marcus fails to teach the final control parameters feature. If, hypothetically speaking, Marcus taught inputting different constants, metrics, thresholds, and other model parameters, where none of the prior parameters are re-used, Marcus would not read on the final control parameters feature. Marcus, however, does not require all of these parameters to be different. Rather, due to the “or” operator in the phrase “inputting different constants, metrics, thresholds, or other model parameters” of para. [0051], Marcus teaches instances where, perhaps, different constants are used, but the same metrics, thresholds, and/or other model parameters are used. In such a scenario, whatever metrics, thresholds, and/or other model parameters remain the same form a group that reads on the claimed final control parameters feature (i.e., “using the last-modified control parameters” and “employing the initially-used control parameter”). Additionally or alternatively, Marcus teaches, in para. [0083], “Step 4: The predicted values are compared to the actual values. Various measures of the quality or error of the prediction include: RMSE (Root Mean Square Error), MAE (Mean Absolute Error), StdDev (Standard Deviation), etc. If these error values are outside of an expected range, steps [1-4] may be rerun with internal tuning parameters adjusted or new training data.” Notice the “or” operator near the end of the cited passage. Re-running the steps with the same internal tuning parameters, but with new training data, in Marcus, reads on the final control parameters feature. For at least these reasons, the previous claim rejections under 35 USC 103 are being maintained. This rebuttal also addresses the remaining arguments in the Amendment.
Additionally or alternatively, to the extent the applicant continues to believe that the examiner’s interpretation of Sakaki and Marcus is untenable, a point which the examiner is not conceding, the examiner provides, in the 35 USC 103 section above, alternative grounds of rejection under 35 USC 103 that incorporate Bansal. As explained near the end of the 35 USC 103 section above, the establishing of initial prediction models, and then tuning the initial prediction models towards final prediction models, by tuning external data or using different validation data, in Bansal (see paras. [0035], [0036], [0039], [0048], and FIG. 2), reads on the final control parameters feature.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS Y HO, whose telephone number is (571)270-7918. The examiner can normally be reached Monday through Friday, 9:30 AM to 5:30 PM Eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor, can be reached at 571-272-6787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THOMAS YIH HO/Examiner, Art Unit 3624

/Jerry O'Connor/Supervisory Patent Examiner, Group Art Unit 3624

        1 Alternative grounds of rejection under 35 USC 103, based on a different combination of references, can be found further down below in this Office Action.
        2 This marks the beginning of the portion of this Office Action that outlines the alternative grounds of rejection mentioned in footnote 1 on p. 3 of the Office Action.