DETAILED ACTION
This Second Non-Final Office Action is responsive to Applicant’s Amendment filed on 03/03/2021 in which claims 1 - 4 and 12 - 15 were amended.
Claims 1 - 19 are currently pending and under examination, of which claims 1 and 12 are independent claims. No claims are currently in condition for allowance.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The instant application has been transferred within the Office. Examiner acknowledges the application’s history, including the prior agreement; however, proper due diligence in patent prosecution rests with the examiner to whom the case has been reassigned. An updated search and consideration have been given to Applicant’s amendments, and this Office action is hereby made Non-Final.
Applicant’s amendments to the specification have overcome all objections previously set forth. Accordingly, the objections are withdrawn.
Applicant’s amendment to independent claim 12 addresses the non-statutory subject matter rejection under 35 U.S.C. 101 of claims 12 - 19. Accordingly, the rejection is withdrawn.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes: all claims are directed to one of the four statutory categories. Claims 1-11 are directed to a method (process), and claims 12-19 are directed to a computer-readable medium (article of manufacture).
Step 2A, prong one: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: the claims are directed to future event prediction based on model training. Specifically, the limitations recite:
identifying a first data set describing occurrences of a first event type, the first data set comprising a first plurality of user-content pairs relating a first set of users with content for which the first event type was observed; 
identifying a second data set describing occurrences of a second event type, the second data set comprising a second plurality of user-content pairs relating a second set of users with content for which the second event type was observed, the second set of users and the first set of users having a set of users in common; 
jointly training a set of embeddings for a joint set of users comprising the users in the first set of users and the users in the second set of users, wherein embeddings that correspond to the set of users in common are trained based on co-occurrences of events of the first event type in the first plurality of user-content pairs and co-occurrences of events of the second event type in the second plurality of user content pairs; 
training embeddings that correspond to users that are in the first set of users but not the set of users in common based on the embeddings that correspond to the set of users in common; and 
training a computer model that predicts the likelihood of occurrence of a future event of the first event type for a user with respect to a content item based on the embedding for the user in the jointly trained set of embeddings.

A model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer elements. Nothing in the claim precludes the steps from practically being performed in the mind. For example, one may model such a prediction process as inferring that when friends Alice and Bob purchase an ugly Christmas sweater, then new friend Charlie will also purchase an ugly Christmas sweater, and an embedding is 
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No: the judicial exception is not integrated into a practical application. Although the claim recites that the functionality is performed by a computer, the computer is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the steps of identifying the first and second data sets amount to mere data gathering per MPEP 2106.05(g). Additionally, the training functionality that utilizes only a portion of a joint embedding provides no evidence of improvement to the functioning of a computer. This is largely because, when viewing the application as a whole, there is no particularity to the transformation. Rather, the application merely states at [0007] “apply appropriate weights”. Such a statement is insufficient to enrich public knowledge, or to inform a skilled artisan, in the quid pro quo sense of how the application’s functionality is achieved. Detail concerning the relation of any one set to another set, whether based on a common set or some undisclosed set, amounts to no more than design choice.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No: as noted above, the additional elements are gauged with respect to MPEP 2106.05. The elements do not amount to significantly more because they fail to supply an inventive concept when considered individually or as a whole. In evaluating the claims relative to recent court findings, examiner notes the finding of failure to improve computer functionality in Simio, LLC v. Flexsim Software Products, Inc. (Fed. Cir. 2020).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 12, which recites a computer readable medium and code executed on a processor, as well as to dependent claims 2 - 11 and 13 - 19.
Dependent claims recite assorted groupings or portions of the dataset used to train or embed. This is consistent with the deficiencies noted for the language of the independent claims. Additional language not already identified pertains to claims 8 – 10, which recite the event as conversion of an advertisement. The additional elements merely elaborate the abstract idea as pertaining to purchase prediction, which is indicative of business methods.
Taken alone, their additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7, 11-16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over: 
Hsieh et al., “Collaborative Metric Learning”, hereinafter Hsieh, in view of 
Biswas et al., “Combating the Cold Start User Problem in Model Based Collaborative Filtering”, hereinafter Biswas, and further in view of 
Wu et al., “Modeling the Evolution of Users’ Preferences and Social Links in Social Networking Services”, hereinafter Wu.
With respect to claim 1, Hsieh teaches: 
	A method {Hsieh [Abstract] “proposed algorithm”} comprising: 
identifying a first data set describing occurrences of a first event type, the first data set comprising a first plurality of user-content pairs relating a first set of users with content for which the first event type was observed {Hsieh [P.195 Sect.3 ¶2] “set of user-item pairs S that we know have positive relationships and learn a user-item joint metric to encode these relationships” and [P.193 Sect.1 ¶4] “observe the relationships between certain items (user, item) pairs, i.e., the users’ ratings” wherein rating/preference is a type of event observed}; 
identifying a second data set describing occurrences of a second event type, the second data set comprising a second plurality of user-content pairs relating a second set of users with content for which the second event type was observed, the second set of users and the first set of users having a set of users in common {Hsieh [P.195 Sect2.3.1 ¶3] “unobserved user-item interactions as negative samples” and [P.196 Sect3.2 ¶2] “For each user-item pair (i, j), sample U negative items in parallel and compute the hinge loss in Eq.1” for [P.193 Sect.1 ¶4] “unknown ratings”, illustration Figs 1-2}; 
jointly training a set of embeddings for a joint set of users comprising the users in the first set of users and the users in the second set of users, wherein embeddings that correspond to the set of users in common are trained based on co-occurrences of events of the first event type in the first plurality of user-content pairs and co-occurrences of events of the second event type in the second plurality of user content pairs {Hsieh [Abstract] “We propose Collaborative Metric Learning (CML) which learns a joint metric space to encode not only users’ preferences but also the user-user and item-item similarity” where joint learning is by the objective function for training per equation [P.197 Sect3.5 ¶1]. The training jointly optimizes by minimizing loss terms Lf, Lc, Lm and uses L2 regularization for [P.197 Sect3.4 ¶2] “an object’s latent vector where an object can be a user or an item” illustrated Fig 3. Further, [P.195 Sect.3 ¶2] “This process, due to the triangle inequality, will also cluster 1) the users who co-like the same items together, and 2) the items that are co-liked by the same users together” and [P.195 ¶2] “infer users’ preferences, such as like, bookmarks, click-through etc.”}; 
However, Hsieh does not expressly teach users being of a second set, nor is it readily apparent that the users are separately processed as “training embeddings that correspond to users that are in the first set of users but not the set of users in common”.
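For illustration only, and not as a reproduction of Hsieh's implementation, the hinge loss over sampled negative items referenced in the mapping above may be sketched as follows. The sketch assumes squared Euclidean distance and a unit margin, and it omits any per-pair weighting; the function names and sample vectors are hypothetical.

```python
# Illustrative sketch only; names, margin, and vectors are assumptions.

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hinge_loss(user, pos_item, neg_items, margin=1.0):
    """Sum of margin violations for one observed (user, item) pair.

    A sampled negative item contributes only when it is not at least
    `margin` farther from the user than the observed (positive) item.
    """
    d_pos = squared_distance(user, pos_item)
    return sum(
        max(0.0, margin + d_pos - squared_distance(user, neg))
        for neg in neg_items
    )


user = [0.0, 0.0]
liked_item = [1.0, 0.0]                      # observed pair
unobserved_items = [[3.0, 0.0], [0.5, 0.5]]  # sampled negatives
loss = hinge_loss(user, liked_item, unobserved_items)
```

In this toy case only the second unobserved item lies inside the margin, so it alone contributes to the loss.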
Biswas teaches:
training embeddings that correspond to users that are in the first set of users but not the set of users in common based on the embeddings that correspond to the set of users in common {Biswas [P.9 ¶2-3] “compute true latent vectors of the cold users… generate the latent vector for a new user keeping everything else constant” with training performed by gradient descent on cold users, per same section at step (3) see [P.8 Last¶]}; and 
Biswas is directed to cold-start and modeling with latent representation over social datasets which is the problem identified by the instant specification’s background section. Therefore, the art is analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to isolate the latent embedding for cold users for the benefit “to incorporate new data without retraining the entire model” (Biswas [P.2 Sect.2]) and/or in order to “directly express the difference between the learned user profile and the true user profile in terms of the latent factors” (Biswas [P.1-2 PgBrk]).
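For illustration only, the general idea of generating a latent vector for a new (cold) user by gradient descent while keeping everything else constant may be sketched as follows. This is not Biswas's algorithm; the squared-error objective, the L2 penalty, the learning rate, and all names are assumptions made for the sketch.

```python
# Illustrative sketch only; objective, hyperparameters, and names are assumptions.

def fit_cold_user(ratings, item_vectors, dim, lr=0.05, reg=0.01, steps=500):
    """Fit one user's latent vector with the item vectors held fixed.

    ratings: dict mapping item id -> observed rating for the cold user.
    item_vectors: dict mapping item id -> previously trained latent vector.
    """
    u = [0.1] * dim
    for _ in range(steps):
        grad = [reg * x for x in u]          # gradient of the L2 penalty
        for item, rating in ratings.items():
            v = item_vectors[item]
            err = sum(a * b for a, b in zip(u, v)) - rating
            for k in range(dim):
                grad[k] += err * v[k]        # gradient of the squared error
        u = [x - lr * g for x, g in zip(u, grad)]
    return u


item_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # already trained, never updated
cold_ratings = {"a": 4.0, "b": 2.0}
cold_user_vector = fit_cold_user(cold_ratings, item_vectors, dim=2)
```

Only the new user's vector is updated, so existing vectors need not be retrained when the user is added.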
However, the combination of Hsieh and Biswas does not teach “occurrence of a future event”.
Wu cures these deficiencies, teaching: 
training a computer model that predicts the likelihood of occurrence of a future event of the first event type for a user with respect to a content item based on the embedding for the user in the jointly trained set of embeddings {Wu [P.1243] Fig 1 illustrates context for consumption preference in social network topology. [P.1247 Sect6.2 ¶2] “predict users’ future consumption behavior” as supported by [P.1247] Algorithm 2 “predict consumption preference based on Eq. (26)”. The algorithm uses SGD/PGD to update user set U while item set V is held fixed (“Fix”), as for [P.1248 ¶1] “jointly model user latent consumption vector and latent social vector”, [P.1242 Sect.3 ¶1] definitions, and [P.1245-46 Sect.5] detailed model}.
	Wu is directed to joint modeling with latent representations for social data thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to describe item ratings of Hsieh and Biswas according to future consumption detailed by Wu in order to “capture the interplay between users’ latent consumption vectors and latent social link vectors” (Wu [P.1241 ColBrk]) and/or in order to assess the “relative contribution of the social influence and the homophily [sic] effect for the evolution process of each user’s future consumption” (Wu [P.1246 Sect5.3]).

With respect to claim 2, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, further comprising: 
	further training the embeddings that correspond to users that are in the first set of users, but not the set of users in common, based on co-occurrences of events of the first event type in the first plurality of user-content pairs; and training embeddings that correspond to users that are in the second set of users, but not the set of users in common, based on co-occurrences of events of the second event type in the second plurality of user-content pairs {Wu [P.1244 Sect.5] “common latent consumption preferences” suggests common latent space and where [P.1243 Last¶] “cosine similarity measure to calculate the homophily effect between any pair of users. Let Lt(a) ϵ RMx1 denote the vector of user a’s consumption records… vector equals 0 if user a did not consume” is a first and second event type (consumption record, non-consumption) as discerned among social tie, see [P.1243 ¶2] “Nta is the set of users that a connects till t… stab denotes the social influence strength”. For example, [P.1243 RtCol] “user u2 finds u5 has many common consumption preferences with her (they both consumed v4 and v5), the u2 is likely to associate with u5 in the near future as shown in Fig 1”. Finally, [P.1251 Sect.7] “users’ latent behaviors are encoded latently” for training the model per Algorithm 2}.
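For illustration only, a cosine similarity measure over binary consumption records of the kind quoted from Wu may be sketched as follows. The record vectors are hypothetical; they are chosen so that both users consumed the fourth and fifth items, loosely following the u2/u5 example quoted above.

```python
# Illustrative sketch only; the consumption records below are hypothetical.
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Entry is 1 if the user consumed the item and 0 otherwise.
records_u2 = [1, 0, 0, 1, 1]
records_u5 = [0, 0, 0, 1, 1]
similarity = cosine_similarity(records_u2, records_u5)
```

A value near 1 indicates largely overlapping consumption, while 0 indicates no items in common.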

With respect to claim 3, the combination of Hsieh, Biswas, and Wu teaches the method of claim 2, further comprising: 
	further training the embeddings that correspond to the users that are in the second set of users, but not the set of users in common, based on the embeddings that correspond to the set of users in common {Biswas [P.8 Sect7.1] “train a probabilistic matrix factorization model [20] on only the ratings given by the warm users” as opposed to training only cold users}.

With respect to claim 4, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, wherein  
	the embeddings that correspond to users that are in the first set of users, but not the set of users in common, are indirectly trained by the second plurality of user-content pairs by way of the embeddings that correspond to the set of users in common {Hsieh [P.195 Sect.3] “capturing users’ relative preferences… implicit feedback as a set of user-item pairs S that we know have positive relationships and learn a user-item joint metric to encode these relationships. Specifically, the learned metric pulls pairs in S closer and pushes the other pairs relatively farther apart” where implicit feedback is indirect training by learning joint metric}. 

With respect to claim 5, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, wherein 
	the first plurality of user-content pairs further relates a first set of content to users for which the first event type was observed, and the second plurality of user-content pairs further relates a second set of content to users for which the first event type was observed, the method further comprising: jointly training a set of embeddings for a joint set of content comprising the content in the first set of content and the content in the second set of content {Hsieh Figs 2 and 5 illustrate content as image data, [P.193 Last¶] “learns a joint user-item metric to encode not only users’ preferences but also user-user and item-item similarity… wide range of recommendation domains, including books, news and photography… content information”}.

With respect to claim 7, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, further comprising: 
	determining a sample size for the second plurality of user-content pairs based on ratio of a size of the first data set and a size of the second data set; and selecting a sample of the second plurality of user-content pairs based on the sample size; wherein the set of embeddings for the joint set of users are jointly trained based on the first plurality of user-content pairs and sample of the second plurality of the user-content pairs {Hsieh [P.197 Sect3.5] “form a mini-batch of size N” where a ratio of size is yielded by “For each pair, keep the negative item”. See also Wu [P.1244 RtCol] “imbalanced learning… undersampling”}.
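For illustration only, selecting a sample whose size depends on the ratio of two data set sizes may be sketched as follows. This is a generic undersampling sketch, not the procedure of Hsieh or Wu; the rule of keeping a fraction equal to the size ratio (capped at 1), the seed, and the names are assumptions.

```python
# Illustrative sketch only; the sizing rule, seed, and names are assumptions.
import random


def sample_size_from_ratio(n_first, n_second):
    """Number of second-set pairs to keep, from the ratio of the two set sizes."""
    ratio = n_first / n_second
    keep_fraction = min(1.0, ratio)          # never keep more than all of them
    return round(keep_fraction * n_second)


def sample_second_pairs(first_pairs, second_pairs, seed=0):
    """Undersample the larger second set so it does not dominate joint training."""
    k = sample_size_from_ratio(len(first_pairs), len(second_pairs))
    rng = random.Random(seed)
    return rng.sample(second_pairs, k)


first_pairs = [("u1", "ad1"), ("u2", "ad1"), ("u3", "ad2")]
second_pairs = [("u%d" % i, "post%d" % (i % 4)) for i in range(12)]
sampled = sample_second_pairs(first_pairs, second_pairs)
```

With 3 pairs in the first set and 12 in the second, the ratio is 0.25 and 3 of the second-set pairs are kept.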

With respect to claim 11, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, further comprising identifying the set of users in common by matching users of the first set of users to users of the second set of users; the matching comprising: 
	retrieving first data characterizing a first user of the first set of users; retrieving second data characterizing a second user of the second set of users; and comparing the first data and the second data to determine whether the first user matches the second user {Hsieh [P.198 ¶2] “user-user and item-item similarity is also encoded in their euclidean distances in this joint space” euclidean distance for similarity is comparing for any two users in the shared, joint space}.
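For illustration only, comparing two users by Euclidean distance in a shared space may be sketched as follows. The threshold test for deciding a match is an assumption added for the sketch and is not taken from Hsieh; the vectors and names are hypothetical.

```python
# Illustrative sketch only; the threshold rule and vectors are assumptions.
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def users_match(first_user_data, second_user_data, threshold=1.0):
    """Treat two users as matching when their vectors lie within `threshold`."""
    return euclidean_distance(first_user_data, second_user_data) <= threshold


user_a = [0.0, 0.0]
user_b = [0.5, 0.5]
user_c = [3.0, 4.0]
close_match = users_match(user_a, user_b)
far_match = users_match(user_a, user_c)
```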

With respect to claim 12, Hsieh teaches: 
	A non-transitory computer-readable medium containing computer program code executable on a processor {Hsieh [P.196 ¶3] “massive parallelism… modern GPUs” i.e., [P.199 ¶1-2] “GPU using Theano… The data and implementations are open-sourced” to perform the encoding. See also Biswas [P.8 Sect.7] “Linux Server with 2.93 GHz Intel Xeon X5570 machine with 98GB of memory with OpenSUSE Leap OS”} for: 
	The remainder of this claim is rejected for the same rationale as claim 1.

Claims 13-16 are rejected for the same rationale as claims 2-5, respectively.
Claim 18 is rejected for the same rationale as claim 7.
Claim 19 is rejected for the same rationale as claim 11. 

Claims 6, 8-9, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Hsieh, Biswas, and Wu in view of: 
Aslay et al., “Revenue Maximization in Incentivized Social Advertising”, hereinafter Aslay.
With respect to claim 6, the combination of Hsieh, Biswas, and Wu teaches the method of claim 5, further comprising: 
identifying a set of content in common by matching content of the first set of content in the first data set to content of the second set of content in the second data set; wherein embeddings that correspond to the set of content in common are trained based on co-occurrences of events of the first event type in the first plurality of user-content pairs and co-occurrences of events of the second event type in the second plurality of user content pairs, wherein the embeddings that correspond to content that is in the first set of content, but not the set of content in common, are trained based on co-occurrences of events of the first event type in the first plurality of user-content pairs, and the embeddings that correspond to content that is in the second set of content, but not the set of content in common, are trained based on co-occurrences of events of the second event type in the second plurality of user-content pairs {Aslay [P.3 ¶3] “the topic model maps each ad i to a distribution γi over the latent topic space” wherein mapping ads is matching content, for the effect of [Abstract] “allocate ads to influential users, taking into account the propensity of ads for viral propagation” as detailed on [P.9] “update the latent seed set size si, using Eq. 10… update the influence spread estimation of current Si w.r.t. the updated sample Ri by invoking Algorithm 3” which evaluates coverage of the seed set, where coverage is co-occurrence}.
	Aslay shares a co-author with Biswas, namely Lakshmanan, and is directed to latent modeling of social datasets, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to map to a distribution over latent space as disclosed by Aslay because “organic (i.e., non-promoted) posts, promoted posts can propagate from user to user in the network, potentially triggering a viral contagion” (Aslay [P.1 Sect.1 ¶3]) and/or in order to evaluate “revenue maximization” by considering “ad-specific influence probabilities” (Aslay [P.3 RtCol]).

With respect to claim 8, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1, wherein: 
	the first data set describes interactions between the first set of users and one or more advertisements, and the first event type is a conversion event {Aslay [P.2] “users selected to endorse ad i” for advertisement campaign}; and 
the second data set describes interactions between the second set of users and one or more content items in a non-promotional content channel {Aslay [P.1 Sect.1] “online advertising channels… organic (i.e., non-promoted) posts”}.

With respect to claim 9, the combination of Hsieh, Biswas, and Wu teaches the method of claim 8, wherein: 
	a content item described in the second data matches an advertisement in the first data set {Aslay [P.12 Sect.6 ¶2] “matching ads” i.e., [P.12 ¶3] “ads having the same distribution”, [P.2 Last¶]}. 

Claim 17 is rejected for the same rationale as claim 6.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hsieh, Biswas, Wu, and Aslay in view of: 
Zhao et al., “Improving Recommendation Accuracy using Networks and Complementary Products”, hereinafter Zhao.
With respect to claim 10, the combination of Hsieh, Biswas, and Wu teaches the method of claim 1. Aslay teaches wherein: 
	the first data set describes interactions between the first set of users and one or more advertisements, and the first event type describes a first type of conversion event {Aslay makes replete reference to advertisements, per the title of the reference, and the first type of event includes endorsement, [P.2 Sect.2]}; and 
	However, Aslay does not teach “second type of conversion event different from the first type”.
	Zhao teaches:
the second data set describes interactions between the second set of users and one or more advertisements, and the second event type describes a second type of conversion event different from the first type of conversion event {Zhao [P.3651] “co-purchasing behavior… co-occurrence relationships such that the occurrence (purchase) of one item’s property might trigger the co-occurrence (co-purchase) of other items linked to it” whereby [Abstract] “To model these complex relationships we develop a method based on pairwise ranking and embedding learning to build representations of items based on their co-purchasing and co-browsing statistics. We conduct these experiments on Amazon dataset” and demonstratively set forth per [P.3651]}.
	Zhao is directed to cold-start and modeling for e-commerce, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to detail the item-item similarity disclosed by Hsieh as the co-purchasing taught by Zhao for the benefit of “building higher fidelity recommender systems as they are readily available in many real-world recommendation scenarios, and a priori seem likely to contain information that is highly relevant to the users’ preferences” (Zhao [P.3649 Sect.I ¶3], [P.3654] Fig 2). Furthermore, detailing purchases such as one would expect from a dataset such as Amazon would reasonably include advertisements per Aslay as a matter of descriptive material.




The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
State of the Art:
Wang et al., “Item Silk Road: Recommending Items from Information Domain to Social Users” discloses dropout and overlapping users optimized over joint latent space.
Fonarev et al., “Efficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in Collaborative Filtering” discloses seed set of cold users, see Figs 1-2.
Hussein et al., “Unified Embedding and Metric Learning for Zero-Exemplar Event Detection” discloses 500 event types for joint embedding, see Fig 3.
Additional Support:
Aharon et al., “ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations” discloses [P.87 ¶3] “it is common to consider only a test set of users”.
Anava et al., “Budget-Constrained Item Cold-Start Handling in Collaborative Filtering Recommenders via Optimal Design”, a widely cited disclosure, see Algs 1/2 “users subset”.
Volkovs and Yu, “Effective Latent Models for Binary Feedback in Recommender Systems” discloses user-user and item-item.
Patent Literature: 
Soni et al., US PG Pub No 20180285774A1 “Collaborative Personalization via Simultaneous Embedding of Users and Their Preferences” discloses most recent work of Yahoo, see Fig 3 user profile coverage and [0068] “joint embedding”.
Volkovs et al., same author’s patent literature US10,932,003B2 or US10,824,941B2.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935.  The examiner can normally be reached on M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/CHASE P. HINCKLEY/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124