DETAILED ACTION
The applicant’s request for continued examination regarding application number 16/265,142, filed February 1, 2019 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on May 24, 2022 has been entered.

Response to Amendments
The amendment filed May 24, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/265,142, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks, Examiner acknowledges Claims 1-2 and 13-14 have been amended. Claims 1-16 remain pending in the application. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/265,142, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant's Remarks for Claims 1-16 under 35 U.S.C. §103 as being unpatentable over Chew et al., U.S. PGPUB 2019/0311287, with PCT/US2017/014783 filed 1/24/2017 [hereafter referred to as Chew] in view of Legrand et al., US PGPUB 2017/0091319, published 3/30/2017 [hereafter referred to as Legrand], Examiner acknowledges Applicant’s arguments, has considered them, and finds them to be not persuasive. 
Regarding Applicant’s Remarks:
“Responsive to the Examiner's comments, Applicant has amended the claims to more clearly recite the element that the target webpage and the plurality of webpage variants are webpages used specifically for online marketing campaigns. 
The claimed invention of the present application is directed to a method and system for applying machine learning to marketing optimization for webpages. More specifically, a computer implemented method is disclosed, in which web traffic can be dynamically routed to a webpage that is most likely to perform well (in terms of a performance statistic, such as webpage conversion rate) for a particular visitor (who has certain known attributes), based upon the performance history of the webpage with other visitors having similar attributes. 
The present invention has particular application in the context of a vendor seeking to promote its products/services to Internet users by way of an online marketing campaign. The online marketing campaign includes a particular target webpage; the target webpage contains marketing information and solicits click-throughs or other webpage conversion (such as purchase actions) from Internet users. The particular target webpage has a number of webpage variants (the webpage variants differing slightly from each other - e.g., in terms of text content, layout, font, etc.). The online marketing campaign involves directing an online visitor or having the online visitor making a request for the target webpage. The claimed method is a machine learning method in which the online visitor is routed to a webpage variant, based on an "exploit/explore" strategy. This provides a basis for "deciding" for each instance which one of the webpage variants ("routed webpage variant") to deliver to each online visitor. The resulting performance outcome for the routed webpage variant is also added to the performance history so that the method learns the effectiveness of the 'exploit" strategy over time.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant’s above arguments contain several sub-arguments, each of which will be discussed in the following paragraphs.
Regarding Applicant’s sub-argument that the Examiner’s prior art references do not teach a computer implemented method and system for directing users with similar attributes to webpages based on the performance history of a webpage, Examiner finds this sub-argument to be not persuasive. Examiner reminds the Applicant that MPEP 2111 requires that during patent examination, the pending claims must be given their broadest reasonable interpretation consistent with the specification, and an Examiner must construe claim terms in the broadest reasonable manner during prosecution as is reasonably allowed in an effort to establish a clear record of what applicant intends to claim. Examiner points out that Applicant’s above arguments are similar to earlier Remarks already presented and responded to in the Final Office Action mailed January 24, 2022. As indicated in the same Final Office Action, both the Chew and Legrand references teach computer systems that select webpage content variations for an intended target audience. Chew teaches a content distribution network using a machine learning model to identify and select content variations for delivery to a user client device. A content distribution network (content delivery network) is a computer implemented framework for distributing webpage content variations to users within an audience pool. These content variations stored in the content distribution network are distinct web pages or a set of web pages stored in response pools, and hence these distinct website variations stored in these response pools correspond to “webpage variants” (Chew [0007]-[0008]; [0020]-[0022]: “… the machine learning model is used to select or modify content for delivery to client devices … the content may be in the form of an electronic document … a web page (or a set of web pages forming a website), component elements of a web page … an image or images … or any other form of content. 
… the users engage with the content e.g., choosing to tap or click on particular content elements such as embedded links or interaction controls. … The content server may generate dynamic content from a set of input parameters … includes but is not limited to content generated from a template and content with modifiable characteristics such as font, font size, language, and color scheme … Digital copy testing is the process of identifying the best variant of a content item to provide, where multiple variants of the content item are available … some websites are “adaptive websites” that adjust a website’s structure, content, and/or appearance in response to one or more measured interactions with the site … it is helpful to deliver content to a computing device that has been optimally selected and formatted for the intended audience …”; Figure 1, [0023]: “FIG.1 is a diagram of an example distribution system in a network environment 100 that includes a client device 120, a content selection server 140, a data manager 150, and a content server 170. 
… As described in more detail below, requests are allocated to response pools … The data requests may be … requests for content … The audience is divided into sub-groups, or audience pools, that each received respective variations of the content item using different parameterization models …”; [0030]: “… the content selection server 140 selects a variation of a specified content item … the content item may have parameterized font options, font sizes, color options, background image options, animation sequence options, and so forth … different variations of a content item are in different image file formats … different variations different in a quality level of the image file format … the data manager 150 stores multiple variations of a content item, e.g., different pre-rendered variations of the same core content item …”; and [0057]: “… The request for the content item may specify a core content item, a variant of which will be delivered … the request may be for a content item corresponding to an electronic document (e.g., a web page …) that may be delivered in a variety of languages, using a variety of fonts and font sizes, accompanied by images in one or more different formats, sizes, and image qualities. Each variation of the electronic document is a distinct form of the content …”). As previously identified in the same Final Office Action, paragraph [0021] and Figure 1 of Applicant’s own specification indicate a similar computer implemented framework, in which a web hosting service provider, its associated databases, and its web servers are used to promote products/services by distributing webpage content over the Internet ([0021]: “Although the online marketer 30 is shown in Fig.1 as separate from the web hosting service provider 40, it is contemplated that the web hosting service provider 40 may actually provide the online market/webpage service to the customer 20. 
Although not explicitly shown as such, it should be understood that the interactions and communications between the customer, marketer 30 and the web hosting service provider 40 typically will occur via the Internet or other communications.”). Hence, the Chew reference is within the scope of the Applicant’s claimed invention, and thus Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s sub-argument that the Examiner’s prior art references do not teach “directing an online visitor or having the online visitor making a request for the target webpage”, Examiner finds this sub-argument to be not persuasive. As indicated earlier, the machine learning model identifies and selects the appropriate content variation response pool based on the user audience and the online feedback/interactions that the users provide with respect to the content, where this feedback and these user interactions include tapping or clicking on particular content elements and hence represent visitor attributes (Chew [0020]: “… the machine learning model is used to select or modify content for delivery to client devices … the content may be in the form of an electronic document … a web page (or a set of web pages forming a website), component elements of a web page … an image or images … or any other form of content. … the users engage with the content e.g., choosing to tap or click on particular content elements such as embedded links or interaction controls.”; and [0080]: “… Some information that may be associated with the user … may include events, such as … one or more clicks, browser history data (e.g., the URLs visited, the number of URLs viewed, URL visit durations, etc.) …”). This feedback is described as a measure of utility of the response, where the utility of the response is measured in terms of the length of time between providing the response to the client device and the next subsequent request. A person having ordinary skill in the art would understand that in the context of web pages, this feedback represents the engagement of the user with respect to the content, which involves additional clicking or tapping on particular elements on the page (i.e., embedded links or other interaction controls on the page such as buttons and boxes to fill in data) to trigger subsequent requests. 
The machine learning model uses this feedback received from the user to further improve the selection of the content variation. This process of identifying and selecting the appropriate content variation to be delivered to the user’s client device, and using the resulting user feedback to improve the machine learning model’s selection of the content variation corresponds to a searching and routing process which routes a new visitor to a routed webpage content variant (Chew [0033]: “… with machine learning techniques, one advantage is in predicting the best performing content element configuration a priori, without necessarily testing all possible configurations …”; [0040]: “… Having assigned a request to the response pools corresponding to the machine learning model at block 230, the data processing system then selects, at block 240, a response to the request using the machine learning model and respond to the request, at block 250, with the selected response …”; [0044]-[0046]: “… At block 240, the data processing system selects a response to the request using the machine learning model … the model learning model selects responses to requests and is then improved based on feedback information obtained response to the selected responses …”; [0050]: “… the data processing system obtains feedback information indicating a performance level of the machine learning model … the feedback information indicates utility of the response. … the data processing system measures the length of time between requests from a same source and, when the length of time is below a threshold, determines that the response was of low utility … a response may include data that can be actuated to generate a subsequent request (e.g., the response may include a hyperlink or URL). 
If the included data is actuated, this may indicate a higher level of utility … the feedback information includes amount of time between delivering a response to a client device and receiving, from the client device, a subsequent data request … the feedback information includes identifying, by the data processing system, that a threshold length of time was exceeded without receiving any additional data requests from the client device …”; and [0067]: “… a user may indicate acceptance of the delivered content item by interacting with the content item, e.g., clicking, tapping, or otherwise selecting an element of the content item …”). Hence, the Chew reference is within the scope of the Applicant’s claimed invention, and thus Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
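For illustration only, the feedback signals quoted above from Chew [0050] can be sketched as a simple classification of each delivered response. The sketch below is not taken from Chew or from the claims; the names ResponseFeedback and classify_utility, the label strings, the 5-second threshold, and the order in which the signals are checked are all hypothetical assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseFeedback:
    """Hypothetical record of what was observed after one response was delivered."""
    seconds_to_next_request: Optional[float]  # None if no later request was observed
    link_actuated: bool  # True if a hyperlink/URL included in the response was followed


def classify_utility(fb: ResponseFeedback, quick_repeat_threshold: float = 5.0) -> str:
    """Map observed feedback to a coarse utility label (assumed labels and ordering)."""
    if fb.link_actuated:
        # Actuating data included in the response may indicate higher utility.
        return "higher"
    if fb.seconds_to_next_request is None:
        # A threshold length of time passed without any additional request.
        return "no_followup"
    if fb.seconds_to_next_request < quick_repeat_threshold:
        # A quick repeat request from the same source suggests low utility.
        return "low"
    return "neutral"
```

Under these assumptions, a response followed two seconds later by a repeat request with no link actuated would be labeled "low", while a response whose embedded link was followed would be labeled "higher".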
Regarding Applicant’s sub-argument that the Examiner’s prior art references do not teach “a machine learning method in which the online visitor is routed to a webpage variant, based on an "exploit/explore" strategy”, Examiner finds this sub-argument to be not persuasive. As indicated in the Final Office Action mailed January 24, 2022, Chew teaches that the content requests are assigned to different response pools according to various parameters associated with the requests, where these parameters include those parameterized options associated with different content variations (Chew [0030]-[0031], [0038]-[0039], and [0060]). As indicated earlier, Chew teaches that the user click-initiated acceptance indicators (“visitor attributes”), which are part of the user-related feedback/interactions in response to the content delivered to the users, are mapped into an acceptance metric for each content variation item. Chew teaches that these acceptance metrics are updated and stored (with these updated metrics measuring the acceptance of the content variation by a user up to the current point in time), such that these updated acceptance metrics correspond to a performance outcome for the routed webpage. 
Chew further teaches the machine learning model uses these stored acceptance metrics to adjust the distribution factor to either leave unchanged or increase/decrease the percentage of requests for future selections by users represented in the audience pool, and hence using these updated acceptance metrics to increase/decrease the amount of requests (and corresponding responses) handled by specific response pools containing content variations corresponds to an updating of the performance history that is used for selecting content variation by incorporating the performance outcome and at least one new visitor attribute (Chew [0067]-[0069]: “… the content distribution platform 130 obtains an acceptance indicator indicating recipient acceptance of the delivered variant of the requested content item. … a user may indicate acceptance of a delivered content item by interacting with the content item, e.g., clicking, tapping, or otherwise selecting an element of the content item. In some implementations, acceptance is indicated by the absence of rejection. That is, if a content item is fully delivered without a refresh or interruption, this may indicate acceptance. If, on the other hand, the client device submits another request for the same content item within a short window, or if delivery of the content item is interrupted, this may indicate that the content item was not accepted. 
… these statistics are stored by the data manager 15, e.g., in data storage 156 …”, [0071]-[0072]: “If the content distribution platform 130 determines that the machine learning model is underperforming, then at block 395 the content distribution platform 130 adjusts the distribution factor to lower the percentage of requests assigned to the audience pool that uses the machine learning model for selections … If the content distribution platform 130 determines that the machine learning model is performing well, the content distribution platform 130 may leave the distribution factor unchanged or, at block 399, adjust the distribution factor to increase the percentage of requests assigned to the audience pool …”). As indicated in the Final Office Action mailed January 24, 2022, the Legrand reference teaches the “exploit/explore strategy” by extending the above acceptance metric process and collection of performance history taught in Chew, through use of an iterative technique involving Thompson sampling/probability matching to model this iterative process of identifying and selecting the appropriate content variation. This iterative technique represents an exploration-exploitation process of allowing the user to interact with a selection of new items (“explore”) by presenting some randomness in a wide enough selection of items related to the user’s previous choices and selections, in an effort to nudge the user to choose and select other options (“exploit”). The probability of selecting the desired document P(C|D) is expressed as a sequential history of clicks up to a certain period of time, where each click corresponds to the user accepting the current result (“performance outcome”), and the sequential history of clicks corresponds to “a performance history”. 
This iterative process of accepting the current result through clicks and adding the clicks to the existing history to determine the probability of selecting the desired document continues until the user is satisfied with the selection (Legrand [0282]-[0283]: “Thompson sampling, often times referred to as probability matching, is a concept of introducing some randomness into selecting a group of items … The exploration-exploitation tradeoff essentially is an attempt to balance exploring new options while taking advantage of options that will exploit a user's previous selections. This can be done by choosing a wide enough range of products that the user's choice exposes a lot about their preferences versus choosing products that are likely to appeal to the user immediately. In order to address this tradeoff, an iterative search method such as that of FIG. 19 can use Thompson sampling in its presentation of candidate documents to the user in each pass (i.e., in either or both of operations 1912 and 1922). … Thompson sampling progressively enriches the options presented to the user for options that are likely to be of interest to the user … it continues to present the user with opportunities to express preferences for documents that information up to a given point in time might not be of interest …” and [0260]-[0263]: “Bayes' rule can be used in one or more of operations 1918, 1920 and 1922, and can inform and influence other operations described in FIG. 19 … the resulting goal is to estimate the probability of document D being the desired document given the sequence of clicks up to the given point in time. In Bayesian theory, this probability is represented as P(D|C). … The user model can be designed and implemented to determine a probability that a document D would be chosen, given a set of documents presented to the user and the sequence of selections/clicks up to that point. 
The set of documents that are presented to the user in, for example, operations 1912, 1922 and 1924 can be determined using various techniques, such as Thompson sampling … Bayes' rule can be used to estimate P(D|C), given P(C|D) and P(D). P(D) is the system's view, prior to the user's clicks, of the estimated probability that the user is interested in document D. P(D) is the Prior or the Prior probability score. The Prior remains constant through the user's sequence of clicks, while the system's view of P(C|D) changes and adapts in dependence upon the user's clicks. P(C|D) is essentially the system's view of the probability that the sequence of clicks C would have occurred to reach the document D. The sequence of clicks C, may for example, include clicks c_1, c_2, ... up to the current point in time … suppose that C is a sequence of documents, c_1, c_2, ... selected by the user through various iterations and D is a desired document. … Mathematically, this can be represented by P(C|D) = ∏_{j=1}^{i} P(c_j|D), where i is the number of iterations (see operation 1914) until the user commits to a selected document (see operation 1926).”). As indicated in the Final Office Action mailed January 24, 2022, the motivation to combine the Chew and Legrand references is taught in Legrand, since applying this iterative process of exploration-exploitation through modeling a sequence of selections leading to a desired webpage content improves the accuracy of selections of a target item by taking into account the user’s previous selections, as well as allowing presentation of new items of possible interest to the user. This iterative process makes the machine learning model more efficient in terms of providing a more robust and a balanced set of options that may be of interest to a user, while keeping the user interested as well as engaged with the targeted set of webpages (Legrand [0261], [0282]-[0283]). Hence, the Chew and Legrand references are within the scope of the Applicant’s claimed invention, and thus Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
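For illustration only, the Bayes' rule computation quoted above from Legrand [0260]-[0263] can be sketched as follows: the prior P(D) is multiplied by the product of the per-click likelihoods P(c_j|D) and then normalized to give P(D|C). The second function draws candidates in proportion to that posterior, which is one simple reading of the probability matching that Legrand associates with Thompson sampling. This is a minimal sketch under stated assumptions, not Legrand's implementation; the function names, the dictionary representation of the prior, and the toy likelihood in the usage note are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List


def posterior_over_documents(
    prior: Dict[str, float],
    click_likelihood: Callable[[str, str], float],
    clicks: List[str],
) -> Dict[str, float]:
    """Return P(D|C) for every candidate document D.

    prior maps each document to P(D); click_likelihood(c, d) returns P(c|D=d);
    clicks is the sequence c_1, ..., c_i observed so far.
    """
    unnormalized = {}
    for doc, p_d in prior.items():
        # P(C|D) = product over j of P(c_j|D); an empty click list gives 1.
        p_c_given_d = math.prod(click_likelihood(c, doc) for c in clicks)
        unnormalized[doc] = p_d * p_c_given_d
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidates have zero posterior weight")
    return {doc: weight / total for doc, weight in unnormalized.items()}


def sample_candidates(posterior: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k candidates in proportion to the posterior (with replacement, so repeats can occur)."""
    docs = list(posterior)
    weights = [posterior[d] for d in docs]
    return rng.choices(docs, weights=weights, k=k)
```

As a usage example under a toy assumption that a click on a document is three times as likely when that document is the desired one (0.6 versus 0.2), a uniform prior over documents A, B and C and two clicks on A yield a posterior of 9/11 for A and 1/11 each for B and C. Documents B and C still retain a nonzero chance of being presented, which is the "explore" side of the tradeoff.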
Examiner further notes that Applicant’s sub-argument that the Chew and Legrand references are not associated with an online marketing campaign is not persuasive. Examiner points out that the term “online marketing campaign” does not restrict or limit the scope of the claims according to the broadest reasonable interpretation of the claim, but merely indicates an application (i.e., intended use) of the claimed invention. The term “online marketing campaign” broadly indicates a strategy (i.e., a series of steps or a process) that is conducted online or digitally, i.e., through the use of webpages or websites. The “marketing” aspect of the strategy broadly indicates a process or technique of promoting a product or service to an online audience or a group of users using webpages. Promoting a product or service to an audience involves selecting related content that is relevant or interesting to a user audience, hence directing (“routing”) the user audience to the selected content. Directing a user audience to relevant or interesting content can be based on a user’s feedback to the provided content, such as click or tap interactions with a content element, the request for additional information (by selecting a hyperlink or URL), or the lack of data requests from the client device (e.g., suggesting a lack of interest). Applicant’s newly introduced limitation “wherein the target webpage and the plurality of webpage variants are webpages used for an online marketing campaign” also broadly recites the use of online webpages (and their variants) to promote/sell a product or service to an audience or a group of users. 
As established earlier in the above responses to Applicant’s earlier arguments, both the Chew and Legrand references teach aspects of an “online marketing campaign”, where the distributed content is in the form of webpages, and the distribution of the content is directed to a group of users, and past and current user feedback and interactions (clicks) are monitored and applied to a machine learning model to further improve and direct the users to a selection of webpage content variations. Additionally, Chew teaches that machine learning models generally are applied in a variety of industries and applications to make predictions and recommendations, where the models are built and tailored using supervised and/or unsupervised learning techniques to train a model that is applicable to the particular application/industry based on the provided input training data (Chew [0018]-[0019]: “… machine learning model can be used in a variety of industries. Machine learning models are used, for example, to analyze large data sets, to make predictions or recommendations … Machine learning techniques may be considered to be supervised, unsupervised, or some combination thereof. Generally, a supervised machine learning system is used to build a model from training data where the desired output for the training data is already known. An unsupervised system, or a hybrid system, generates or extends the model without (or with limited) initial knowledge about desired outputs … ”). Hence, given the above evidence, Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Chew, on the other hand, is directed to a rather general method for webpage content selection by balancing content distribution between a machine learning model and a statistical model. More specifically, when requests for a content item is made, a first proportion of the received requests are assigned to a first group and the remaining requests are assigned to a second group. A machine learning model to select variations of the requested content item for responding to requests assigned to the first group and uses a statistical model to select content variations for requests assigned to the second group. Performance information is obtained (e.g., acceptance rates for the different variations), and the performance of the different models is compared and used for content selection. Audience share assigned to the machine learning model is increased when it outperforms the statistical model and decreased when it underperforms the statistical model. 
The basic thrust of Chew is using a secondary statistical model as a backup for a machine learning model, as a way of creating a failsafe and minimum performance threshold for a decision making process. This use case seems far removed from the subject matter of the present application, of dynamic routing based on contextual bandits to dynamically serve content and increase a performance statistic by personalizing content in real time for each visitor based on their attributes. Indeed, it has nothing to do with any online marketing optimization techniques in the context of an online marketing campaign.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s above arguments are similar to earlier Remarks already presented and responded to in the Final Office Action mailed January 24, 2022. As established earlier in the above responses to Applicant’s earlier arguments, both the Chew and Legrand references display aspects of an “online marketing campaign”, where the distributed content is in the form of webpages, and the distribution of the content is directed to a group of users, and past and current user feedback and interactions (clicks) are monitored and applied to a machine learning model to further improve and direct the users to a selection of webpage content variations. As indicated in the same Final Office Action, the operations involving the statistical model are not cited as part of the claim mapping for the recited limitations. The associated figures in the Chew reference teach multiple embodiments and implementations, with the Chew reference further indicating that the statistical model is not required, and that a default or preselected variation can instead be substituted in its place (i.e., Chew [0032]: “In some implementations, the content selection server 140 uses a statistical model … In some implementations, the content selection server 140 uses a default or pre-selected variation in certain circumstances, e.g., when insufficient information is available to use one of the selection models…” and [0056]: “… Otherwise, at block 348, the content selection server 140 identifies a preselected variation (e.g., a default variation) of the requested content item for delivery …”). Hence, Applicant’s argument that the thrust of the Chew reference is directed towards the use of a statistical model to act as a backup for a machine learning model is not persuasive. 
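For illustration only, the request allocation discussed above (Chew [0071]-[0072]) can be sketched as a distribution factor that sets the share of requests assigned to the audience pool served by the machine learning model, and that is lowered when the model underperforms and raised or left unchanged when it performs well. The sketch is not Chew's implementation and does not characterize the claims; the function names, the fixed step size, the comparison against a single baseline acceptance rate, and the pool labels are hypothetical assumptions.

```python
import random


def assign_pool(distribution_factor: float, rng: random.Random) -> str:
    """Assign one request to the ML-served pool with probability distribution_factor."""
    return "ml_pool" if rng.random() < distribution_factor else "other_pool"


def adjust_distribution_factor(
    factor: float,
    ml_acceptance_rate: float,
    baseline_acceptance_rate: float,
    step: float = 0.05,
) -> float:
    """Lower the factor when the ML pool underperforms, raise it when it outperforms.

    The baseline may come from any comparison pool (assumed here); equal rates
    leave the factor unchanged. The result is clamped to the range [0, 1].
    """
    if ml_acceptance_rate < baseline_acceptance_rate:
        factor -= step
    elif ml_acceptance_rate > baseline_acceptance_rate:
        factor += step
    return min(1.0, max(0.0, factor))
```

Under these assumptions, a factor of 0.5 with an ML acceptance rate of 0.10 against a baseline of 0.20 would drop to 0.45, so fewer subsequent requests would be served from the machine learning model's pool.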
At the same time, Examiner also points out that the Applicant’s previously entered and newly amended claims do not contain limitations that would restrict or further limit the scope of the claims to be strictly performed using a machine learning model. Examiner reminds Applicant that MPEP 2145(VI) indicates that “Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.” Examiner also cites the related guidelines in MPEP 2111.01(II), which caution against importing written description into a claim limitation that is broader than the cited embodiment: "Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment." Similarly, the same guidance in MPEP 2145(VI) and MPEP 2111.01(II) also applies to Applicant’s assertion that the Chew reference is far removed from the “subject matter of the present application, of dynamic routing based on contextual bandits to dynamically serve content and increase a performance statistic by personalizing content in real time”. Examiner points out that Applicant’s previously entered and newly amended claims do not contain limitations that recite contextual bandits. In other words, Applicant’s previously entered and newly amended claims do not contain limitations that limit or restrict the routing process to a dynamic process based on contextual bandits. Additionally, Applicant’s previously entered and newly amended claims also do not contain limitations that limit or restrict the process of calculating a performance statistic to be one that is strictly increasing such that it personalizes content in real time. 
Hence, given the above evidence, Applicant’s arguments are not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Further, the Examiner points to paragraph [0020] of Chew as allegedly disclosing that content may be "a web page (or a set of web pages forming a website)..." and appears to equate this with the "webpage variants" of the present application. Understood in context, these are not at all the same things. The "webpage variants" in the present application refer specifically to a number of alternative iterations of a particular target webpage (each webpage variant designed to elicit an action from the online visitor, such as a "click-throughs" or other webpage conversion) -the webpage variants differing from one another in various ways (e.g., in terms of content, format, layout, etc.). It is due to these differences, that there may be a difference in predicted performance, based upon the at least one known attribute for the online visitor. In Chew, the reference in paragraph [0020] is simply identifying the types of content that the learning model may be used to select or modify for delivery.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s above arguments are similar to earlier Remarks already presented and responded to in the Final Office Action mailed January 24, 2022. Under its broadest reasonable interpretation, the term “webpage variants” broadly indicates variations of a webpage. As established earlier in response to Applicant’s earlier arguments, the content variations stored in the content distribution network are web pages or a set of web pages stored in response pools, with multiple content variants existing in these response pools. These multiple content variants can include different font options, sizes, different image formats, languages, and color schemes, where in the context of an electronic document such as a web page, these differences represent different types of formatting on a web page. Chew also teaches that the content contains content elements that are “clickable” such as embedded links/hyperlinks or interaction controls (i.e., buttons, checkboxes, clickable images). A person having ordinary skill in the art would understand that a web page would contain clickable elements such as embedded links/hyperlinks or interactive buttons, checkboxes, and clickable images, all of which can be customized using different formatting techniques (such as fonts and color schemes). 
Thus, these multiple content variants that are stored in the form of an electronic document such as a web page or a set of web pages and contain different formatted and clickable elements correspond to “webpage variants” (Chew [0007]-[0008]: “… The method includes selecting, by the content distribution server, a content variation responsive to the received request using a selection model corresponding to the assigned audience pool and delivering the selected content variation to the client device via the data network … In at least one aspect, described is a system for balancing content selection …”; [0020]-[0022]: “… the content may be in the form of an electronic document … a web page (or a set of web pages forming a website), component elements of a web page … an image or images … or any other form of content. … the users engage with the content e.g., choosing to tap or click on particular content elements such as embedded links or interaction controls. … it is helpful to deliver content to a computing device that has been optimally selected and formatted for the intended audience …”; [0030]: “… the content selection server 140 selects a variation of a specified content item … the data manager 150 stores multiple variations of a content item, e.g., different pre-rendered variations of the same core content item …”; and [0057]: “… The request for the content item may specify a core content item, a variant of which will be delivered … the request may be for a content item corresponding to an electronic document (e.g., a web page …) that may be delivered in a variety of languages, using a variety of fonts and font sizes, accompanied by images in one or more different formats, sizes, and image qualities. Each variation of the electronic document is a distinct form of the content …”). 
Hence, given the above evidence, the Chew reference is within the scope of Applicant’s claimed invention, and thus, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“There is nothing in Chew or in Legrand that directs the reader to apply the teachings therein to online marketing optimization for webpages. As such, it is difficult to see how a person skilled in the art, upon reviewing Chew (either alone or in combination with Legrand) could be led to the subject matter of the present invention, or more specifically to the subject matter of claim 1, as amended. Since neither of Chew or Legrand appear to contemplate any application relating to online marketing optimization, any suggestion that claim 1 should be considered obvious with regard to Chew and Legrand, would only be possible with hindsight analysis (which is not proper). Accordingly, it is respectfully submitted that claim 1 is not actually rendered obvious by Chew and Legrand and is thus patentable.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s above arguments are similar to earlier Remarks already presented and responded to in the Final Office Action mailed January 24, 2022. As established earlier in the above responses to Applicant’s earlier arguments, both the Chew and Legrand references display aspects of an “online marketing campaign”, where the distributed content is in the form of webpages, the distribution of the content is directed to a group of users, and past and current user feedback and interactions (clicks) are monitored and applied to a machine learning model to further improve and direct the users to a selection of webpage content variations. Furthermore, in response to the Applicant's argument that the Examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning. But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971). Hence, Applicant’s argument regarding hindsight analysis is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“The same reasoning discussed above as to why claim 1 is not obvious with regard to Chew and Legrand, similarly, applies to independent claims 2, 13 and 14. The rest of the pending claims all depend directly or indirectly from one of such independent claims, and include all the elements thereof. Thus, for at least the same that claims 1, 2, 13 and 14 are patentable, so such claims should also be found patentable.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant does not provide any additional arguments other than referencing Applicant’s previous set of arguments made for the limitations recited in independent Claim 1. As established in response to the previous set of arguments in the above paragraphs, Applicant’s arguments concerning the identified limitations in independent Claim 1 were not persuasive, and hence Applicant’s arguments for the same limitations present in independent Claims 2, 13, and 14 are also not persuasive, and thus the prior art rejections are maintained.

Claim Objections
Claims 8 and 16 are objected to because of the following informalities: The term “page variant” in the limitation “… wherein the step of calculating for each page variant …” should be corrected to “webpage variant”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4 and 7-16 are rejected under 35 U.S.C. §103 as being unpatentable over Chew et al., U.S. PGPUB 2019/0311287, with PCT/US2017/014783 filed 1/24/2017 [hereafter referred to as Chew] in view of Legrand et al., US PGPUB 2017/0091319, published 3/30/2017 [hereafter referred to as Legrand], in further view of Burtini et al., Improving Online Marketing Experiments with Drifting Multi-Armed Bandits, ICEIS-2015, 2015 [hereafter referred to as Burtini].
Regarding amended Claim 1, 
Chew teaches
(Currently amended) A computer-implemented method for routing Internet traffic for a target webpage, 
… wherein the target webpage has a plurality of webpage variants (Examiner’s note: Chew teaches a content distribution system in a network environment that selects and directs content requests to client devices on the network. Chew further teaches the content is in the form of electronic documents such as a web page, with different variations of the web page selected for the intended users/audiences. Chew also teaches that the content contains elements that are “clickable” such as embedded links/hyperlinks or interaction controls (i.e., buttons, checkboxes, clickable images). These multiple content variations are stored in response pools, and these content variations can include different font options, sizes, different image formats, languages, and color schemes, where in the context of an electronic document such as a web page, these differences represent different types of formatting on a web page. Hence, these multiple content variants (stored in the form of an electronic document such as a web page or a set of web pages) contain different and distinctly formatted and clickable elements, and as such these distinct web pages correspond to “webpage variants” (Chew [0007]-[0008]; [0020]-[0022]: “… the machine learning model is used to select or modify content for delivery to client devices … the content may be in the form of an electronic document … a web page (or a set of web pages forming a website), component elements of a web page … the users engage with the content e.g., choosing to tap or click on particular content elements such as embedded links or interaction controls. 
… Digital copy testing is the process of identifying the best variant of a content item to provide, where multiple variants of the content item are available …”; [0030]: “… the content selection server 140 selects a variation of a specified content item … the data manager 150 stores multiple variations of a content item, e.g., different pre-rendered variations of the same core content item …”; [0057]: “… The request for the content item may specify a core content item, a variant of which will be delivered … the request may be for a content item corresponding to an electronic document (e.g., a web page …) that may be delivered in a variety of languages, using a variety of fonts and font sizes, accompanied by images in one or more different formats, sizes, and image qualities. Each variation of the electronic document is a distinct form of the content …”; and [0067], [0080]).) …
… wherein the Internet traffic comprises a plurality of new visitors (Examiner’s note: As indicated earlier, Chew teaches a content distribution system in a network environment that selects and directs content requests to client devices on the network, where this content distribution network is implemented as a computing system (Chew Figure 4, [0073]-[0077]). Chew further teaches these requests are exchanged between the content distribution system and a plurality of client devices, such that these requests and responses to the request over the Internet correspond to Internet traffic between the servers and the client devices associated with different users (Chew [0007]-[0008]; Figure 1, [0020]: “… The devices receive content over the network and present the received content to users … the content may be in the form of an electronic document … a web page (or a set of web pages forming a website) …”; [0023]-[0025]: “… example distribution system in a network environment that includes a client device 120, a content selection server 140, a data manager 150, and a content server 170. … a client device 120 receives content via a network 110 from a content distribution platform … the client device 120 is one of many client devices that obtain content from the content distribution platform 130. … An illustrative network 110 is the Internet …”; and [0046]: “… the data processing system responds to the request with the response selected … a data request is received from a client device 120 via a data network 110, and the data processing system responds to the request by transmitting the selected response to the client device 120 via the data network 110 …”).), comprising:
at a server (Examiner’s note: As indicated earlier, Chew teaches a content distribution platform including a content selection server and a content server (Chew Figure 1, [0023], [0028]: “… the content distribution platform 130 is illustrated as a content selection server 140 and a content server 170 that collaborate to provide content from the data manager 150 to client devices 120 via the network 110 …”; and Figure 4, [0073]-[0077]).):
(i)    receiving a request for the target webpage from a new visitor (Examiner’s note: As indicated earlier, Chew teaches the content distribution platform receiving a request from a client device for a content item (“target webpage”), and responding to the request by providing a URL to the identified content from the content server (Chew Figure 1, [0026]-[0027], [0029]: “… the content distribution platform 130 receives a request to provide a specific content item, and responsive to the request, the content selection server 140 directs the client device 120 to obtain the content from the content server 170 … the content selection server 140 generates a uniform resource locator (URL) … The content selection server 140 provides the URL to a client device 120, which in turn accesses the URL to obtain the identified content from the content server 170 …”; Figure 2, [0040]-[0041] and [0046]: “… a data request is received from a client device 120 via a data network 110, and the data processing system responds to the request by transmitting the selected response to the client device 120 …”; and Figure 3, operation 310, [0056]-[0057]).);
(ii)    receiving at least one attribute for said new visitor (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites receiving features, properties, or characteristics associated with a user. Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options. Chew further teaches these parameterized options include font options, font sizes, color options, scaling options, language options, different file formats and sizes that can be specified by a visitor to a website (through the use of their corresponding client device). Chew additionally teaches these input parameters also include instructions from the content selection server indicating the latency/bandwidth characteristics and types of transport protocols associated with a client device, such that these options correspond to at least one attribute for said new visitor (Chew [0020]-[0021]: “… The content server may generate dynamic content from a set of input parameters … set based on data associated with an expected audience for the content, or user supplied. Content generated dynamically from a set of input parameters is parameterized content, which includes but is not limited to content generated from a template and content with modifiable characteristics such as font, font size, language, and color scheme. … an adaptive website may have a variable presentation element such as a font size that a visitor can request to have increased or decreased …”; [0030]-[0031]: “… the content item is modifiable responsive to one or more parameters … the content selection server 140 uses a machine learning model to select a content variation, e.g., using parameterized options …”; and [0038]-[0039]). 
Chew additionally teaches receiving acceptance indicators from a user client device indicating a user’s acceptance of a delivered content variant of the requested item, where this acceptance indicator represents a clicking or tapping on an element in the delivered content item. This acceptance indicator also corresponds to an attribute associated with a visitor (Chew [0067]: “… the content distribution platform 130 obtains an acceptance indicator indicating recipient acceptance of the delivered variant of the requested content item … a user may indicate acceptance of a delivered content item by interacting with the content item, e.g., clicking, tapping, or otherwise selecting an element of the content item …”).);
(iii)    calculating for each webpage variant, a predicted performance statistic … in relation to the new visitor, based upon the at least one attribute of the new visitor and based upon a performance history for each webpage variant, the performance history including a performance statistic in relation to visitor attributes (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites the delivery of content to a user based on an attribute associated with the visitor and a performance history, and performing a calculation of a predicted performance statistic. As indicated earlier, Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options, where these options represent one or more attributes associated with a user. Chew further teaches the machine learning model performing an estimation to determine a performance metric and a corresponding content variation, based on receiving additional feedback data. Chew teaches that the user feedback information is measured in terms of the length of time between providing the response to the client device and the next subsequent request, is a measure of utility of the response, and includes an acceptance indicator from the user client device indicating recipient acceptance of the content item. In the context of web pages, this type of user feedback measures the engagement and acceptance of the content from the user (corresponding to a performance statistic related to a visitor attribute), where the user will perform additional clicking or tapping on particular elements on the page (i.e., embedded links or other interaction controls on the page, Chew [0020], [0067], [0080]) to trigger subsequent requests. 
Hence, this sequence of measurements and acceptance indicators received over a period of request-response iterations represents a performance history (i.e., the acceptance of the content from the user) for each webpage variant. Chew teaches applying this user feedback information to the machine learning model to estimate a result (where this estimated result corresponds to calculating a predicted performance statistic) to identify a variation that is most likely to be accepted at the client device (Chew [0031]: “… a prediction model is used to estimate a performance metric and a content variation corresponding to the highest estimated performance metric may be selected for delivery to a client device. The machine learning model uses feedback information … to identify a variation that (in accordance with the machine learning model) is most likely to be accepted at the client device … the content selection server 140 uses this approach to select a content variation corresponding to a predicted highest estimated acceptance and delivers the selected content variation to a client device.”; Figure 2, [0040]: “… the data processing system obtains feedback information indicating a performance level of the machine learning model …”, [0044]-[0046]: “… At block 240, the data processing system selects a response to the request using the machine learning model … the machine learning model selects responses to requests and is then improved based on feedback information obtained response to the selected responses …”; [0050]: “… the data processing system obtains feedback information indicating a performance level of the machine learning model … the feedback information indicates utility of the response. 
… the data processing system measures the length of time between requests from a same source and, when the length of time is below a threshold, determines that the response was of low utility … a response may include data that can be actuated to generate a subsequent request (e.g., the response may include a hyperlink or URL). If the included data is actuated, this may indicate a higher level of utility … the feedback information includes amount of time between delivering a response to a client device and receiving, from the client device, a subsequent data request …”; and Figure 3B, [0067]-[0068]: “… At block 360, the content distribution platform 130 obtains an acceptance indicator indicating recipient acceptance of the delivered variant of the requested content item … a user may indicate acceptance of a delivered content item by interacting with the content item, e.g., clicking, tapping, or otherwise selecting an element of the content item … If the content item was selected for delivery using the machine learning model, then acceptance of the delivered content item (as indicated at block 360) is used to update an acceptance metric for the machine learning model.”).);
(iv)    routing the new visitor to a routed webpage variant … wherein the routed webpage variant is one of the webpage variants (Examiner’s note: As indicated earlier, Chew teaches selection and delivery of a distinct content variant using a machine learning model, where the delivery of a distinct content variant through use of URLs corresponds to routing a user to one of the webpage variants (Chew [0030], [0031]; Figure 2, [0040], [0044]-[0046]; Figure 3A, [0057]: “… the request may be for a content item corresponding to an electronic document … a web page … Each variation … is a distinct form of the content …”, [0060]: “… the content selection server 140 determines whether the assigned audience pool uses a machine learning model for content selections, and if so, at block 335 uses the machine learning model to select a variation of the requested content item for delivery …”, and [0063]: “… the content distribution platform delivers the selected variation of the requested content item to the requesting client device … the content selection server 140 delivers, to the client device 120, an identifier (e.g., a URL) pointing to or describing the selected variation of the content item …”).);
(v)    determining an actual performance outcome for the routed webpage variant in respect of the new visitor (Examiner’s note: As indicated earlier, Chew teaches the content distribution platform receiving a sequence of user feedback information, including an acceptance indicator indicating recipient acceptance of the delivered content item, where this sequence of measurements and acceptance indicators received over a period of request-response iterations represents a performance history for each webpage variant. Chew further teaches updating the values of an acceptance metric for the model based on receiving an acceptance indicator from a user client device. Hence, this update of an acceptance metric based on receiving acceptance indicators from a user corresponds to a process for determining an actual performance outcome for the routed webpage variant in respect of a visitor (Chew [0050]; Figure 3B, [0067]-[0068]).);
(vi)    updating the performance history by incorporating the actual performance outcome for the routed webpage variant in respect of the new visitor and by incorporating the at least one attribute of the new visitor (Examiner’s note: As indicated earlier, Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options (“at least one attribute of a new visitor”), where these options include options such as font sizes that can be specified by a visitor to a website (through the use of their corresponding client device) (Chew [0020]-[0021]; [0030]-[0031], [0038]-[0039]). Chew further teaches that the content requests are assigned to different response pools according to various parameters associated with the requests. As indicated earlier, Chew teaches the user click-initiated acceptance indicators (“visitor attributes”) that are part of the user-related feedback/interactions in response to the delivered content to the users are mapped into an acceptance metric for each content variation item. Chew teaches these acceptance metrics are updated and stored (with these updated metrics measuring the acceptance of the content variation by a user up to the current point in time), such that these updated acceptance metrics correspond to a performance outcome for the routed webpage. 
Chew further teaches the machine learning model uses these stored acceptance metrics to adjust the distribution factor to either leave unchanged or increase/decrease the percentage of requests for future selections by users represented in the audience pool, and hence using these updated acceptance metrics to increase/decrease the requests (and corresponding responses) handled by specific response pools containing content variations corresponds to an updating of the performance history that is used for selecting content variation by incorporating the performance outcome and at least one new visitor attribute (Chew [0055]: “… the requested content can be parameterized, i.e., customized or modified according to one or more parameters … the data server allocates requests to different response pools based on an allocation policy that includes use of a distribution factor, where the data server parameterizes the content item differently for the different response pools … FIG. 3A is a flowchart for distributing content variations based on a distribution factor, and FIG. 3B is a flowchart 305 for updating the distribution factor based on acceptance rates for the distributed content …”, [0059]-[0060]: “… the content selection server 140 assigns the request to an audience pool based on a distribution factor … The machine learning model attempts to tailor content variations to a requesting entity … requests are classified by the content selection server 140 according to various parameters tied to the request …”; Figure 3B, [0067]-[0069], [0071]-[0072]: “If the content distribution platform 130 determines that the machine learning model is underperforming, then at block 395 the content distribution platform 130 adjusts the distribution factor to lower the percentage of requests assigned to the audience pool that uses the machine learning model for selections … If the content distribution platform 130 determines that the machine learning model is performing well, the content distribution platform 130 may leave the distribution factor unchanged or, at block 399, adjust the distribution factor to increase the percentage of requests assigned to the audience pool that uses the machine learning model for selections …”).); and
(vii)    repeating steps (i) to (vi) for each subsequent new visitor (Examiner’s note: Chew teaches the method for responding to data requests by selecting an appropriate content variation as a response to the request is an iterative process where the system continuously receives data requests from client devices, and provides corresponding responses, using the same machine learning model (Chew Figure 2, [0040]: “Figure 2 is a flowchart of an example method 200 for responding to data requests … a data processing system receives data requests from client devices … the data processing system then selects, at block 240, a response to the request using the machine learning model and respond to the request, at block 250, with the selected response … At block 260, the data processing system obtains feedback information indicating a performance level of the machine learning model … The method 200 continues to iterate, such that the data processing system continuously receives data requests from client devices at block 210, including after block 280.”).).
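For illustration only, the request-response iteration mapped in steps (i) through (vii) above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Chew, Legrand, or the application: the names `PerformanceHistory`, `predict`, `route`, `update`, and `handle_request` are hypothetical, the predicted performance statistic is assumed to be a smoothed acceptance rate per (variant, attribute) pair, and the selection is purely greedy, so no exploit/explore strategy is modeled.

```python
from collections import defaultdict


class PerformanceHistory:
    """Hypothetical store of acceptance counts per (variant, visitor attribute)."""

    def __init__(self, variants):
        self.variants = list(variants)
        # (variant, attribute) -> [times_served, times_accepted]
        self.counts = defaultdict(lambda: [0, 0])

    def predict(self, variant, attribute):
        """Step (iii): predicted acceptance rate with add-one smoothing (an assumption)."""
        served, accepted = self.counts[(variant, attribute)]
        return (accepted + 1) / (served + 2)

    def route(self, attribute):
        """Step (iv): greedy choice of the variant with the highest predicted statistic."""
        return max(self.variants, key=lambda v: self.predict(v, attribute))

    def update(self, variant, attribute, accepted):
        """Steps (v)-(vi): record the actual outcome together with the attribute."""
        entry = self.counts[(variant, attribute)]
        entry[0] += 1
        entry[1] += int(accepted)


def handle_request(history, attribute, outcome_fn):
    """Steps (i)-(vi) for one visitor; outcome_fn stands in for the observed click."""
    variant = history.route(attribute)
    accepted = outcome_fn(variant)
    history.update(variant, attribute, accepted)
    return variant, accepted
```

Calling `handle_request` once per incoming request corresponds to the repetition in step (vii). Each call reads the history that earlier calls updated, so the predicted statistic for a given attribute shifts as acceptance indicators accumulate.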
While Chew teaches delivering a webpage variant to a user through delivery of URLs to a user client device, Chew does not explicitly teach
… based upon an exploit/explore strategy …
Legrand teaches
… based upon an exploit/explore strategy (Examiner’s note: Legrand teaches the exploration-exploitation tradeoff using Thompson sampling to select and present candidate documents to the user, where these candidate documents include webpages (Legrand [0004]), and the use of the Thompson sampling involves modeling and calculating a probability about the desired document using Bayesian probability theory, where this probability calculation includes a probability associated with a sequence of selections (represented by a user’s input clicks on a webpage) that estimates the overall probability that a particular document D (webpage content) is selected as the target content item, as well as a prior probability of past clicks (Legrand [0282]-[0283]: “… The exploration-exploitation tradeoff essentially is an attempt to balance exploring new options while taking advantage of options that will exploit a user's previous selections. This can be done by choosing a wide enough range of products that the user's choice exposes a lot about their preferences versus choosing products that are likely to appeal to the user immediately. In order to address this tradeoff, an iterative search method such as that of FIG. 19 can use Thompson sampling in its presentation of candidate documents to the user in each pass (i.e., in either or both of operations 1912 and 1922). … Thompson sampling progressively enriches the options presented to the user for options that are likely to be of interest to the user … it continues to present the user with opportunities to express preferences for documents that information up to a given point in time might not be of interest …”; and [0260]-[0263]: “Bayes' rule can be used in one or more of operations 1918, 1920 and 1922, and can inform and influence other operations described in FIG. 19 … the resulting goal is to estimate the probability of document D being the desired document given the sequence of clicks up to the given point in time. 
… this probability is represented as $P(D \mid C)$. … The set of documents that are presented to the user in, for example, operations 1912, 1922 and 1924 can be determined using various techniques, such as Thompson sampling … Bayes' rule can be used to estimate $P(D \mid C)$, given $P(C \mid D)$ and $P(D)$. … the system's view of $P(C \mid D)$ changes and adapts in dependence upon the user's clicks. $P(C \mid D)$ is essentially the system's view of the probability that the sequence of clicks C would have occurred to reach the document D. The sequence of clicks C, may for example, include clicks $c_1$, $c_2$, ... up to the current point in time … suppose that C is a sequence of documents, $c_1$, $c_2$, ... selected by the user through various iterations and D is a desired document. … Mathematically, this can be represented by $P(C \mid D) = \prod_{j=1}^{i} P(c_j \mid D)$, where i is the number of iterations (see operation 1914) until the user commits to a selected document (see operation 1926).”; and [0266]-[0267]).) …
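For illustration only, the Bayesian update quoted from Legrand above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Legrand's implementation: the function name `posterior_over_documents`, the `click_likelihood` callback, and the numeric values in the example are all hypothetical. The sketch combines a prior P(D) with the product of per-click likelihoods P(c_j | D) and normalizes over the candidate documents to obtain P(D | C).

```python
from math import prod


def posterior_over_documents(prior, click_likelihood, clicks):
    """Return P(D | C) for every candidate document D.

    prior            -- dict mapping a document id to its prior P(D)
    click_likelihood -- hypothetical callback (click, document) -> P(c_j | D)
    clicks           -- the click sequence c_1, ..., c_i observed so far
    """
    # P(C | D) is modeled as the product of the per-click likelihoods.
    unnormalized = {
        doc: p_doc * prod(click_likelihood(c, doc) for c in clicks)
        for doc, p_doc in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every candidate document has zero posterior mass")
    # Bayes' rule: divide by the evidence so the posteriors sum to 1.
    return {doc: weight / total for doc, weight in unnormalized.items()}


# Hypothetical example: two candidate documents and two observed clicks.
prior = {"A": 0.5, "B": 0.5}
likelihood_table = {
    ("x", "A"): 0.8, ("x", "B"): 0.2,
    ("y", "A"): 0.5, ("y", "B"): 0.5,
}
posterior = posterior_over_documents(
    prior, lambda click, doc: likelihood_table[(click, doc)], ["x", "y"]
)
```

In this made-up example the click "x" is four times more likely under document A than under document B, so the posterior shifts toward A even though the priors are equal.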
Both Chew and Legrand are analogous art since both teach machine learning techniques to identify and select webpage content (Chew Abstract and Legrand [0004]-[0005]). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to take the machine learning model taught in Chew and incorporate the exploration-exploitation strategy using Thompson sampling and Bayes’ probability theory taught in Legrand to perform selection and routing of a user to a webpage variant. The motivation to combine is taught in Legrand, since applying this iterative process of exploration-exploitation to find a desired webpage content by calculating the probability based on an existing sequence of user clicks and past user clicks improves the accuracy of selections of a target item by taking into account the user’s previous selections, as well as allowing presentation of new items of possible interest to the user. This iterative process makes the machine learning model more efficient in terms of providing a more robust and a balanced set of options that may be of interest to a user, while keeping the user interested as well as engaged with the targeted set of webpages (Legrand [0261], [0282]-[0283]).
While Chew in view of Legrand teaches delivering webpage variations to multiple users based on the received user attributes and click history (Chew [0040], [0050]-[0051], [0067]; Legrand [0260]-[0263]), as well as teaching techniques to de-emphasize the number of previous clicks over time (Legrand [0275]-[0280]), Chew in view of Legrand does not explicitly teach
… wherein the target webpage and the plurality of webpage variants are webpages used for an online marketing campaign …
… calculating … a predicted performance statistic related to the marketing campaign …
Burtini teaches
… wherein the target webpage and the plurality of webpage variants are webpages used for an online marketing campaign (Examiner’s note: Burtini teaches modeling an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through a combination of techniques involving optimistic Thompson sampling and penalized weighted least squares applied to the samples over time, and applying web context factors representing contextual variables as input into a model that includes time as a parameter, and running this scenario over a period of time, to de-emphasize older, less reliable data, with the goal of maximizing desired user behavior such as sales/revenue and engagement time of the user. The selected set of webpage modifications used in this marketing scenario to maximize desired user behavior in an online marketing scenario corresponds to a set of target webpage and the plurality of webpage variants used for an online marketing campaign (Burtini p.630 Section 1 Introduction 2nd paragraph: “... we explore the variant of the problem where the reward distributions may be changing in time. Specifically, we explore the case where the reward distributions may be drifting in time and contextual information is available. This replicates the online marketing scenario, where an experiment to modify a webpage may be set up at a given time and run indefinitely … The web environment has context: user factors (web browser, operating system, geolocation), world factors (day of the week), and arm factors (grouping of modifications). 
Utilization of contextual variables allows learning behavior for classes of users and observable world effects in order to improve the results.”; p.630 Section 2 Background 1st paragraph: “… In our application, individual arms represent webpage modifications with the goal of maximizing desired user behavior (sales, time engaged, etc.)”; p.631 Stochastic Drift 1st paragraph: “… Often it is possible to detrend non-stationary data by fitting a model that includes time as a parameter …”; p.632 Section 2.5 Probability Matching 1st paragraph: “Probability matching, especially randomized probability matching known as Thompson sampling, has been explored … The basic technique is to express a model that matches the probability of playing a particular arm with the probability of that arm being the best, conditional on all the information observed thus far.”; p.632 Section 3 Overview of the Approach 1st paragraph: “The general technique we experiment with is to fit a regression model of varying form to the data and then to utilize the technique of optimistic Thompson sampling to predict arm payoff in the next iteration of the algorithm …”; p.632 Section 3.1 Autoregression and Detrending 1st paragraph: “Formally, we fit a model $Y_{t,i} = \alpha_t + \mathrm{AR}_i(p) + \mathrm{Trend}_i(t) + A_{t,i} + \varepsilon_{t,i}$ … Where … $Y_{t,i}$ is the expected reward for arm i at time t … This model can be readily extended to contain any contextual variables, such as demographic information about the user (in the web optimization context) or grouping criteria on the arms to improve the learning rate.”; and p.633 Sections 3.2 Penalized Weighted Least Squares and 3.3 Optimistic Thompson Sampling).) …
… calculating … a predicted performance statistic related to the marketing campaign (Examiner’s note: Under its broadest reasonable interpretation, the phrase “… related to the marketing campaign” broadly references an association with an online marketing campaign, and hence this limitation broadly recites calculating a predicted performance statistic associated with an online marketing campaign, where calculating a predicted performance statistic in the context of marketing broadly indicates a calculation associated with a goal or expected result. As indicated earlier, Burtini teaches modeling an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through a combination of techniques involving optimistic Thompson sampling and penalized weighted least squares applied to the samples over time, and applying web context factors representing contextual variables as input into a model that includes time as a parameter, and running this scenario over a period of time, to de-emphasize older, less reliable data, with the goal of maximizing desired user behavior such as sales/revenue and engagement time of the user. These goals of maximizing desired user behavior (sales/revenue, time engaged) in the context of an online marketing scenario represent the calculation of expected rewards in a model (“predicted performance statistic”), and thus the calculation of expected rewards associated with these goals corresponds to calculating a predicted performance statistic related to the marketing campaign (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; and p.632 Section 3.1 Autoregression and Detrending 1st paragraph).) …
Both Chew in view of Legrand and Burtini are analogous art since they both teach using machine learning models that select webpage variants with respect to associated user and webpage based factors/variables, where the selection is based on determining a probability using Thompson sampling, and penalizing/de-emphasizing past user actions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the machine learning modeling techniques taught in Chew in view of Legrand and apply the web contextual variables and user behavior in conjunction with the improved modeling techniques taught in Burtini as a way to maximize the expected rewards (i.e., identify and select the best webpage content variations) related to an online marketing campaign. The motivation to combine is taught in Burtini, since applying modeling techniques such as Thompson sampling and de-emphasizing older, less reliable data on user and web contextual parameters replicates the conditions in an online marketing scenario where the input data dynamically changes based on user behavior over a period of time, resulting in a better model fit to make more accurate and reliable predictions. Furthermore, applying enhanced techniques such as optimistic Thompson sampling eliminates the need for sampling decreases related to a prediction, which additionally saves on the computational resources in the system, thereby making the model more computationally efficient (Burtini p.630 Section 1 Introduction: “In this work, we explore the variant of the problem where the reward distributions may be changing in time. Specifically, we explore the case where the reward distributions may be drifting in time and contextual information is available. This replicates the online marketing scenario, where an experiment to modify a webpage may be set up at a given time and run indefinitely with the aim of maximizing revenue. The web environment has context: user factors (web browser, operating system, geolocation), world factors (day of the week), and arm factors (grouping of modifications). 
Utilization of contextual variables allows learning behavior for classes of users and observable world effects in order to improve the results.”; p.632 Section 2.5 Probability Matching 2nd paragraph: “… traditional Thompson sampling both increases (if the draw is above the point estimate of the mean) and decreases (if the draw is below the point estimate of the mean) a prediction, depending on the sample draw; for the purpose of maximizing reward (minimizing regret), the decrease appears to have no benefit. For this reason, optimistic Thompson sampling, which only increases predictions proportional to their uncertainty, outperforms the traditional technique.”).
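For illustration only, the difference between traditional and optimistic Thompson sampling described in the quoted passage can be sketched as follows. This is a simplified sketch and not Burtini's method: Burtini applies the technique to a fitted regression model, whereas this example swaps in a Beta-Bernoulli model with a uniform Beta(1, 1) prior for brevity. The function names, the conversion rates, and the seed are hypothetical. The only optimistic step is that each sampled value is floored at the arm's posterior mean, so a draw can raise a prediction but never lower it.

```python
import random


def optimistic_thompson_choice(successes, failures, rng):
    """Return the index of the arm (webpage variant) to show next."""
    best_arm, best_score = None, float("-inf")
    for arm, (s, f) in enumerate(zip(successes, failures)):
        alpha, beta = s + 1, f + 1           # Beta(1, 1) prior (assumption)
        mean = alpha / (alpha + beta)        # point estimate of the arm's rate
        draw = rng.betavariate(alpha, beta)  # ordinary Thompson sample
        score = max(draw, mean)              # optimistic: never below the mean
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm


def simulate(true_rates, rounds, seed=0):
    """Run the selection loop against hypothetical per-arm conversion rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = optimistic_thompson_choice(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical usage: two variants with made-up conversion rates.
successes, failures = simulate([0.05, 0.20], rounds=500)
```

Replacing `max(draw, mean)` with `draw` gives the traditional variant, in which a draw below the mean lowers the prediction for that round.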
Regarding amended Claim 2, 
Chew teaches
(Currently amended) A computer-implemented method for routing Internet traffic for a target webpage, 
… wherein the target webpage has a plurality of webpage variants (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …
… wherein the Internet traffic comprises a plurality of new visitors (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), comprising:
at a server (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.):
(i)    receiving a request for the target webpage from a new visitor (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
(ii)    receiving at least one attribute for said new visitor (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
(iii)    determining for each webpage variant a performance history, the performance history including a performance statistic … in relation to visitor attributes (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites the delivery of a content to a user based on identifying (determining) a performance history that includes a performance statistic related to a visitor attribute. As indicated earlier, Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options, where these options represent one or more attributes associated with a user. Chew further teaches the machine learning model performing an estimation to determine a performance metric and a corresponding content variation, based on receiving additional user feedback data that includes acceptance indicators (also representing user attributes). Chew teaches the user feedback information is measured in terms of the length of time between providing the response to the client device and the next subsequent request, and is a measure of utility of the response, and includes an acceptance indicator from the user client device indicating recipient acceptance of the content item. In the context of web pages, this type of user feedback measures the engagement and acceptance of the content from the user (corresponding to a performance statistic related to a visitor attribute), where the user will perform additional clicking or tapping on particular elements on the page (i.e., embedded links or other interaction controls on the page, Chew [0020], [0067], [0080]) to trigger subsequent requests. 
Hence, this sequence of measurements and acceptance indicators received over a period of request-response iterations represents an identification (determination) of performance history that includes a performance statistic (i.e., the acceptance of the content from the user) for each webpage variant (Chew [0031]; Figure 2, [0040], [0044]-[0046]; [0050]; and Figure 3B, [0067]-[0068]).);
(iv)    calculating for each webpage variant, a predicted performance statistic in relation to the new visitor, based upon the at least one attribute of the new visitor and based upon a performance history (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites the delivery of a content to a user based on an attribute associated with the visitor and a performance history, and performing a calculation of a predicted performance statistic. As indicated earlier, Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options, where these options represent one or more attributes associated with a user. Chew further teaches the machine learning model performing an estimation to determine a performance metric and a corresponding content variation, based on receiving additional feedback data. As indicated earlier, Chew teaches the user feedback information is measured in terms of the length of time between providing the response to the client device and the next subsequent request, and is a measure of utility of the response, and includes an acceptance indicator from the user client device indicating recipient acceptance of the content item (also representing user attributes). In the context of web pages, this type of user feedback measures the engagement and acceptance of the content from the user, where the user will perform additional clicking or tapping on particular elements on the page (i.e., embedded links or other interaction controls on the page, Chew [0020], [0067], [0080]) to trigger subsequent requests. Hence, this sequence of measurements and acceptance indicators received over a period of request-response iterations represents a performance history for each webpage variant. 
Chew teaches applying this user feedback information to the machine learning model to estimate a result (where this estimated result corresponds to calculating a predicted performance statistic) to identify a variation that is most likely to be accepted at the client device (Chew [0031]; Figure 2, [0040], [0044]-[0046]; [0050]; and Figure 3B, [0067]-[0068]).);
(v)    routing the new visitor to a routed webpage variant … wherein the routed webpage variant is one of the webpage variants (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
(vi)    determining an actual performance outcome for the routed webpage variant in respect of the new visitor (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
(vii)    updating the performance history by incorporating the actual performance outcome for the routed webpage variant in respect of the new visitor (Examiner’s note: As indicated earlier, Chew teaches using a machine learning model to select a content variation based on a set of input parameters/parameterized options, where these options include options such as font sizes that can be specified by a visitor to a website (through the use of their corresponding client device) (Chew [0020]-[0021]; [0030]-[0031], [0038]-[0039]). Chew further teaches that the content requests are assigned to different response pools according to various parameters associated with the requests. As indicated earlier, Chew teaches the user click-initiated acceptance indicators that are part of the user-related feedback/interactions in response to the delivered content to the users are mapped into an acceptance metric for each content variation item. Chew teaches these acceptance metrics are updated and stored (with these updated metrics measuring the acceptance of the content variation by a user up to the current point in time), such that these updated acceptance metrics correspond to a performance outcome for the routed webpage. Chew further teaches the machine learning model uses these stored acceptance metrics to adjust the distribution factor to either leave unchanged or increase the percentage of requests for future selections by users represented in the audience pool, and hence using these updated acceptance metrics to increase the requests (and corresponding responses) handled by specific response pools containing content variations corresponds to an updating of the performance history that is used for selecting content variation by incorporating the performance outcome (Chew [0055], [0059]-[0060]; Figure 3B, [0067]-[0069], [0071]-[0072]).); and
(viii)    repeating steps (i) to (vii) for each subsequent new visitor (Examiner’s note: Chew teaches the method for responding to data requests by selecting an appropriate content variation as a response to the request is an iterative process where the system continuously receives data requests from client devices, and provides corresponding responses, using the same machine learning model (Chew Figure 2, [0040]).).
While Chew teaches delivering a webpage variant to a user through delivery of URLs to a user client device, Chew does not explicitly teach
… based upon an exploit/explore strategy …
Legrand teaches
… based upon an exploit/explore strategy (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …
Both Chew and Legrand are analogous art since both teach machine learning techniques to identify and select webpage content (Chew Abstract and Legrand [0004]-[0005]). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the exploration-exploitation strategy using Thompson sampling taught in Legrand into the machine learning model taught in Chew to perform selection and routing of a user to a webpage variant. The motivation to combine is taught in Legrand, as provided in the prior art claim mapping of Claim 1 recited above.
While Chew in view of Legrand teaches delivering webpage variations to multiple users based on the received user attributes and click history (Chew [0040], [0050]-[0051], [0067]; Legrand [0260]-[0263]), as well as teaching techniques to de-emphasize the number of previous clicks over time (Legrand [0275]-[0280]), Chew in view of Legrand does not explicitly teach
… wherein the target webpage and the plurality of webpage variants are webpages used for an online marketing campaign …
… determining … a performance statistic related to the marketing campaign …
Burtini teaches
… wherein the target webpage and the plurality of webpage variants are webpages used for an online marketing campaign (This limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …
… determining … a performance statistic related to the marketing campaign (Examiner’s note: Under its broadest reasonable interpretation, the phrase “… related to the marketing campaign” broadly references an association with an online marketing campaign, and hence this limitation broadly recites determining a performance statistic associated with an online marketing campaign, where determining a performance statistic in the context of marketing broadly indicates an input metric. As indicated earlier, Burtini teaches performing an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through use of optimistic Thompson sampling and applying web context factors representing contextual variables as input into a model, and running this scenario over a period of time with the goal of maximizing desired user behavior such as sales/revenue and engagement time of the user. Burtini teaches contextual variables can include grouping criteria to improve the learning rate, where this identification of grouping criteria to improve the learning rate corresponds to an identification (“determination”) of a performance statistic related to the marketing campaign (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; and p.632 Section 3.1 Autoregression and Detrending 1st paragraph).) …
Both Chew in view of Legrand and Burtini are analogous art since they both teach using machine learning models that select webpage variants with respect to associated user and webpage based factors/variables, where the selection is based on determining a probability using Thompson sampling, and penalizing/de-emphasizing past user actions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the machine learning model using Thompson sampling as taught in Chew in view of Legrand and enhance the model to use optimistic Thompson sampling when applying the web contextual variables and user behavior as taught in Burtini as a way to maximize the expected rewards (i.e., identify and select the best webpage content variations) related to an online marketing campaign. The motivation to combine is taught in Burtini, as provided in the prior art claim mapping of Claim 1 recited above.
Regarding original Claim 3, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 1, wherein the at least one attribute is selected from the group consisting of: 
visitor device operating system type; 
desktop or mobile user (Examiner’s note: Chew teaches that a content server can select content based on instructions from the content selection server that includes specifying different image sizes according to the type of client device (smaller images for smart phones, larger images for laptop and desktop computers). Hence the selection of these different image sizes based on different types of client devices corresponds to at least one attribute associated with a desktop or mobile user (Chew [0038]: “The content server 170 obtains a content item from the data manager 150 for delivery to the client device 120, e.g., response to an instruction from the content selection server 140 … the content server 170 selects an image size (or image file size), e.g., selecting smaller images for presentation at smaller client devices such as smart phones, selecting larger images for presentation at larger client devices such as laptop and desktop computers …”).); 
visitor browser type; 
IP address (Examiner’s note: Chew teaches that a content server can select content based on instructions from the content selection server indicating the standard protocol to be used (UDP, TCP, SCTP), where these protocols contain the source and destination IP addresses to aid in delivering the corresponding responses to the requested content. Hence these instructions for determining the type of protocol to be used correspond to at least one attribute associated with an IP address (Chew [0025], [0038]-[0039]: “The content server 170 obtains a content item from the data manager 150 for delivery to the client device 120, e.g., response to an instruction from the content selection server 140 … The modifications or variations are responsive to instructions from the content selection server 140, which determines which variation of a content item is to be delivered … the content item, modified content item, or content item variant is then transmitted by the content server 170 to the client device 120 using a standard protocol such as UDP, TCP, or SCTP …”).); 
internet service provider (Examiner’s note: Chew teaches that a content server can select content based on instructions from the content selection server that includes specifying different image file sizes according to latency and bandwidth connections associated with mobile phone connections or home/office broadband connections, where latency and bandwidth are attributes associated with internet service providers. Hence the selection of these different image file sizes based on different latency and bandwidth attributes corresponds to at least one attribute associated with an internet service provider (Chew [0038]: “The content server 170 obtains a content item from the data manager 150 for delivery to the client device 120, e.g., response to an instruction from the content selection server 140 … the content server 170 selects an image size (or image file size), … selecting smaller file sizes for high latency and/or low bandwidth connections such as mobile phone connection, and selecting larger file sizes for low latency and/or high bandwidth connections such as home or office broadband connections.”).); 
visitor geographic location; 
server’s geographic location; 
visitor age demographic (Examiner’s note: As indicated earlier, Burtini teaches performing an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through use of optimistic Thompson sampling and applying web context factors representing contextual variables as input into a model, and running this scenario over a period of time with the goal of maximizing desired user behavior such as sales/revenue and engagement time of the user. Burtini teaches contextual variables including demographic information about the user. A person having ordinary skill in the art would understand user demographic information includes identifying characteristics associated with the user including age, income, and gender, and hence, this user demographic information corresponds to at least one attribute associated with a visitor age demographic (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 Background 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; and p.632 Section 3.1 Autoregression and Detrending 1st paragraph).); 
visitor firmographic attribute; 
visitor browser language (Examiner’s note: As indicated earlier, Chew teaches content generated dynamically from a set of input parameters. Chew further teaches that these input parameters provided in a request include language options, such that different webpages may contain different languages (content variations) targeting different users. Hence these different language options associated with a webpage correspond to at least one attribute associated with a visitor browser language (Chew [0020]: “… The content server may generate dynamic content from a set of input parameters … set based on data associated with an expected audience for the content … Content generated dynamically from a set of input parameters is parameterized content, which includes … language …”; [0030]-[0031]: “… the content selection server 140 selects a variation of a specified content item … the content item may have parameterized … language options …”; and [0057]: “… a content distribution platform 130 receives a request to deliver a content item to a client device 120 … The request for the content item may specify a core content item, a variant of which will be delivered … the request may be for a content item … a web page … that may be delivered in a variety of languages …”).); 
referrer channel; and 
Urchin Tracking Module parameters.
Regarding original Claim 4, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 3, wherein the at least one attribute for the new visitor is automatically detected by the server (Examiner’s note: Under its broadest reasonable interpretation, the term “automatically detected by the server” broadly indicates something that does not require an explicit request, and hence this limitation broadly recites an attribute that does not involve an explicit user request but is determined by the server. As indicated earlier, Chew teaches instructions from the content selection server indicating the latency/bandwidth characteristics of the network connection and types of transport protocols associated with a client device, where these options that are associated with the client device or with the network connection are not explicitly provided by a user, and hence correspond to attributes that are automatically detected by the server (Chew [0038]-[0039]).).
Regarding original Claim 7, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 1, wherein the predicted performance statistic is calculated using one or more modeling techniques selected from of the group consisting of: 
Naive Bayes; 
Hierarchical Bayes; 
Neural Networks (Examiner’s note: As indicated earlier, Chew teaches a machine learning model using user feedback information about a destination client device to identify a variation that is most likely to be accepted by the client device. Chew further teaches that machine learning models encompass a variety of machine learning techniques, and in some implementations the machine learning model can be implemented using a neural network such as classical neural networks or deep learning neural networks (Chew Figure 1, [0031]; and [0019]: “Machine learning and machine learning models used in machine learning encompass a variety of machine learning techniques … Machine learning algorithms include, for example, classical neural networks, deep learning neural networks … In some implementations described herein, the machine learning model is a deep learning neural network …”, [0045]: “At block 240, the data processing system selects a response to the request using the machine learning model … the machine learning model selects responses to request and is then improved based on feedback information obtained responsive to the selected responses. In some implementations, the machine learning model is a deep learning neural network.”).); 
Linear Regression (Examiner’s note: As indicated earlier, Burtini teaches modeling an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through a combination of techniques involving optimistic Thompson sampling and penalized weighted least squares applied to the samples over time, and applying web context factors representing contextual variables as input into a model, where this model is a linear regression model (Burtini p.632 Section 3 Overview of the Approach 1st paragraph: “The general technique we experiment with is to fit a regression model of varying form to the data and then to utilize the technique of optimistic Thompson sampling to predict arm payoff in the next iteration of the algorithm …”; and p.633 Section 3.2 Penalized Weighted Least Squares 4th paragraph: “To apply the weighted least squares procedure, we follow in the work of Pavlidis et al. (2008) which uses a standard linear regression to compute the estimates of each arm …”).); and 
Regression Trees (Examiner’s note: As indicated earlier, Chew teaches a machine learning model using feedback information about a destination client device to identify a variation that is most likely to be accepted by the client device. Chew further teaches that the machine learning model can be implemented using machine learning algorithms such as Classification and Regression Tree algorithms (Chew Figure 1, [0031]; and [0019]: “… Machine learning algorithms include, for example, … Classification and Regression Tree algorithms …”).). 
Regarding previously presented Claim 8, 
Chew in view of Legrand, in further view of Burtini teaches
(Previously presented) The computer-implemented method of claim 1, 
wherein the step of calculating for each webpage variant, additionally comprises calculating for each web page variant an uncertainty in relation to the new visitor based upon the at least one attribute of the new visitor (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites calculating an uncertainty related to a visitor based upon at least one attribute of the visitor. Legrand teaches that an uncertainty about the desired document at a given point in time can be modeled using Bayesian probability theory, where this probability calculation is based on a user model that calculates the probability associated with a sequence of selections (represented by a user’s input clicks on a webpage, where these user clicks correspond to at least one attribute of a visitor) that estimates the overall probability that a particular document D (webpage content) is selected as the target content item based on a sequence of clicks C that would have occurred to reach document D (i.e., P(C|D)), as well as a prior probability of past clicks (i.e., P(D)) that represents the system’s view of the estimated probability that a user is interested in document D (Legrand [0260]-[0261]: “… during a user session in which the user is seeking a desired document, uncertainty about the desired document at a given point in time can be modeled using Bayesian probability theory … this probability is represented as P(D|C). 
This modeling of the uncertainty regarding document D being the desired document, at a given point in time is described in further detail below as the “user model.” … The user model can be designed and implemented to determine a probability that a document D would be chosen, given a set of documents presented to the user and the sequence of selections/clicks up to that point … Given the user model, Bayes’ rule holds that P(D|C) is proportional to P(C|D)P(D) … P(D) is the prior or the prior probability score … P(C|D) is essentially the system’s view of the probability that the sequence of clicks C would have occurred to reach the document D …”). The calculations to determine the P(C|D) and P(D) probabilities are further taught in Legrand [0263] and [0268]-[0274], [0276]-[0280].), and 
wherein the uncertainty is calculated using one or more modeling techniques selected from of the group consisting of: 
Monte Carlo sampling (Examiner’s note: Legrand teaches scaling Bayesian techniques for choosing documents using Markov Chain Monte Carlo through a Metropolis-Hastings sampling method, which calculates the probability for choosing a random element of a set X (e.g., choosing a random document of a candidate list X) using a distribution close to some distribution Q, where the calculation of this probability of choosing a random element represents a calculation for determining the uncertainty (Legrand Figure 20, [0300]-[0302]).); 
Bootstrapping (Examiner’s note: Burtini teaches enhancing Thompson sampling using a bootstrap variant of Thompson sampling (Burtini p.632 Section 2.5 Probability Matching 1st paragraph: “… Recently, scalability has been studied by introducing a bootstrap-variant of Thompson sampling (Eckles and Kaptein, 2014)…”).); and 
propagation of uncertainty (Examiner’s note: Burtini teaches a weighted least squares technique where the weights are set to the inverse of their recency, where for each time step t, older data provides a less reliable estimate of the current state, where applying smaller weights to older data as time progresses corresponds to a propagation of uncertainty (Burtini p.633 Section 3.2 Penalized Weighted Least Squares: 1st-2nd paragraphs: “The weighted least squares (WLS) process introduces a multiplicative weighting of “reliability” for each observation, resulting in a technique which minimizes the reliability-adjusted squared errors. … the weights are set to the inverse of their recency, indicating that at each time step t, older data provides a less reliable estimate of the current state. … Intuitively, weighted least squares provides a simple, well-explored, highly tractable technique to discount the confidence of old data, increasing predictive uncertainty as time progresses. This is a desirable quality within the context of restless bandits as it appropriately accounts for the growing predictive uncertainty of old observations.”).). 
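For illustration only, the Bayes' rule relationship quoted from Legrand above, in which P(D|C) is proportional to P(C|D)P(D), can be sketched as follows. This is a minimal sketch and is not Legrand's implementation: the function name `posterior`, the document labels, and the numeric priors and likelihoods are hypothetical values chosen for the example.

```python
def posterior(priors, likelihoods):
    """Return normalized P(D|C) per document from P(D) and P(C|D).

    Both arguments are dicts keyed by a (hypothetical) document label.
    """
    # Bayes' rule up to a constant: P(D|C) is proportional to P(C|D) * P(D).
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all documents have zero posterior mass")
    # Normalize so the posterior probabilities sum to 1.
    return {d: value / total for d, value in unnormalized.items()}


# Hypothetical prior P(D) and click likelihood P(C|D) for three documents.
priors = {"doc_a": 0.5, "doc_b": 0.3, "doc_c": 0.2}
likelihoods = {"doc_a": 0.1, "doc_b": 0.4, "doc_c": 0.4}
print(posterior(priors, likelihoods))
```

A flatter posterior indicates greater remaining uncertainty about which document is the desired one, while a peaked posterior indicates less.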
Regarding original Claim 9, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 1, wherein the exploit/explore strategy is determined from using one or more of strategies selected from the group consisting of: 
Thompson sampling (Examiner’s note: As indicated earlier, Legrand teaches the exploration-exploitation tradeoff using Thompson sampling (Legrand [0260]-[0263]; [0281]-[0283]). Burtini also teaches a modeling technique that extends a linear UCB algorithm with optimistic Thompson sampling (Burtini p.633 Section 3.3 Optimistic Thompson Sampling).); 
epsilon-greedy strategy; 
epsilon-decreasing strategy; 
Monte Carlo simulation (Examiner’s note: As indicated earlier, Legrand teaches calculation of the uncertainty probability for larger candidate document lists using other Bayesian techniques such as Markov Chain Monte Carlo, implemented through Metropolis-Hastings sampling (Legrand Figure 20 and [0300]-[0311]).); and 
Upper Confidence Bound (Examiner’s note: Burtini teaches a modeling algorithm involving penalized weighted least squares and a linear UCB algorithm extended by the optimistic Thompson sampling technique, where the linear UCB algorithm is an algorithm based on upper confidence bound (Burtini p.631 Section 2.2 UCB Algorithms; and p.633 Section 3.3 Optimistic Thompson Sampling).).
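For illustration only, three of the exploit/explore strategies recited in the group above (epsilon-greedy, Thompson sampling, and Upper Confidence Bound) can be sketched for a set of webpage variants with binary outcomes. This is a minimal sketch under stated assumptions and is not the method of Legrand, Burtini, or the claims: the function names, the Beta(1, 1) prior, and the UCB1 exploration bonus are assumptions made for the example.

```python
import math
import random


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy(successes, trials, epsilon, rng):
    """Explore a random variant with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(trials))
    rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return argmax(rates)


def thompson(successes, trials, rng):
    """Draw one sample per variant from a Beta posterior; pick the best draw.

    Assumes a uniform Beta(1, 1) prior on each variant's success rate.
    """
    draws = [rng.betavariate(1 + s, 1 + t - s) for s, t in zip(successes, trials)]
    return argmax(draws)


def ucb1(successes, trials):
    """Pick the variant with the highest mean plus UCB1 exploration bonus."""
    for i, t in enumerate(trials):
        if t == 0:
            return i  # try every variant once before scoring
    total = sum(trials)
    scores = [s / t + math.sqrt(2 * math.log(total) / t)
              for s, t in zip(successes, trials)]
    return argmax(scores)


# Hypothetical performance history for two webpage variants.
rng = random.Random(0)
successes, trials = [5, 1], [10, 10]
print(epsilon_greedy(successes, trials, 0.1, rng),
      thompson(successes, trials, rng),
      ucb1(successes, trials))
```

Each function returns the index of the variant to which the next visitor would be routed; the strategies differ only in how they trade off the observed success rate against the remaining uncertainty.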
Regarding previously presented Claim 10, 
Chew in view of Legrand, in further view of Burtini teaches
(Previously presented) The computer-implemented method of claim 1, wherein the exploit/explore strategy involves maximizing the predicted performance statistic (Examiner’s note: As indicated earlier, Legrand teaches the exploration-exploitation tradeoff using Thompson sampling to balance exploring new options while taking advantage of options that will exploit a user’s previous selections (Legrand [0282]). Legrand further teaches a user model that calculates the probability that the user clicks on a document x given a target t is proportional to exp(-λd(x,t)), with λ being chosen based on a maximum likelihood that maximizes the overall probability of the clicks seen in training data. Maximizing the number of clicks provides the estimate of the probability of document D being the desired document, and hence this process of maximizing the number of clicks corresponds to maximizing the predicted performance statistic (Legrand [0260]-[0262]: “… the resulting goal is to estimate the probability of document D being the desired document given the sequence of clicks up to the given point in time … this probability is represented as P(D|C) … Bayes’ rule can be used to estimate P(D|C), given P(C|D) and P(D) … P(C|D) is essentially the system’s view of the probability that the sequence of clicks C would have occurred to reach the document D. … The set of document that are presented to the user … can be determined using various techniques, such as Thompson sampling … the probability that the user clicks on a document x, given that the target is t, is proportional to exp(-λd(x,t)) … The value λ may be chosen using maximum likelihood, i.e., to maximize the overall probability of the clicks seen in training data…”).).
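For illustration only, the maximum likelihood choice of λ described in the quoted passage, where the probability of a click on document x given target t is proportional to exp(-λd(x,t)), can be sketched as a simple grid search. This is a minimal sketch and is not Legrand's implementation: the function names, the grid, and the training observations are hypothetical, and normalizing over the documents shown together is an assumption made for the example.

```python
import math


def log_likelihood(lam, observations):
    """Log-probability of the observed clicks under the exp(-lam * d) model.

    Each observation is (distances, clicked): the distance d(x, t) of every
    displayed document to the target, and the index that was clicked.
    """
    total = 0.0
    for distances, clicked in observations:
        # Normalizing constant over the documents displayed together.
        log_norm = math.log(sum(math.exp(-lam * d) for d in distances))
        total += -lam * distances[clicked] - log_norm
    return total


def fit_lambda(observations, grid):
    """Return the grid value of lambda that maximizes the log-likelihood."""
    return max(grid, key=lambda lam: log_likelihood(lam, observations))


# Hypothetical training data: two documents at distances 0 and 1 from the
# target; the nearer one was clicked three times, the farther one once.
observations = [([0.0, 1.0], 0)] * 3 + [([0.0, 1.0], 1)]
grid = [i / 100 for i in range(0, 301)]
print(fit_lambda(observations, grid))
```

For this toy data the likelihood is maximized when the model assigns probability 3/4 to the nearer document, that is, when λ equals ln 3, so the grid search returns a value close to 1.10.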
Regarding previously presented Claim 11, 
Chew in view of Legrand, in further view of Burtini teaches
(Previously presented) The computer-implemented method of claim 1, wherein the step of calculating for each webpage variant a predicted performance statistic in relation to the new visitor, is additionally calculated based on a plurality of learned model parameters (Examiner’s note: As indicated earlier, Burtini teaches modeling an online marketing scenario involving the selection of webpage modifications that are identified to be the best set of webpage modifications, through a combination of techniques involving optimistic Thompson sampling and penalized weighted least squares applied to the samples over time. Burtini teaches the fitting of this model and calculating the expected rewards $Y_{t,i}$ (“predicted performance statistic”) for each arm i (where each arm represents a grouping of different webpage modifications), based on parameters such as the expected time trend Trend(t), the autoregressive term of order p AR(p), binary variables $A_{t,i}$ and relevant interaction terms associated with each arm/group of different webpage modifications. The model fitting process uses a Bayesian conjugate prior technique to return an estimated set of time-detrended coefficients $\hat{\beta}$ and estimates of their standard errors $\widehat{SE}(\hat{\beta})$ for a set of variables X (referencing the above-mentioned parameters), where these coefficients and standard error estimates are used to determine a weight matrix for the model that is further adjusted using a weighted least squares and LinUCB algorithm (extended by the optimistic Thompson sampling technique) to calculate the expected rewards. Hence, using the weight matrix to calculate the expected rewards for each group of webpage modifications corresponds to a calculation of a predicted performance statistic based on a plurality of learned model parameters for each webpage variant (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 Background 1st paragraph; p.631 Stochastic Drift 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; p.632 Section 3 Overview of the Approach 1st paragraph; and p.632 Section 3.1 Autoregression and Detrending 1st paragraph: “… we fit a model $Y_{t,i} = \alpha_t + \mathrm{AR}_i(p) + \mathrm{Trend}_i(t) + A_{t,i} + \epsilon_{t,i}$ … Where Trend(t) is a function representing the expected time trend, AR(p) is the autoregressive term of order p and $Y_{t,i}$ is the expected reward for arm i at time t. … this model is generally fit as a model of $Y_t$ with binary (“dummy”) variables $A_{t,i}$ and relevant interaction terms indicating which arm is detected … This model, fit with the … Bayesian conjugate prior technique, returns an estimated set of time-detrended, plausibly stationary coefficients $\hat{\beta}$ and estimates of their standard errors $\widehat{SE}(\hat{\beta})$. This model can be readily extended to contain any contextual variables … we follow in standard experiment design terminology and call the terms in our model $\alpha$, AR(p), Trend(t), and $A_{t,i}$ the design matrix and refer to it as X.”; p.633 Section 3.2 Penalized Weighted Least Squares: “… the weighted least squares procedure picks $\hat{\beta}$, coefficients on a set of variables, X, … according to the equation $\hat{\beta} = (X^T \Omega X)^{-1}(X^T \Omega y)$, where $\Omega$ is the matrix of weights and y is the rewards as observed … To apply the weighted least squares procedure, we follow in the work of Pavlidis et al. (2008) which uses a standard linear regression to compute the estimates of each arm and the work of the LinUCB algorithm (Li et al., 2010) which applies a non-weighted penalized linear regression to compute estimates of the payoff for each arm … we strictly decrease the weight of a sample as it becomes further in time from our current prediction time …”, p.633 Figure 1 Pseudocode of combined algorithm).); and
wherein the step of updating the performance history by incorporating the actual performance outcome for the routed webpage variant in respect of the new visitor and by incorporating the at least one attribute of the new visitor additionally comprises: 
determining the learned model parameters for each webpage variant (Examiner’s note: As indicated earlier, Burtini teaches a model fitting process using a Bayesian conjugate prior technique to return an estimated set of time-detrended coefficients $\hat{\beta}$ and estimates of their standard errors $\widehat{SE}(\hat{\beta})$ for a set of variables X, where these coefficients and standard error estimates are used to determine a matrix of weights in the model that are further adjusted using a weighted least squares and LinUCB algorithm (extended by the optimistic Thompson sampling technique) to calculate the expected rewards. As indicated earlier, the identification of the weight matrix is based on the determination of the variable coefficients and corresponding standard error estimates using a Bayesian conjugate prior technique. Hence, the identification of the weight matrix (based on determining variable coefficients and standard error estimates using a Bayesian conjugate prior technique) corresponds to a determination of these learned model parameters for each webpage variant (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 Background 1st paragraph; p.631 Stochastic Drift 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; p.632 Section 3 Overview of the Approach 1st paragraph; p.632 Section 3.1 Autoregression and Detrending 1st paragraph; and p.633 Section 3.2 Penalized Weighted Least Squares, p.633 Figure 1 Pseudocode of combined algorithm).). 
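For illustration only, the weighted least squares estimate cited from Burtini, in which the coefficients equal (XᵀΩX)⁻¹(XᵀΩy) with Ω a diagonal matrix of weights, can be sketched for the simplest case of one intercept and one slope. This is a minimal sketch and is not Burtini's combined algorithm: the function names, the specific recency weighting 1/(age + 1), and the sample data are assumptions made for the example, and the penalization, Thompson sampling, and LinUCB steps are omitted.

```python
def recency_weights(times, now):
    """Assumed weighting: older observations get smaller weights.

    The weight is 1 / (age + 1), where age = now - t; this is one
    hypothetical way to realize 'inverse of recency'.
    """
    return [1.0 / (now - t + 1) for t in times]


def weighted_least_squares(xs, ys, weights):
    """Solve the 2x2 weighted normal equations for (intercept, slope).

    Equivalent to (X^T W X)^{-1} (X^T W y) with X = [1, x] per row and
    W the diagonal matrix of the supplied weights.
    """
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("design matrix is singular")
    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det
    return intercept, slope


# Hypothetical reward history observed at time steps 0..3.
times = [0, 1, 2, 3]
rewards = [2.0, 5.0, 8.0, 11.0]
weights = recency_weights(times, now=4)
print(weighted_least_squares(times, rewards, weights))
```

Because the weights shrink with age, recent observations dominate the fitted coefficients, which mirrors the quoted rationale of discounting the confidence of old data.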
Regarding original Claim 12, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 11, wherein the determining the learned model parameters is performed by a modelling method selected from the group consisting of 
Bayesian Inference (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a modeling method involving Bayesian probability techniques. As indicated earlier, Burtini teaches the identification of a weight matrix based on using a Bayesian conjugate prior technique. Hence, using a Bayesian conjugate prior technique to determine the weight matrix parameters corresponds to a modeling method involving Bayesian Inference (Burtini p.630 Section 1 Introduction 2nd paragraph; p.630 Section 2 Background 1st paragraph; p.631 Stochastic Drift 1st paragraph; p.632 Section 2.5 Probability Matching 1st paragraph; p.632 Section 3 Overview of the Approach 1st paragraph; p.632 Section 3.1 Autoregression and Detrending 1st paragraph; and p.633 Section 3.2 Penalized Weighted Least Squares, p.633 Figure 1 Pseudocode of combined algorithm). As indicated earlier, Legrand also teaches Bayesian probability theory, using a prior probability that involves de-emphasizing past clicks, where this Bayesian probability theory also corresponds to a Bayesian Inference modeling method (Legrand [0260]-[0261], [0266]-[0280]).), 
Maximum Likelihood Estimation and 
Stochastic Gradient Descent.
Regarding amended Claim 13, 
Claim 13 recites a communication system for routing Internet traffic for a target webpage, comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 1, and hence is rejected under similar rationale and motivations provided by Chew, Legrand, and Burtini as indicated in Claim 1. In addition, Chew teaches the content distribution platform and a plurality of client devices, where the content distribution platform includes a content selection server and a content server. Chew additionally teaches the content distribution platform is a computing system that contains at least one processor that executes instructions to perform actions to implement the described operations. Chew further teaches that the network environment that connects these servers and client devices uses HTTP/HTTPS protocols for communication to request and select the appropriate web page, such that this content distribution platform that uses HTTP/HTTPS for communication running over the data network corresponds to a web-based communication network that delivers content from a server to a plurality of client devices, where both the server and the plurality of communication devices are coupled to the communication network (“a plurality of communication devices coupled to the communication network … a server coupled to the communication network …”). Hence, this computing system coupled to a communication network also corresponds to a communication system (Chew Figure 1, [0023]-[0025]: “… example distribution system in a network environment that includes a client device 120, a content selection server 140, a data manager 150, and a content server 170. The network environment 100 is referenced as an example environment … the client device 120 is one of many client devices that obtain content from the content distribution platform 130. 
A group of client devices that receives the same content item (in one or more variations) forms an audience for that content item. … The network 110 is composed of various network devices (nodes) linked together to form one or more data communication paths between participating devices … An illustrative network 110 is the Internet …”, [0026]-[0027]: “… The client device is capable of exchanging information with network nodes, computers, devices, and/or servers (e.g., the content selection server 140 and the content server 170) via the network 110 … the client device 120 executes a browser application (e.g., a web browser) capable of receiving data formatted according to the suite of hypertext application protocols such as the Hypertext Transfer Protocol (HTTP) and HTTP encrypted by Transport Layer Security (HTTPS). … the browser facilitates interaction with the content distribution platform 130 in the form of one or more web pages …”; and Figure 4, [0073]-[0076]: “… the computing system 101 includes at least one processor 107 for performing actions in accordance with instructions and one or more memory devices 106 or 109 … a general purpose processor 107 such as a central processing unit (CPU) …”, [0080]: “… one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, a data processing apparatus (including, e.g., a processor 107).”). 
Regarding amended Claim 14, 

Claim 14 recites a non-transitory computer readable storage medium having stored thereon processor-executable instructions configured to cause a processor to perform operations comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 2, and hence is rejected under similar rationale and motivations provided by Chew, Legrand, and Burtini as indicated in Claim 2. In addition, Chew teaches the content distribution platform is a computing system that contains at least one processor that executes instructions to perform actions to implement the described operations, where the instructions are stored in a non-transitory computer storage medium such as a random or serial access memory array or device (corresponding to a non-transitory computer readable storage medium) (Chew [0028]: “… the content distribution platform 130 is illustrated as a content selection server 140 and a content server 170 that collaborate to provide content from the data manager 150 to client devices 120 via the network 110 … the functionality described … is implemented across a number of computing devices …”; and Figure 4, [0073]-[0077], [0080]: “… Implementations of the subject matter described in this specification can be implemented as one or more computer programs embodied on a tangible medium, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, a data processing apparatus (including, e.g., a processor 107). A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device … The computer storage medium stores data, e.g., computer-executable instructions, in a non-transitory form.”).
Regarding previously presented Claim 15, 
Claim 15 recites the communication system of claim 13, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 8, and hence is rejected under similar rationale and motivations provided by Chew, Legrand, and Burtini as indicated in Claim 8, in view of the rejections of Claim 13.
Regarding previously presented Claim 16, 
Claim 16 recites the non-transitory computer readable storage medium of claim 14, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 8, and hence is rejected under similar rationale and motivations provided by Chew, Legrand, and Burtini as indicated in Claim 8, in view of the rejections of Claim 14.
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over 
Chew et al., U.S. PGPUB 2019/0311287, with PCT/US2017/014783 filed 1/24/2017 [hereafter referred as Chew] in view of Legrand et al., US PGPUB 2017/0091319, published 3/30/2017 [hereafter referred as Legrand], in further view of Burtini et al., Improving Online Marketing Experiments with Drifting Multi-Armed Bandits, ICEIS-2015, 2015 [hereafter referred as Burtini] as applied to Claim 1; in even further view of Nassif et al., U.S. Patent 11,126,785, filed 2/17/2017 [hereafter referred as Nassif].  
Regarding original Claim 5, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 1.
While Chew in view of Legrand, in further view of Burtini teaches user feedback measured by user engagement and acceptance of the content from the user (corresponding to a performance statistic related to a visitor attribute) through clicking or tapping on particular elements on the page (Chew [0020], [0067], [0080]), Chew in view of Legrand, in further view of Burtini does not explicitly teach
… wherein the performance statistic reflects webpage conversion rate.
Nassif teaches
… wherein the performance statistic reflects webpage conversion rate (Examiner’s note: Nassif teaches conversion rates related to a successful purchase of an item in an e-commerce setting, where a visitor to a website has been converted into a paying customer. These success conditions are based on recording a successful interaction with the content items on a webpage, where these interactions include clicking on the web page or the content elements to obtain further information about the content items, which increases the likelihood of the user purchasing the associated item. Hence actions such as clicking or tapping particular elements on a webpage that result in successful purchases or transactions correspond to performance statistics that reflect webpage conversion rates (Nassif col.5 lines 10-39: “… success conditions may be defined based on one or more metrics of user interaction, such as clicks, purchases, and revenue to the provider of that content … The context information may include information known about the user and/or the content. This can include, for example, a likelihood of the user “clicking on” or otherwise interacting with the web page or the content to obtain further information about the content items. This can also include a likelihood of the user consuming (i.e., purchasing, … or otherwise obtaining) an item or service represented or described by the combination of content items …”; and col.6 lines 2-13: “… Recording a successful interaction may lead to a higher weight associated with the combination of content items shown, given the current contextual conditions. The specific success conditions may vary from layout to layout. … the purchase of an item is often referred to as a “conversion” in e-commerce vernacular, where a visitor to a website has been “converted” to a paying customer, or a view has been converted into a transaction … success conditions may be defined as a number of conversions, conversion rate, etc.”).).
Both Chew in view of Legrand, in further view of Burtini and Nassif are analogous art since they both teach using content selection models to deliver webpage content (containing different variations of content) to users in marketing-related scenarios.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the content selection modeling techniques taught in Chew in view of Legrand, in further view of Burtini and enhance them with the content selection modeling techniques taught in Nassif, as a way to focus on user data in a marketing context that represents the success or failure of converting the user to perform a transaction in order to improve the delivery of webpage content. The motivation to combine is taught in Nassif, since corresponding smaller/larger weights can be applied to this user data representing failed or successful conversions, which translates to the model being able to adapt to changing conditions and preferences, thereby making the model more robust in terms of more quickly delivering to a user the optimal webpage content that will produce successful transactions and purchases (resulting in increased revenue), as well as making the model more computationally efficient and bandwidth-efficient (Nassif col.1 Background; col.2 lines 37-51; and col.6 lines 2-17).
Claim 6 is rejected under 35 U.S.C. §103 as being unpatentable over 
Chew et al., U.S. PGPUB 2019/0311287, with PCT/US2017/014783 filed 1/24/2017 [hereafter referred as Chew] in view of Legrand et al., US PGPUB 2017/0091319, published 3/30/2017 [hereafter referred as Legrand], in further view of Burtini et al., Improving Online Marketing Experiments with Drifting Multi-Armed Bandits, ICEIS-2015, 2015 [hereafter referred as Burtini] as applied to Claim 1; in even further view of Russo et al., A Tutorial on Thompson Sampling, November 18, 2017 [hereafter referred as Russo].  
Regarding original Claim 6, 
Chew in view of Legrand, in further view of Burtini teaches
(Original) The computer-implemented method of claim 1.
While Chew in view of Legrand, in further view of Burtini teaches user feedback measured by user engagement and acceptance of the content from the user (corresponding to a performance statistic related to a visitor attribute) through clicking or tapping on particular elements on the page (Chew [0020], [0067], [0080]), Chew in view of Legrand, in further view of Burtini does not explicitly teach
… wherein the performance statistic reflects webpage click-through rates or user lead generations.
Russo teaches
… wherein the performance statistic reflects webpage click-through rates or user lead generations (Examiner’s note: Russo teaches solving a webpage advertisement based problem using Thompson sampling to learn to select the most successful ad by taking into account the success context of past ads, where users arriving at a website are shown versions of the website with different banner ads (“webpage variants”), and where a success is either associated with a click on the ad or with a conversion. The success context containing a click history corresponds to a performance history where each success action (i.e., an ad click) corresponds to a performance statistic. Russo further teaches this success context is represented as success probabilities θ_k that point to either a click-through rate or conversion rate among the users visiting the website, and hence this success context corresponds to performance statistics that reflect webpage click-through rates (Russo pp.1-2 Introduction, 1st paragraph: “… Suppose there are K actions, and when played, any action yields either a success or a failure. Action k ∈ {1, ..., K} produces a success with probability 0 ≤ θ_k ≤ 1. The success probabilities (θ_1, …, θ_K) are unknown to the agent … The objective … is to maximize the cumulative number of successes over T periods, where T is relatively large compared to the number of arms K. … The “arms” in this problem might represent different banner ads that can be displayed on a website. Users arriving at the site are shown versions of the website with different banner ads. A success is associated either with a click on the ad, or with a conversion (a sale of the item being advertised). The parameters θ_k represent either the click-through-rate or conversion-rate among the population of users who frequent the site. The website hopes to balance exploration and exploitation in order to maximize the total number of successes.”; p.10 Algorithm 4 Thompson; and p.21 Section 6 Practical Modeling Considerations: “… previous sections has centered around a somewhat idealized view of Thompson sampling, which ignored the process of prior specification and assumed a simple model … In this section, we provide greater perspective on the process of prior specification and on extensions of Thompson sampling …”; and p.21 6.1 Prior Distribution Specification, 2nd-3rd paragraphs: “Given a prior, Thompson sampling can learn to select the most successful ad. … Taking knowledge into account reduces what must be learned and therefore reduces the time it takes for Thompson sampling to identify the most effective ads. … Suppose we have a data set collected from experience with previous products and their ads, each distinguished by stylistic features such as language, font, and background, together with accurate estimates of click-through probabilities. Let us consider an empirical approach to prior selection that leverages this data. First, partition past ads into K sets, with each kth partition consisting of those with stylistic features most similar to the kth ad under current consideration. … Intuitively, this process assumes that click-through probabilities of past ads in set k represent plausible values of θ_k. The resulting prior is informative; among other things, it virtually rules out click-through probabilities greater than 0.05.”).).
Both Chew in view of Legrand, in further view of Burtini and Russo are analogous art since they both teach using Thompson sampling methods to select webpage content (containing different variations of content) to users in marketing-related scenarios.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the content selection modeling techniques taught in Chew in view of Legrand, in further view of Burtini and enhance it with the content selection modeling techniques taught in Russo, since taking into account previous knowledge (corresponding to prior specifications) that includes a user click history represented by click-through probabilities reduces the learning time of a model, and therefore reduces the time it takes for Thompson sampling to identify the most effective ads, thereby making a model more computationally efficient (Russo p.21 Section 6 Practical Modeling Considerations and p.21 Section 6.1 Prior Distribution Specification).
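For illustration only, the Bernoulli bandit setting quoted from Russo above (K arms such as banner ads, each with an unknown success probability θ_k, and a prior over each θ_k) can be sketched with a standard Beta-Bernoulli Thompson sampling loop. This is a minimal sketch under stated assumptions and is not reproduced from Russo, Chew, Legrand, or Burtini: the function name `thompson_sampling`, the simulated `true_rates`, the uniform Beta(1, 1) default prior, and the fixed random seed are all hypothetical choices made for the example.

```python
import random


def thompson_sampling(true_rates, periods, prior=(1.0, 1.0), seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over K arms.

    true_rates: hypothetical click-through rates theta_k, used only to
                simulate clicks; the sampler itself never reads them.
    prior:      (alpha, beta) of the Beta prior placed on every theta_k.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [prior[0]] * k  # prior pseudo-count plus observed successes
    beta = [prior[1]] * k   # prior pseudo-count plus observed failures
    pulls = [0] * k
    successes = 0
    for _ in range(periods):
        # Draw one plausible theta_k per arm from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Show the arm whose sampled success probability is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated visitor feedback: 1 for a click, 0 otherwise.
        reward = 1 if rng.random() < true_rates[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        successes += reward
    return pulls, successes, alpha, beta


if __name__ == "__main__":
    pulls, successes, alpha, beta = thompson_sampling([0.01, 0.02, 0.30], 5000)
    print("pulls per arm:", pulls)
    print("total successes:", successes)
```

Passing a more informative `prior` (for example, one concentrated on small click-through probabilities) mirrors the point quoted from Russo Section 6.1 that prior knowledge reduces what must be learned.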

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Catlin et al., U.S. PGPUB 2014/0372901, Using Visitor Context and Web Page Features to Select Web Pages for Display, published 12/18/2014, where Catlin teaches a system that contains a testing component for A/B testing of two or more content variations of one or more elements in a website in a marketing environment geared towards a visitor making online purchases, where the testing is performed to determine the best combination of content variations across multiple web page components, and selecting the optimized web page for a given visitor context to improve visitor interaction with the content (Catlin [0003]-[0006], [0009]; Figure 1, [0023], [0025], [0042]). Catlin also teaches a web page optimizer process that contains a predictive model in which visitor behavior and visitor information are provided as features to estimate a desired outcome of whether a visitor purchased or did not purchase a product (Catlin Figure 1, [0023], [0029]-[0033]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121