DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application, filed on 09/25/2019, is a continuation of Application No. 14/841,585 (filed on 08/31/2015; now Patent No. 10,535,012). Claims 1-20 are pending and have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 09/25/2019.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-5, 7-11, 16, 19, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 8, 9, 13, 15, and 16 of U.S. Patent No. US 10,535,012 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because each of the instant claims (the claim being examined) is “generic to a species or sub-genus claimed in a conflicting patent or application, i.e., the entire scope of the reference claim falls within the scope of the examined claim.”  See MPEP 804(II)(B)(1).




Instant Application
U.S. Patent No. US 10,535,012 B2 (reference patent)
Claim 1

An apparatus for implementing a computing system to predict preferences, comprising: at least one processor device operatively coupled to a memory and configured to: 

calculate a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and 

estimate, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system.


Claim 1

An apparatus to improve operation of a computing system for predicting personal preferences, comprising: a processor operatively coupled to a memory and configured to:

generate a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample;

eliminate samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculate at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimate, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 2



The apparatus of claim 1, 

wherein the at least one processor device is further configured to: 

generate a plurality samples from the prior distribution; 

obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; and 

eliminate samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples.
Claim 1



An apparatus to improve operation of a computing system for predicting personal preferences, comprising: a processor operatively coupled to a memory and configured to:

generate a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample;

eliminate samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculate at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimate, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 3





The apparatus of claim 1, wherein the distance from each sample to at least one neighboring sample is a distance from each sample to a kth-nearest neighbor, k being a natural number.
Claim 2





The apparatus of claim 1, wherein the processor is further configured to calculate a distance from each sample to a kth-nearest neighbor as the at least one parameter relating to the density at each sample, k being a natural number.
Claim 4


The apparatus of claim 1, wherein the at least one processor device is further configured to estimate the at least one differential entropy of the at least one posterior distribution by approximating a probability density function of the prior distribution at each sample using a volume of a sphere having a radius equal to the distance.
Claim 3 

The apparatus of claim 1, wherein the processor is further configured to estimate the at least one differential entropy of the at least one posterior distribution by approximating a probability density function of the prior distribution at each sample using a volume of a sphere having a radius equal to the distance.
Claim 5 

The apparatus of claim 1, wherein the at least one processor device is further configured to estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term.
Claim 4

The apparatus of claim 1, wherein the processor is further configured to estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term.
Claim 7 

The apparatus of claim 1, wherein the at least one processor device is further configured to obtain the at least one observation from a model having an internal state estimated by the prior distribution.
Claim 8 

The apparatus of claim 7, wherein the processor is further configured to obtain the observation from a model having an internal state estimated by the prior distribution.
Claim 8 

The apparatus of claim 7, wherein the model is a behavioral model of at least one person.
Claim 9

The apparatus of claim 8, wherein the model is a behavioral model of the at least one person.
Claim 9 

The apparatus of claim 1, wherein the at least one processor device is further configured to: select an action from a plurality of candidate actions each causing one or more observations based on expected values of the differential entropies estimated for all observations caused by the action; and transmit, to at least one device associated with at least one person, at least one electronic interaction generated based on the action.
Claim 1 

“...transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.”
Claim 10


A computer-implemented method for implementing a computer system to predict preferences, comprising: 

calculating a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and 

estimating, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system.
Claim 13

A computer-implemented method for improving operation of a computing system for predicting personal preferences, comprising:

generating a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtaining, for each sample among the plurality of samples, a likelihood of observation as an output of a likelihood function given the sample;

eliminating samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculating at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimating, for each sample in the subset, at least one-differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmitting, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 11





The method of claim 10, further comprising: generating a plurality samples from the prior distribution; obtaining, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; and eliminating samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples.
Claim 13





A computer-implemented method for improving operation of a computing system for predicting personal preferences, comprising:

generating a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtaining, for each sample among the plurality of samples, a likelihood of observation as an output of a likelihood function given the sample;

eliminating samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculating at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimating, for each sample in the subset, at least one-differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmitting, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 16





The method of claim 10, wherein the at least one processor device is further configured to obtain the at least one observation from a model having an internal state estimated by the prior distribution.
Claim 15





The method of claim 13, further comprising:
obtaining the observation from a behavior model of the person having an internal state estimated by the prior distribution question;
wherein the electronic interaction includes an electronic message for viewing on the at least one device.
Claim 19


A computer program product for implementing a computer system to predict preferences, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations comprising: 

calculating a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and 

estimating, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system.

Claim 16 

A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations for improving operation of a computing system for predicting personal preferences, the operations comprising:

generating a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of a person;

obtaining, for each sample among the plurality of samples, a likelihood of observation as an output of a likelihood function given the sample;
eliminating samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculating at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimating, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmitting, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 20


The computer program product of claim 19, wherein the operations further include: 

generating a plurality samples from the prior distribution; 

obtaining, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; 

eliminating samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples; 

selecting an action from a plurality of candidate actions each causing one or more observations based on expected values of the differential entropies estimated for all observations caused by the action; and 

transmitting, to at least one device associated with at least one person, at least one electronic interaction generated based on the action.
Claim 16 

A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations for improving operation of a computing system for predicting personal preferences, the operations comprising:

generating a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of a person;

obtaining, for each sample among the plurality of samples, a likelihood of observation as an output of a likelihood function given the sample;

eliminating samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculating at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimating, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmitting, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.



As indicated in the table above, all the claimed features in instant claim 1 are disclosed in reference claim 1. While the two claims are not identical, instant claim 1 is anticipated by reference claim 1. It is evident from the table that all limitations in instant claim 1 are linguistically comparable to the underlined limitations in reference claim 1 except for the limitation “calculate a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution” in instant claim 1, for which explanation is provided below:
Reference claim 1 recites “calculate at least one parameter relating to a density of the prior distribution at each sample in the subset”, wherein the “subset” is generated by eliminating “samples from the plurality of samples having a likelihood of observation less than a threshold value”, and the plurality of samples is associated with the prior distribution (see the “generate...”, “obtain...”, and “eliminate...” limitations of reference claim 1). The “subset” of reference claim 1 is therefore a “set of samples associated with the prior distribution” as recited in instant claim 1. Therefore, reference claim 1 anticipates instant claim 1.
Instant claims 10 and 19 recite limitations analogous to those of claim 1, and are rejected based on the same rationale as stated above for claim 1 (instant claim 10 compared to reference claim 13; instant claim 19 compared to reference claim 16). Each of the instant dependent claims as noted above is rejected based on the same rationale as the claim from which it depends. Please see the table above for more information.
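As a purely illustrative aid to the claim comparison above, the following minimal sketch shows one way the commonly recited limitations could fit together: a nearest-neighbor distance is computed at each prior sample, low-likelihood samples are eliminated, and the differential entropy of the posterior is estimated from the prior samples and their likelihoods without drawing any posterior samples. It is not taken from the application or the reference patent. The function name `posterior_entropy_1d`, the restriction to one dimension, the choice k = 1, the use of all prior samples for the neighbor distances, and the Gaussian example are all assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant; digamma(1) = -EULER_GAMMA


def posterior_entropy_1d(prior_samples, likelihood, threshold=0.0):
    """Estimate the differential entropy (in nats) of a 1-D posterior
    using only prior samples, their nearest-neighbor distances (k = 1),
    and the likelihood of the observation at each sample."""
    xs = sorted(prior_samples)
    n = len(xs)

    # Distance from each prior sample to its nearest neighbor.
    # In one dimension the nearest neighbor is adjacent in sorted order.
    dists = []
    for i, x in enumerate(xs):
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < n - 1 else math.inf
        dists.append(max(min(left, right), 1e-300))  # guard against log(0)

    # Eliminate samples whose likelihood is below the threshold.
    kept = []
    for r, x in zip(dists, xs):
        lik = likelihood(x)
        if lik > 0.0 and lik >= threshold:
            kept.append((r, lik))

    total = sum(lik for _, lik in kept)
    z_hat = total / n  # evidence estimate; eliminated samples count as zero

    entropy = 0.0
    for r, lik in kept:
        # Nearest-neighbor estimate of the log prior density at the sample:
        # digamma(1) - log(n - 1) - log(volume of a 1-D ball of radius r).
        log_prior = -EULER_GAMMA - math.log(n - 1) - math.log(2.0 * r)
        log_posterior = log_prior + math.log(lik) - math.log(z_hat)
        # Weight each sample by its normalized likelihood.
        entropy -= (lik / total) * log_posterior
    return entropy


if __name__ == "__main__":
    random.seed(0)
    samples = [random.gauss(0.0, 1.0) for _ in range(4000)]
    estimate = posterior_entropy_1d(
        samples, lambda x: math.exp(-0.5 * (0.5 - x) ** 2), threshold=1e-3
    )
    print(estimate)
```

In the example at the bottom, the prior is a standard normal and the likelihood is Gaussian with unit variance, so the exact posterior is Gaussian with variance 1/2 and the estimate can be compared against the closed-form entropy 0.5·ln(2πe·0.5). The term `-EULER_GAMMA` is where Euler's constant enters as a constant term when k = 1.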
Claims 10-14 and 16-18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4 and 8-9 of U.S. Patent No. US 10,535,012 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because all the claimed limitations recited in the instant claims are transparently found in the reference claims of U.S. Patent No. US 10,535,012 B2 with obvious wording variations.
Instant Application
U.S. Patent No. US 10,535,012 B2 (reference patent)
Claim 10

A computer-implemented method for implementing a computer system to predict preferences, comprising: 

calculating a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and 

estimating, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system.
Claim 1

An apparatus to improve operation of a computing system for predicting personal preferences, comprising: a processor operatively coupled to a memory and configured to:

generate a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample;

eliminate samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculate at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimate, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.

Claim 11

The method of claim 10, further comprising: generating a plurality samples from the prior distribution; obtaining, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; and eliminating samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples.

Claim 1

An apparatus to improve operation of a computing system for predicting personal preferences, comprising: a processor operatively coupled to a memory and configured to:

generate a plurality of samples from a prior distribution, the prior distribution including a distribution of values representing at least one preference of at least one person;

obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample;

eliminate samples from the plurality of samples having a likelihood of observation less than a threshold value to generate a subset of the plurality of samples;

calculate at least one parameter relating to a density of the prior distribution at each sample in the subset, the at least one parameter including a distance from each sample to at least one neighboring sample;

estimate, for each sample in the subset, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the at least one parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed on samples in the subset and without sampling the at least one posterior distribution to reduce consumption of resources of the computing system; and

transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on expected values of the differential entropies estimated for all observations caused by the action.
Claim 12


The method of claim 10, wherein the distance from each sample to at least one neighboring sample is a distance from each sample to a kth-nearest neighbor, k being a natural number.
Claim 2 

The apparatus of claim 1, wherein the processor is further configured to calculate a distance from each sample to a kth-nearest neighbor as the at least one parameter relating to the density at each sample, k being a natural number.
Claim 13

The method of claim 10, wherein estimating the at least one differential entropy of the at least one posterior distribution further includes approximating a probability density function of the prior distribution at each sample using a volume of a sphere having a radius equal to the distance.
Claim 3 

The apparatus of claim 1, wherein the processor is further configured to estimate the at least one differential entropy of the at least one posterior distribution by approximating a probability density function of the prior distribution at each sample using a volume of a sphere having a radius equal to the distance.
Claim 14

The method of claim 10, wherein the at least one differential entropy of the at least one posterior distribution is estimated having Euler's constant as a constant term.
Claim 4

The apparatus of claim 1, wherein the processor is further configured to estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term.
Claim 16 

The method of claim 10, wherein the at least one processor device is further configured to obtain the at least one observation from a model having an internal state estimated by the prior distribution.
Claim 8 

The apparatus of claim 7, wherein the processor is further configured to obtain the observation from a model having an internal state estimated by the prior distribution.
Claim 17

The method of claim 16, wherein the model is a behavioral model of at least one person.
Claim 9

The apparatus of claim 8, wherein the model is a behavioral model of the at least one person.
Claim 18

The method of claim 10, further comprising: selecting an action from a plurality of candidate actions each causing one or more observations based on expected values of the differential 
Claim 1


“...transmit, to at least one device associated with the at least one person, at least one electronic interaction generated based on an action, the action being selected based on 


Instant claim 10 differs from reference claim 1 in that instant claim 10 recites “A computer-implemented method for implementing a computer system to predict preferences” while reference claim 1 recites “An apparatus to improve operation of a computing system for predicting personal preferences, comprising: a processor operatively coupled to a memory and configured to”. It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to implement the computer-implemented method of instant claim 10 by utilizing the computer-implemented apparatus of reference claim 1. Each of the instant dependent claims as noted above is rejected based on the same rationale as the claim from which it depends.

Claims 6 and 15 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 6 of U.S. Patent No. US 10,535,012 B2 in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”).

Instant Application
U.S. Patent No. US 10,535,012 B2 (reference patent)
Claim 6

The apparatus of claim 1, wherein the at least one processor device is further configured to estimate the at least one differential entropy of each of a plurality of posterior distributions based on the at least one parameter relating to the density at each sample and a likelihood of transition for each sample from the prior distribution to each posterior distribution, and wherein each likelihood of transition exceeds a threshold likelihood.
Claim 6

The apparatus of claim 1, wherein the subset of the plurality of samples are samples having a likelihood of transition exceeding a threshold likelihood.
Claim 15



The method of claim 10, wherein the at least one differential entropy of each of a plurality of posterior distributions is estimated based on the at least one parameter relating to the density at each sample and a likelihood of transition for each sample from the prior distribution to each posterior distribution, and wherein each likelihood of transition exceeds a threshold likelihood.
Claim 6



The apparatus of claim 1, wherein the subset of the plurality of samples are samples having a likelihood of transition exceeding a threshold likelihood.


Regarding instant claim 6, reference claim 6 does not teach “wherein the at least one processor device is further configured to estimate the at least one differential entropy of each of a plurality of posterior distributions based on the at least one parameter relating to the density at each sample and a likelihood of transition for each sample from the prior distribution to each posterior distribution”. However, Pant et al. teaches this limitation at pg. 67, Section 2. It would have been obvious to one of ordinary skill in the art to modify reference claim 6 with the teachings of Pant et al. One of ordinary skill in the art would have been motivated to make this modification in order to provide a framework for quantification of information gain measurements that is easily parallelisable (Pant et al. pg. 66-67 Section 1). Instant claim 15 recites analogous limitations and is rejected based on the same rationale as instant claim 6.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claim 1 recites the limitation "the likelihood" in lines 8-9. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the likelihood” has been interpreted as “a likelihood”.
Claim 10 recites the limitation "the likelihood" in line 8. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the likelihood” has been interpreted as “a likelihood”.
Claim 19 recites the limitation "the likelihood" in line 10. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the likelihood” has been interpreted as “a likelihood”.
Each dependent claim is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
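A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.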

Claims 1, 3, 7, 8, 10, 12, 16, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan et al. (US 8,589,319 B2) in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”).
Regarding Claim 1,
Balakrishnan et al. teaches An apparatus for implementing a computing system to predict preferences, comprising: at least one processor device operatively coupled to a memory and configured to (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).
Balakrishnan et al. does not appear to explicitly teach calculate a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and estimate, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system.
However, Pant et al. teaches calculate a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample (pg. 71 Section 6.1:

    [Image media_image1.png: excerpt from Pant et al., pg. 71 Section 6.1]

teaches calculating the distance between each sample to a kth-nearest neighbor relating to density; pg. 67 Section 2 Equation (2) teaches density of a prior distribution); and 
estimate, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system (pg. 67 Section 2:

    [Image media_image2.png: excerpt from Pant et al., pg. 67 Section 2]

    [Image media_image3.png: excerpt from Pant et al., pg. 67 Section 2]

teaches estimating the differential entropy of the posterior distribution by updating the prior probability distribution to the posterior distribution using Bayes’ theorem and using the probability density function px(x), which corresponds to estimating the differential entropy of a posterior distribution associated with a parameter related to density; Equation (2) teaches that the likelihood of observation is used to calculate the posterior probability distribution and that the posterior distribution is not sampled in the calculation of differential entropy; instead, the prior distribution is sampled; the estimation of the differential entropy of the posterior distribution is performed on the subset of px|y(x|y); pg. 66-67 Section 1: “Although the approach can be computationally expensive (depending on the computational complexity of the dynamical system under consideration), it is easily parallelisable” teaches the algorithm to calculate gain in information for the parameters (which includes calculating 
Balakrishnan et al. and Pant et al. are analogous art because they are directed to analysis related to posterior distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate calculate a parameter relating to a density of a prior distribution at each sample of a set of samples associated with the prior distribution, the at least one parameter including a distance from each sample to at least one neighboring sample; and estimate, for the plurality of samples, at least one differential entropy of at least one posterior distribution associated with at least one observation based on the parameter relating to the density of the prior distribution at each sample and the likelihood of observation for each sample, the estimation being performed without sampling the at least one posterior distribution to reduce consumption of resources of the computing system as taught by Pant et al. to the disclosed invention of Balakrishnan et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide a framework for quantification of information gain measurements that is easily parallelisable (Pant et al. pg. 66-67 Section 1).
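For illustration only, the Bayes' theorem update discussed above with respect to Pant et al. Section 2 can be sketched as follows. The sketch draws samples from a prior distribution, weights each sample by its likelihood of observation, and computes a posterior quantity without drawing any sample from the posterior distribution. The function names, the Gaussian prior and likelihood, and the observed value are hypothetical and are not taken from Pant et al., Balakrishnan et al., or the claims.

```python
import math
import random


def posterior_weights(prior_samples, likelihood):
    """Return self-normalized weights proportional to the likelihood of each prior sample."""
    raw = [likelihood(x) for x in prior_samples]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all likelihoods are zero; weights are undefined")
    return [r / total for r in raw]


def posterior_mean(prior_samples, weights):
    """Weighted mean of the prior samples, i.e. an estimate of the posterior mean."""
    return sum(w * x for w, x in zip(weights, prior_samples))


def gaussian_likelihood(y, sigma):
    """Hypothetical likelihood: observation y equals the state x plus Gaussian noise."""
    def lik(x):
        return math.exp(-0.5 * ((y - x) / sigma) ** 2)
    return lik


# Assumed setup: standard normal prior, one observation y = 1.0 with unit noise.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(20000)]
w = posterior_weights(prior, gaussian_likelihood(y=1.0, sigma=1.0))
mean = posterior_mean(prior, w)
print(f"estimated posterior mean: {mean:.3f}")
```

Under these assumptions the exact posterior is Gaussian with mean 0.5, so the printed estimate should fall close to that value. Only the prior distribution is sampled; the posterior enters solely through the weights.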
Regarding Claim 3,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Pant et al. further teaches wherein the distance from each sample to at least one neighboring sample is a distance from each sample to a kth-nearest neighbor, k being a natural number (pg. 71 Section 6.1:

    [Image media_image1.png: excerpt from Pant et al., pg. 71 Section 6.1]

teaches calculating the distance between each sample to a kth-nearest neighbor relating to density; pg. 72 Section 7.1.1 teaches an example in which k is 10, a natural number).
Balakrishnan et al. and Pant et al. are analogous art because they are directed to analysis related to posterior distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the distance from each sample to at least one neighboring sample is a distance from each sample to a kth-nearest neighbor, k being a natural number as taught by Pant et al. to the disclosed invention of Balakrishnan et al.
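For illustration only, the distance from each sample to its kth-nearest neighbor, as discussed above with respect to claim 3, can be computed by a brute-force search. This is a minimal sketch assuming Euclidean distance and a small sample set; the function name and the example points are hypothetical and are not taken from Pant et al.

```python
import math


def kth_neighbor_distances(samples, k):
    """Return, for each sample, the Euclidean distance to its k-th nearest neighbor."""
    n = len(samples)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    result = []
    for i, x in enumerate(samples):
        # Distances from sample i to every other sample, in ascending order.
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        result.append(dists[k - 1])
    return result


# Hypothetical two-dimensional samples.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
d1 = kth_neighbor_distances(points, k=1)
d2 = kth_neighbor_distances(points, k=2)
print(d1)
print(d2)
```

The search is quadratic in the number of samples, which is acceptable for a sketch; `math.dist` requires Python 3.8 or later.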

Regarding Claim 7,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Balakrishnan et al. further teaches wherein the at least one processor device is further configured to obtain the at least one observation from a model having an internal state estimated by the prior distribution (Fig. 5 teaches processor and memory; Col. 2 line 66 to Col. 3 line 8: “In the movie domain, for example, the user may be asked whether she prefers the "The Godfather" or "Annie Hall." The user at the client device 22 provides a response 40, and the response 40 communicates back to the server 20. The recommender application 26 may then incorporate the response 40 into any of the latent factor models 28 and update the user's parameters 30. Using these updated user parameters, the recommender application 26 may then retrieve another pairwise question 38 from the database 36 of pairwise questions, and the user again provides the response 40” teaches that the model incorporates the user's feedback and updates the user parameters in order to select the next question to ask, so that this updating process corresponds to an internal state of the model; Col. 8 line 63 to Col. 9 line 3: “Starting with a prior distribution for the user parameter vector, the IG criterion is used to find a pair of items and a feedback is sought for them. The pairwise response is combined with the prior distribution using Bayes rule, employing the likelihood given in Equation #5. This results in the posterior distribution for the user vector, which can be treated as the subsequent prior distribution for the next step of feedback in this sequential process” reasonably teaches that the process of updating the user parameters based on feedback is estimated by combining the prior distribution with the pairwise response using Bayes rule, thus reasonably corresponding to estimating an internal state of the model (the updating of parameters) by the prior distribution).
Regarding Claim 8,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 7.
Balakrishnan et al. further teaches wherein the model is a behavioral model of at least one person (Fig. 5 teaches processor and memory; Col. 2 line 66 to Col. 3 line 8: “In the movie domain, for example, the user may be asked whether she prefers the "The Godfather" or "Annie Hall." The user at the client device 22 provides a response 40, and the response 40 communicates back to the server 20. The recommender application 26 may then incorporate the response 40 into any of the latent factor models 28 and update the user's parameters 30. Using these updated user parameters, the recommender application 26 may then retrieve another pairwise question 38 from the database 36 of pairwise questions, and the user again provides the response 40” teaches that the model incorporates the user's feedback and updates the user parameters in order to select the next question to ask, so that the model is one that analyzes the user’s behavior).
Regarding Claim 10,
Claim 10 recites analogous limitations to claim 1, therefore claim 10 is rejected based on the same rationale as claim 1.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).
Regarding Claim 12,
Claim 12 recites analogous limitations to claim 3, therefore claim 12 is rejected based on the same rationale as claim 3.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).
Regarding Claim 16,
Claim 16 recites analogous limitations to claim 7, therefore claim 16 is rejected based on the same rationale as claim 7.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).
Regarding Claim 17,
Claim 17 recites analogous limitations to claim 8, therefore claim 17 is rejected based on the same rationale as claim 8.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).
Regarding Claim 19,
Claim 19 recites analogous limitations to claim 1, therefore claim 19 is rejected based on the same rationale as claim 1.
Balakrishnan et al. teaches A computer program product for implementing a computer system to predict preferences, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations comprising (see Col. 15 lines 42-50; Col. 7 lines 10-21 teaches predicting user preferences).



Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan et al. (US 8,589,319 B2) in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”) and further in view of Smallwood (US 8,255,263 B2).
Regarding Claim 2,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Balakrishnan et al. further teaches wherein the at least one processor device is further configured to (Fig. 5 teaches processor and memory).
Balakrishnan et al. in view of Pant et al. does not appear to explicitly teach generate a plurality samples from the prior distribution; obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; and eliminate samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples.
However, Smallwood teaches generate a plurality samples from the prior distribution (Col. 7 lines 10-24: “After asking the question and receiving an answer, the level of uncertainty about the user's WTPs will change and the new Gpdf may be determined according to Bayes' rule: F(W|D)=Z G(W)L(D|W)…where W is the vector of the user's WTPs, which are unknown; G(W) is the prior Gpdf for the user's WTPs; D is the observed data, i.e., the user's answer to the question; L(D|W) is a likelihood function for the observed data given knowledge of the user's WTPs; Z is a normalizing constant, i.e., the inverse of the summation of G(W)L(D|W) over all values of W” teaches using the prior Gaussian probability density function G(W) for the user’s willingness-to-pay (WTP) to generate sample values of the user’s WTPs); 
obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample (Col. 7 lines 20-21: “L(D|W) is a likelihood function for the observed data given knowledge of the user's WTPs” teaches obtaining the likelihood of the 
eliminate samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples (Col. 10 lines 24-31: “The total value for all the options associated with each model are compared and the option with the highest value for each model becomes the representative for the model for the purposes of calculating recommended products. The top models, for example, the top ten or the top five models with their associated best options may be shown to the user as the recommended products for his or her consideration” teaches selecting the top five or top ten product models to recommend to the user in which the selection is based on the total value, which corresponds to eliminating products that are less likely to be recommended; since product recommendation is based on the total value (which includes the WTP values), the selection of top products (elimination of non-top products) is based on some threshold total value; also see Col. 9 lines 5-16).
Balakrishnan et al., Pant et al., and Smallwood are analogous art because they are directed to analysis related to posterior distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate generate a plurality samples from the prior distribution; obtain, for each sample among the plurality of samples, a likelihood of an observation as an output of a likelihood function given the sample; and eliminate samples from the plurality of samples having a likelihood less than a threshold value to generate the set of samples as taught by Smallwood to the disclosed invention of Balakrishnan et al. in view of Pant et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “offer consumers an easier way to find products that best meet individual consumer needs and preferences for products, thereby increasing their satisfaction and the efficiency of the free market” 
Regarding Claim 11,
Claim 11 recites analogous limitations to claim 2, therefore claim 11 is rejected based on the same rationale as claim 2.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan et al. (US 8,589,319 B2) in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”) and further in view of Ajgl et al. (“Differential entropy estimation by particles”).
Regarding Claim 4,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Balakrishnan et al. further teaches wherein the at least one processor device is further configured to (Fig. 5 teaches processor and memory).
Pant et al. further teaches estimate the at least one differential entropy of the at least one posterior distribution by approximating a probability density function of the prior distribution at each sample (pg. 67 Section 2:

    PNG
    media_image2.png
    405
    592
    media_image2.png
    Greyscale


    PNG
    media_image3.png
    217
    597
    media_image3.png
    Greyscale

teaches estimating the differential entropy of the posterior distribution by updating the prior probability distribution to the posterior distribution using Bayes’ theorem and using the prior distribution’s probability density function px(x)).
Balakrishnan et al. and Pant et al. are analogous art because they are directed to analysis related to posterior distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate estimate the at least one differential entropy of the at least one posterior distribution by approximating a probability density function of the prior distribution at each sample as taught by Pant et al. to the disclosed invention of Balakrishnan et al.

Balakrishnan et al. in view of Pant et al. does not appear to explicitly teach using a volume of a sphere having a radius equal to the distance.
However, Ajgl et al. teaches using a volume of a sphere having a radius equal to the distance (pg. 11993 ¶-1: “Another way is to use the volume of nx dimensional sphere with the radius $p_1^i$, i.e. the distance to the nearest neighbour” teaches a sphere with radius equal to the distance).
Balakrishnan et al., Pant et al., and Ajgl et al. are analogous art because they are directed to analysis related to distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate using a volume of a sphere having a radius equal to the distance as taught by Ajgl et al. to the disclosed invention of Balakrishnan et al. in view of Pant et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide a nonparametric entropy estimator that can be applied in the fusion problem, while conventional running particle filter approaches lack such application (Ajgl et al. pg. 11992 ¶-2).
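For illustration only, the use of a sphere volume with radius equal to the neighbor distance, as discussed above with respect to Ajgl et al., can be sketched as follows. The volume formula for a ball in `dim` dimensions is standard. The density approximation k / (N * V) is one common nearest-neighbor form and is an assumption of this sketch, not a formula quoted from Ajgl et al. or from the claims; the function names are hypothetical.

```python
import math


def ball_volume(radius, dim):
    """Volume of a dim-dimensional ball: pi^(dim/2) / Gamma(dim/2 + 1) * radius^dim."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def knn_density(radius, dim, n_samples, k=1):
    """Assumed approximation: k samples fall in a ball of the given radius out of n_samples."""
    return k / (n_samples * ball_volume(radius, dim))


# A ball of radius 2 in one, two, and three dimensions.
for d in (1, 2, 3):
    print(d, ball_volume(2.0, d))

# Density at a sample whose nearest neighbor is 0.05 away, among 1000 samples in 2-D.
print(knn_density(radius=0.05, dim=2, n_samples=1000))
```

A smaller neighbor distance gives a smaller ball and therefore a larger approximate density at that sample, which is why the distance serves as a parameter relating to the density.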
Regarding Claim 13,
Claim 13 recites analogous limitations to claim 4, therefore claim 13 is rejected based on the same rationale as claim 4.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan et al. (US 8,589,319 B2) in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”) and further in view of Gupta et al. (“Parametric Bayesian Estimation of Differential Entropy and Relative Entropy”).
Regarding Claim 5,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Balakrishnan et al. further teaches wherein the at least one processor device is further configured to (Fig. 5 teaches processor and memory).
Balakrishnan et al. in view of Pant et al. does not appear to explicitly teach estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term.
However, Gupta et al. teaches estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term (pg. 826 ¶-1: “we estimate the differential entropy as: EN[h(N)], where the expectation is taken with respect to the posterior distribution over N” teaches that differential entropy is a characteristic of posterior distribution; pg. 821 ¶-3: “The special case that best validates the high-rate quantization assumptions is when the number of quantization cells is as large as possible, and they show that this special case produces the nearest-neighbor differential entropy estimator originally proposed by Kozachenko and Leonenko in 1987 [9]:

    [Image media_image4.png: excerpt from Gupta et al., pg. 821, nearest-neighbor differential entropy estimator]

where 𝛌 is the Euler-Mascheroni constant” teaches that an estimator for differential entropy (characteristic of posterior distribution) has an Euler-Mascheroni constant, which corresponds to Euler’s constant).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate estimate the at least one differential entropy of the at least one posterior distribution having Euler's constant as a constant term as taught by Gupta et al. to the disclosed invention of Balakrishnan et al. in view of Pant et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide an approach for estimation of differential entropy and relative entropy that has significant performance improvement over other estimation approaches (Gupta et al. pg. 819 ¶-2).
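For illustration only, a nearest-neighbor differential entropy estimator having the Euler-Mascheroni constant as a constant term can be sketched as follows. The sketch uses one commonly cited form of the Kozachenko-Leonenko estimator, namely (d/N) * sum(ln rho_i) + ln(V_d) + ln(N - 1) + gamma, where rho_i is the distance from sample i to its nearest neighbor and V_d is the volume of the unit ball in d dimensions. Whether this matches the exact expression reproduced in the Gupta et al. excerpt is an assumption; the function name and the test data are hypothetical.

```python
import math
import random

EULER_MASCHERONI = 0.5772156649015329


def kl_entropy(samples):
    """Nearest-neighbor entropy estimate (in nats) for a list of equal-length tuples."""
    n = len(samples)
    dim = len(samples[0])
    # Natural log of the unit-ball volume in `dim` dimensions.
    log_unit_ball = (dim / 2) * math.log(math.pi) - math.lgamma(dim / 2 + 1)
    total = 0.0
    for i, x in enumerate(samples):
        rho = min(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        if rho == 0.0:
            raise ValueError("duplicate samples give a zero neighbor distance")
        total += math.log(rho)
    return dim * total / n + log_unit_ball + math.log(n - 1) + EULER_MASCHERONI


# Assumed example: 500 draws from a standard normal distribution in one dimension.
rng = random.Random(1)
gaussian = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
estimate = kl_entropy(gaussian)
exact = 0.5 * math.log(2 * math.pi * math.e)
print(f"estimate: {estimate:.3f}  exact: {exact:.3f}")
```

The estimate operates directly on samples and needs no parametric model of the distribution; the constant term arises from the expected logarithm of the nearest-neighbor distance.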
Regarding Claim 14,
Claim 14 recites analogous limitations to claim 5, therefore claim 14 is rejected based on the same rationale as claim 5.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan et al. (US 8,589,319 B2) in view of Pant et al. (“An information-theoretic approach to assess practical identifiability of parametric dynamical systems”) and further in view of Suzuki (US 2011/0060708 A1).
Regarding Claim 6,
Balakrishnan et al. in view of Pant et al. teaches the apparatus of claim 1.
Balakrishnan et al. further teaches wherein the at least one processor device is further configured to (Fig. 5 teaches processor and memory).
Pant et al. further teaches estimate the at least one differential entropy of each of a plurality of posterior distributions based on the at least one parameter relating to the density at each sample and a likelihood of transition for each sample from the prior distribution to each posterior distribution (pg. 67 Section 2:

    [Image media_image2.png: excerpt from Pant et al., pg. 67 Section 2]

    [Image media_image3.png: excerpt from Pant et al., pg. 67 Section 2]

teaches estimating the differential entropy of the posterior distribution by updating the prior probability distribution to the posterior distribution using Bayes’ theorem and using the probability density function px(x) and the likelihood py|x(y|x), which corresponds to estimating the differential entropy of a posterior distribution based on a parameter related to density and a likelihood of transition).
Balakrishnan et al. and Pant et al. are analogous art because they are directed to analysis related to posterior distributions.

One of ordinary skill in the arts would have been motivated to make this modification in order to provide a framework for quantification of information gain measurements that is easily parallelisable (Pant et al. pg. 66-67 Section 1).
Balakrishnan et al. in view of Pant et al. does not appear to explicitly teach wherein each likelihood of transition exceeds a threshold likelihood.
However, Suzuki teaches wherein each likelihood of transition exceeds a threshold likelihood (Fig. 31 S182: “Set a transition probability equal to or greater than a threshold (here, 0.01) to 0.9…” teaches a transition probability exceeding a threshold probability; Fig. 46 teaches the input data to the ACHMM includes samples).
Balakrishnan et al., Pant et al., and Suzuki are analogous art because they are directed to analysis related to posterior distributions.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein each likelihood of transition exceeds a threshold likelihood as taught by Suzuki to the disclosed invention of Balakrishnan et al. in view of Pant et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide a system that enables improvement to the posterior probability of the ACHMM (Suzuki pg. 44 [0944]).


Regarding Claim 15,
Claim 15 recites analogous limitations to claim 6, therefore claim 15 is rejected based on the same rationale as claim 6.
Balakrishnan et al. teaches A computer-implemented method for implementing a computer system to predict preferences, comprising (Fig. 5 teaches processor and memory; Col. 7 lines 10-21 teaches predicting user preferences).

Allowable Subject Matter
Claims 9, 18, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the ground(s) of rejection to claims 9, 18, and 20 are withdrawn.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained 






/YING YU CHEN/               Examiner, Art Unit 2125