DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
Information disclosure statements (IDS) were submitted on 30 November 2022 and 17 August 2022. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner. 

	
Response to Amendment
This action is in response to the submission filed 21 November 2022 for application 16/230,620. Claims 1-3, 8-10, and 15-17 have been amended. Claims 4, 11, and 18 are canceled. Claims 1-3, 5-10, 12-17, 19, and 20 are currently pending and have been examined.

Response to Arguments
Applicant’s arguments, see pages 12-16, filed 21 November 2022, with respect to the feature “the merge layer to combine a total number of tuners known to be included in the first one of the return path data households with a first feature vector based on the second set of features to determine a second merged feature vector” as recited in independent claim 1 (and similarly in independent claims 8 and 15) have been considered but are moot because the new ground of rejection (citing new references Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017) and Harvey et al (US 20110288907 A1) for teaching the new limitation) does not rely on any reference combination applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 8, and 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 15, and 8 respectively of copending Application No. 16/706,398 (reference application) in view of Sullivan et al (US 20170064358 A1) and further in view of Eldering et al (US 20030149975 A1), Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), and Harvey et al (US 20110288907 A1).

Instant application 16/230,620 (claim 1):
A demographic estimation system comprising:
at least one memory;
instructions in the system;
and at least one processor to execute the instructions to at least:
generate features from return path data reported from set-top boxes associated with return path data households,
implement a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households, the neural network to be trained based on panel data reported from meters that monitor media devices associated with panelist households,
and assign one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities.

Co-pending application 16/706,398 (claim 1):
A demographic estimation system comprising:
at least one memory;
machine readable instructions;
and processor circuitry to at least one of instantiate or execute the machine readable instructions to:
generate features from return path data reported from set-top boxes associated with return path data households;
execute a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households, the neural network to be trained based on panel data reported from meters that monitor media devices associated with panelist households;
execute a first assignment procedure to assign one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities;
However, co-pending application 16/706,398 does not explicitly disclose: the features including a first set of features associated with a first one of the return path data households, the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals; the neural network including a time distributed dense layer and a merge layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features, the second set of features associated with the first one of the return path data households, the time distributed dense layer including a set of weights to map the view blocks of the first set of features into the second set of features, the merge layer to combine a total number of tuners known to be included in the first one of the return path data households with a first feature vector based on the second set of features to determine a second merged feature vector.

Sullivan teaches: the features including a first set of features associated with a first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households), the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block, and the time-period segments correspond to the different time intervals);
the second set of features associated with the first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features that are reduced into the second set of features),
the view blocks of the first set of features ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. First row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate at least one memory, instructions in the system, and at least one processor to execute the instructions, the features including a first set of features associated with a first one of the return path data households, the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals, the second set of features associated with the first one of the return path data households, and the view blocks of the first set of features of Sullivan into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of identifying each tuning event by a channel and a time period as taught by Sullivan [0061].
Eldering teaches: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals ([0112] The viewing characteristics can identify such attributes as channel change rate, dwell time, etc. Moreover, the viewing characteristics may be broken out by day or day part);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals of Eldering into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of further defining the viewing characteristics as taught by Eldering [0112].
Morfi teaches: the neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features ([Page 4, Section 2.2.1] Neural Network Architecture. [Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. Note: Reduce feature-length dimensionality corresponds to reducing the first set of features into a second set of features less in number than the first set of features), the time distributed dense layer including a set of weights to map the first set of features into the second set of features ([Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. [Page 6, Paragraph 2] Training is performed by updating the network weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features, the time distributed dense layer including a set of weights to map the first set of features into the second set of features of Morfi into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of reducing feature-length dimensionality as taught by Morfi [Page 5, Paragraph 1].
Zheng teaches: and a merge layer ([Page 3, Column 2, Section: Architectures, Paragraph 2] the Merge layer);
to combine with a first feature vector based on the second set of features to determine a second merged feature vector ([Page 4, Figure 1, Paragraph 1] we merge these three vectors together as the new feature vector).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the merge layer to combine with a first feature vector based on the second set of features to determine a second merged feature vector of Zheng into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of decreasing the dimensions for this new feature vector as taught by Zheng [Page 4, Figure 1, Paragraph 1].
Harvey teaches: a total number of tuners known to be included in the first one of the return path data households ([Abstract] Data may also be collected from various types of metering devices. [0064] As applied herein with regard to certain embodiments, the terms "tuning" and "viewing" may be used interchangeably. Also, the terms "viewership," "viewing," and "viewer" can be defined as television usage, for example, as measured by household and DSTB tuning records. [0211] For example, a panelist might agree to employ or participate with a certain threshold number of data measurement devices or identification tools in exchange for receiving the incentive. Note: Metering devices and DSTBs (digital set-top boxes) correspond to tuners. Tuning records correspond to return path data, and a panelist corresponds to a return path data household).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a total number of tuners known to be included in the first one of the return path data households of Harvey into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of the data being matched for calculating metrics as taught by Harvey [Abstract].



Claims 8 and 15 are rejected for reasons similar to those set forth in the rejection of claim 1.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 5, 12, and 19 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 15, and 8 respectively of copending Application No. 16/706,398 in view of Sullivan et al (US 20170064358 A1) and further in view of Eldering et al (US 20030149975 A1), Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), and Harvey et al (US 20110288907 A1).
Regarding claim 5
The combination of claim 1 of the co-pending application, Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: the demographic estimation system of claim 1 (as shown above).
Sullivan further teaches: wherein the at least one processor ([0127] FIG. 11 is a block diagram of an example processor) is to solve an objective function subject to a set of constraints to assign the one or more demographic categories to the respective ones of the return path data households, the objective function based on the predicted demographic classification probabilities ([0055] For example, the demographic distribution calculated by the example distribution calculator 124 identifies a count or percentage of panelists who consumed the media associated with the tuning event are of demographic constraints of interest (e.g., constraints of an age/gender demographic dimension, a race dimension, an income dimension, and/or an education dimension, etc.). (Note: Identifying a count or percentage corresponds to an objective function). [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the objective function subject to a set of constraints of Sullivan into the invention of claim 1 of the co-pending application. One would have been motivated to do this modification because doing so would give the benefit of a demographic distribution representing probabilities or likelihoods that a consumer of media matches particular demographic dimensions of interest.
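For illustration only, the difference between assigning each household its single most probable category and solving an objective function subject to a set of constraints can be sketched as follows. The objective (sum of predicted probabilities) and the constraint (a fixed number of households per category) are assumptions chosen for this sketch; they are not taken from the application or from Sullivan. The function name `assign_with_constraints` is hypothetical, and a realistic population would call for an integer-programming solver rather than enumeration.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def assign_with_constraints(probs: List[Dict[str, float]],
                            categories: Sequence[str],
                            target_counts: Dict[str, int]) -> List[str]:
    """Hypothetical sketch: maximize total predicted probability under fixed category counts.

    probs[h][c] is the predicted probability that household h belongs to category c.
    target_counts[c] is the number of households that must be assigned category c.
    Enumeration is exponential, so this is only suitable for tiny examples.
    """
    pool = [c for c in categories for _ in range(target_counts[c])]
    if len(pool) != len(probs):
        raise ValueError("target counts must sum to the number of households")
    best, best_score = None, float("-inf")
    for candidate in sorted(set(permutations(pool))):  # sorted for deterministic ties
        score = sum(probs[h][c] for h, c in enumerate(candidate))
        if score > best_score:
            best, best_score = list(candidate), score
    return best


probs = [{"A": 0.6, "B": 0.4}, {"A": 0.9, "B": 0.1}]
# Per-household argmax would assign "A" to both; the constraint allows only one "A".
print(assign_with_constraints(probs, ["A", "B"], {"A": 1, "B": 1}))  # ['B', 'A']
```

Under these assumed constraints, the second household keeps category "A" because it has the higher probability for that category, and the first household is assigned "B".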

Claims 12 and 19 are rejected for reasons similar to those set forth in the rejection of claim 5.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Claims 1, 5, 8, 12, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sullivan et al (US 20170064358 A1) and further in view of Eldering et al (US 20030149975 A1), Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), and Harvey et al (US 20110288907 A1).
Regarding claim 1 
Sullivan teaches: A demographic estimation system comprising: 
at least one memory;
instructions in the system;
and at least one processor to execute the instructions to at least ([0090] In this example, the machine readable instructions comprise a program for execution by a processor such as the processor 1112 shown in the example processor platform 1100 discussed below in connection with FIG. 11. The program may be embodied in software stored on a tangible computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 1112, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 1112 and/or embodied in firmware or dedicated hardware. [0091] As mentioned above, the example processes of FIGS. 3-5 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible computer readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information)):
generate features from return path data reported from set-top boxes associated with return path data households ([0003] collect tuning data from set-top boxes of panelist households. [0075] the illustrated example constructs feature matrices associated with the respective training group and testing group of the panelist households. An example feature matrix constructed by the decision tree trainer 208 includes rows associated respective panelist households and columns associated with respective household features. Additionally or alternatively, some columns of example feature matrices are associated with other household characteristics (e.g., a total number of minutes consumed by the household, a number of minutes consumed by the household per predetermined time-period segments (e.g. per quarter-hours of the day), a number of STBs within a household, etc.). Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household), the features including a first set of features associated with a first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households), the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households. The tuning event example corresponds to a view block, and the time-period segments correspond to the different time intervals);
implement a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households, the neural network to be trained based on panel data reported from meters that monitor media devices associated with panelist households ([0086] alternative examples of the household estimator 210 utilize other forms of machine learning (e.g., neural networks, etc.) to estimate the demographics of the household 102. In such examples, the decision tree trainer 208 and/or another machine learning trainer constructs the corresponding machine learning classifier (e.g., neural networks) utilized to estimate the demographics of the household 102. [0003] collect tuning data from set-top boxes of panelist households. [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. [0022] To enable the AMEs to collect such consumption data, the AMEs typically provide panelist households with meter(s) that monitor media presentation devices (e.g., televisions, stereos, speakers, computers, portable devices, gaming consoles, and/or online media presentation devices, etc.) of the household. Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household); the second set of features associated with the first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features that are reduced into the second set of features), the view blocks of the first set of features ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: The first panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block);
and assign one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities ([0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male. [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116)).
However, Sullivan does not explicitly disclose: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals; the neural network including a time distributed dense layer and a merge layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features, the time distributed dense layer including a set of weights to map the first set of features into the second set of features, the merge layer to combine a total number of tuners known to be included in the first one of the return path data households with a first feature vector based on the second set of features to determine a second merged feature vector.
Eldering teaches, in an analogous system: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals ([0112] The viewing characteristics can identify such attributes as channel change rate, dwell time, etc. Moreover, the viewing characteristics may be broken out by day or day part).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the demographic estimation system of Sullivan to incorporate the teachings of Eldering to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals. One would have been motivated to do this modification because doing so would give the benefit of further defining the viewing characteristics as taught by Eldering paragraph [0112].
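For illustration only, the claimed per-interval channel change rate can be sketched as follows. The sketch assumes that tuning data has already been reduced to channel-change timestamps in minutes since midnight and uses fixed 15-minute intervals, mirroring Sullivan's example time-period segments such as 7:00-7:15 A.M. The `ViewBlock` and `build_view_blocks` names are hypothetical and do not come from the application or the cited references.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ViewBlock:
    """Hypothetical per-interval record for one set-top box."""
    start_min: int        # interval start, minutes since midnight
    duration_min: int     # interval duration, minutes
    channel_changes: int  # channel changes observed during the interval

    @property
    def channel_change_rate(self) -> float:
        # (i) number of channel changes in the interval / (ii) interval duration
        return self.channel_changes / self.duration_min


def build_view_blocks(change_times_min: Iterable[float],
                      day_length_min: int = 1440,
                      interval_min: int = 15) -> List[ViewBlock]:
    """Bucket channel-change timestamps (minutes since midnight) into fixed intervals."""
    counts = [0] * (day_length_min // interval_min)
    for t in change_times_min:
        counts[int(t) // interval_min] += 1  # assumes 0 <= t < day_length_min
    return [ViewBlock(i * interval_min, interval_min, c) for i, c in enumerate(counts)]


blocks = build_view_blocks([430, 432, 440])  # changes at 7:10, 7:12, and 7:20 A.M.
# The 7:00-7:15 A.M. view block (index 28) holds two changes, so its rate is 2/15 per minute.
print(blocks[28].start_min, blocks[28].channel_change_rate)
```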
Morfi teaches, in an analogous system: the neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features ([Page 4, Section 2.2.1] Neural Network Architecture. [Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. Note: Reduce feature-length dimensionality corresponds to reducing the first set of features into a second set of features less in number than the first set of features), the time distributed dense layer including a set of weights to map the first set of features into the second set of features ([Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. [Page 6, Paragraph 2] Training is performed by updating the network weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features and the time distributed dense layer including a set of weights to map the first set of features into the second set of features. One would have been motivated to do this modification because doing so would give the benefit of reducing feature-length dimensionality as taught by Morfi [Page 5, Paragraph 1].
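Morfi describes the time distributed dense layer only at the level quoted above. Such a layer is commonly understood to apply the same weight matrix and bias independently at each time step. The pure-Python sketch below illustrates that reading, with each time step standing in for one view block. It is a hypothetical illustration rather than Morfi's implementation, and it omits the activation function a practical network would typically apply.

```python
def time_distributed_dense(sequence, weights, bias):
    """Apply one shared dense layer to every time step of `sequence`.

    sequence: list of T time steps, each a list of n_in features
    weights:  n_in x n_out matrix (list of rows), shared by all time steps
    bias:     list of n_out values
    Returns T time steps with n_out features each; choosing n_out < n_in
    reduces the feature-length dimensionality.
    """
    n_out = len(bias)
    out = []
    for step in sequence:
        if len(step) != len(weights):
            raise ValueError("feature length does not match weight rows")
        # Same weights and bias reused at every step (linear, no activation).
        out.append([
            sum(x * weights[i][j] for i, x in enumerate(step)) + bias[j]
            for j in range(n_out)
        ])
    return out
```

With eight view blocks of six features each and a 6 x 3 weight matrix, the output would be eight blocks of three features each; the same 18 weights and 3 biases are reused for every block, which is what distinguishes a time distributed layer from a single dense layer over the flattened input.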
Zheng teaches, in an analogous system: and a merge layer ([Page 3, Column 2, Section: Architectures, Paragraph 2] the Merge layer);
to combine with a first feature vector based on the second set of features to determine a second merged feature vector ([Page 4, Figure 1, Paragraph 1] we merge these three vectors together as the new feature vector).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, and Morfi to incorporate the teachings of Zheng to use a merge layer and to combine with a first feature vector based on the second set of features to determine a second merged feature vector. One would have been motivated to do this modification because doing so would give the benefit of decreasing the dimensions for this new feature vector as taught by Zheng [Page 4, Figure 1, Paragraph 1].
Harvey teaches, in an analogous system: a total number of tuners known to be included in the first one of the return path data households ([Abstract] Data may also be collected from various types of metering devices. [0064] As applied herein with regard to certain embodiments, the terms "tuning" and "viewing" may be used interchangeably. Also, the terms "viewership," "viewing," and "viewer" can be defined as television usage, for example, as measured by household and DSTB tuning records. [0211] For example, a panelist might agree to employ or participate with a certain threshold number of data measurement devices or identification tools in exchange for receiving the incentive. Note: Metering devices and DSTBs (digital set-top boxes) correspond to tuners. Tuning records correspond to return path data, and a panelist corresponds to a return path data household).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, Morfi, and Zheng to incorporate the teachings of Harvey to use a total number of tuners known to be included in the first one of the return path data households. One would have been motivated to do this modification because doing so would give the benefit of the data being matched for calculating metrics as taught by Harvey [Abstract].
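The combination of Zheng's merge layer with Harvey's tuner count can be pictured with the short sketch below. It assumes the merge is a concatenation (one common merge mode; Zheng's own merge mode is not reproduced here) and assumes the first feature vector is obtained by flattening the reduced per-time-step features. Both function names are hypothetical.

```python
def flatten(sequence):
    """Flatten T x n_out time-step outputs into one first feature vector."""
    return [x for step in sequence for x in step]


def merge_layer(feature_vector, total_tuners):
    """Concatenate a household's known tuner count onto its feature vector.

    Hypothetical concatenation-style merge: the scalar tuner count becomes
    one additional element of the second merged feature vector.
    """
    return list(feature_vector) + [float(total_tuners)]
```

For example, reduced features [[4.0, 6.0], [0.0, 2.0]] for a household known to have three tuners would merge into [4.0, 6.0, 0.0, 2.0, 3.0].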

Regarding claim 5
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The demographic estimation system of claim 1 (as shown above).
Sullivan further teaches: wherein the at least one processor ([0127] FIG. 11 is a block diagram of an example processor) is to solve an objective function subject to a set of constraints to assign the one or more demographic categories to the respective ones of the return path data households, the objective function based on the predicted demographic classification probabilities ([0055] For example, the demographic distribution calculated by the example distribution calculator 124 identifies a count or percentage of panelists who consumed the media associated with the tuning event are of demographic constraints of interest (e.g., constraints of an age/gender demographic dimension, a race dimension, an income dimension, and/or an education dimension, etc.). (Note: Identifying a count or percentage corresponds to an objective function). [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male).	
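To make the claimed structure concrete, the following sketch assumes one possible formulation: the objective is the sum of the predicted probabilities of the categories assigned to the households, and the constraints require each household to receive exactly one category and each category to receive a target number of households. The formulation, the function name, and the exhaustive search are illustrative assumptions only; a practical system would more likely use an integer-programming or assignment solver.

```python
from itertools import product


def assign_categories(probs, targets):
    """Assign one category per household to maximize total predicted
    probability, subject to exact per-category household counts.

    probs:   list of dicts {category: probability}, one per household
    targets: dict {category: required number of households}
    Returns None if no assignment satisfies the constraints.
    Exhaustive search, so only suitable for tiny illustrative inputs.
    """
    categories = sorted(targets)
    best, best_score = None, float("-inf")
    for choice in product(categories, repeat=len(probs)):
        # Constraints: every category must hit its target count exactly.
        if any(choice.count(c) != n for c, n in targets.items()):
            continue
        # Objective: sum of predicted probabilities of the chosen categories.
        score = sum(p.get(c, 0.0) for p, c in zip(probs, choice))
        if score > best_score:
            best, best_score = list(choice), score
    return best
```

With probabilities {A: 0.6, B: 0.4}, {A: 0.7, B: 0.3}, {A: 0.55, B: 0.45} and targets of two A households and one B household, an unconstrained argmax would label all three households A, whereas the constrained solution assigns B to the third household (total 1.75).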

Regarding claim 8
Sullivan teaches: A non-transitory computer readable medium including computer readable instructions that, when executed, cause a processor to at least: ([0091] may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium. [0014] FIG. 11 is a block diagram of an example processor system structured to execute the example machine readable instructions): 
generate features from return path data reported from set-top boxes associated with return path data households ([0003] collect tuning data from set-top boxes of panelist households. [0075] the illustrated example constructs feature matrices associated with the respective training group and testing group of the panelist households. An example feature matrix constructed by the decision tree trainer 208 includes rows associated respective panelist households and columns associated with respective household features. Additionally or alternatively, some columns of example feature matrices are associated with other household characteristics (e.g., a total number of minutes consumed by the household, a number of minutes consumed by the household per predetermined time-period segments (e.g. per quarter-hours of the day), a number of STBs within a household, etc.). Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household), the features including a first set of features associated with a first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households), the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. The tuning event example corresponds to a view block, and the time-period segments correspond to the different time intervals);
implement a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households, the neural network to be trained based on panel data reported from meters that monitor media devices associated with panelist households ([0086] alternative examples of the household estimator 210 utilize other forms of machine learning (e.g., neural networks, etc.) to estimate the demographics of the household 102. In such examples, the decision tree trainer 208 and/or another machine learning trainer constructs the corresponding machine learning classifier (e.g., neural networks) utilized to estimate the demographics of the household 102. [0003] collect tuning data from set-top boxes of panelist households. [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. [0022] To enable the AMEs to collect such consumption data, the AMEs typically provide panelist households with meter(s) that monitor media presentation devices (e.g., televisions, stereos, speakers, computers, portable devices, gaming consoles, and/or online media presentation devices, etc.) of the household. Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household); the second set of features associated with the first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features that is reduced into the second set of features), the view blocks of the first set of features ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. First row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block);
 and assign one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities ([0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male. [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116)).
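The mapping of Sullivan's tuning events and time-period segments to the claimed view blocks can be pictured with the sketch below, which splits one set-top box's tuning sessions into quarter-hour blocks (the quarter-hour granularity follows the example in Sullivan [0075]). The session format, the per-channel minute totals stored in each block, and the function name are assumptions made for illustration only.

```python
def view_blocks(sessions, day_minutes=1440, block=15):
    """Split one set-top box's tuning sessions into fixed-length view blocks.

    sessions: hypothetical list of (start_minute, end_minute, channel) tuples
              for one day; day_minutes is assumed to be a multiple of block.
    Returns one dict per block mapping channel -> minutes tuned in that block.
    """
    blocks = [{} for _ in range(day_minutes // block)]
    for start, end, channel in sessions:
        t, end = max(start, 0), min(end, day_minutes)  # clip to the day
        while t < end:
            idx = t // block
            # Portion of the session that falls inside the current block.
            stop = min(end, (idx + 1) * block)
            blocks[idx][channel] = blocks[idx].get(channel, 0) + (stop - t)
            t = stop
    return blocks
```

A session on NBCSports from 7:30 to 7:50 A.M. (minutes 450 to 470) would contribute 15 minutes to the 7:30-7:45 block and 5 minutes to the 7:45-8:00 block.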
However, Sullivan does not explicitly disclose: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals; the neural network including a time distributed dense layer and a merge layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features, the time distributed dense layer including a set of weights to map the first set of features into the second set of features, the merge layer to combine a total number of tuners known to be included in the first one of the return path data households with a first feature vector based on the second set of features to determine a second merged feature vector.
Eldering teaches, in an analogous system: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals ([0112] The viewing characteristics can identify such attributes as channel change rate, dwell time, etc. Moreover, the viewing characteristics may be broken out by day or day part).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the demographic estimation system of Sullivan to incorporate the teachings of Eldering to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals. One would have been motivated to do this modification because doing so would give the benefit of further defining the viewing characteristics as taught by Eldering, paragraph [0112].
Morfi teaches, in an analogous system: the neural network including a time distributed dense layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features ([Page 4, Section 2.2.1] Neural Network Architecture. [Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. Note: Reduce feature-length dimensionality corresponds to reducing the first set of features into a second set of features less in number than the first set of features), the time distributed dense layer including a set of weights to map the first set of features into the second set of features ([Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. [Page 6, Paragraph 2] Training is performed by updating the network weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features and the time distributed dense layer including a set of weights to map the first set of features into the second set of features. One would have been motivated to do this modification because doing so would give the benefit of reducing feature-length dimensionality as taught by Morfi [Page 5, Paragraph 1].
Zheng teaches, in an analogous system: and a merge layer ([Page 3, Column 2, Section: Architectures, Paragraph 2] the Merge layer);
to combine with a first feature vector based on the second set of features to determine a second merged feature vector ([Page 4, Figure 1, Paragraph 1] we merge these three vectors together as the new feature vector).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, and Morfi to incorporate the teachings of Zheng to use a merge layer and to combine with a first feature vector based on the second set of features to determine a second merged feature vector. One would have been motivated to do this modification because doing so would give the benefit of decreasing the dimensions for this new feature vector as taught by Zheng [Page 4, Figure 1, Paragraph 1].
Harvey teaches, in an analogous system: a total number of tuners known to be included in the first one of the return path data households ([Abstract] Data may also be collected from various types of metering devices. [0064] As applied herein with regard to certain embodiments, the terms "tuning" and "viewing" may be used interchangeably. Also, the terms "viewership," "viewing," and "viewer" can be defined as television usage, for example, as measured by household and DSTB tuning records. [0211] For example, a panelist might agree to employ or participate with a certain threshold number of data measurement devices or identification tools in exchange for receiving the incentive. Note: Metering devices and DSTBs (digital set-top boxes) correspond to tuners. Tuning records correspond to return path data, and a panelist corresponds to a return path data household).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, Morfi, and Zheng to incorporate the teachings of Harvey to use a total number of tuners known to be included in the first one of the return path data households. One would have been motivated to do this modification because doing so would give the benefit of the data being matched for calculating metrics as taught by Harvey [Abstract].

Regarding claim 12
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The computer readable medium of claim 8 (as shown above).
Sullivan further teaches: wherein the instructions cause the processor to solve an objective function subject to a set of constraints to assign the one or more demographic categories to the respective ones of the return path data households, the objective function based on the predicted demographic classification probabilities ([0055] For example, the demographic distribution calculated by the example distribution calculator 124 identifies a count or percentage of panelists who consumed the media associated with the tuning event are of demographic constraints of interest (e.g., constraints of an age/gender demographic dimension, a race dimension, an income dimension, and/or an education dimension, etc.). (Note: Identifying a count or percentage corresponds to an objective function). [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male).

Regarding claim 15
Sullivan teaches: A demographic estimation method comprising: generating, by executing an instruction with a processor, features from return path data reported from set-top boxes associated with return path data households ([0090] In this example, the machine readable instructions comprise a program for execution by a processor such as the processor 1112 shown in the example processor platform 1100 discussed below in connection with FIG. 11. [0003] collect tuning data from set-top boxes of panelist households. [0075] the illustrated example constructs feature matrices associated with the respective training group and testing group of the panelist households. An example feature matrix constructed by the decision tree trainer 208 includes rows associated respective panelist households and columns associated with respective household features. Additionally or alternatively, some columns of example feature matrices are associated with other household characteristics (e.g., a total number of minutes consumed by the household, a number of minutes consumed by the household per predetermined time-period segments (e.g. per quarter-hours of the day), a number of STBs within a household, etc.). Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household), the features including a first set of features associated with a first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households), the first set of features including a set of view blocks determined from the return path data reported by a first one of the set-top boxes associated with the first one of the return path data households, respective ones of the view blocks to be associated with respective different time intervals, a first one of the view blocks corresponding to a first one of the time intervals ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. First row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block, and the time-period segments correspond to the different time intervals);
implementing, by executing an instruction with the processor, a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households, the neural network to be trained based on panel data reported from meters that monitor media devices associated with panelist households ([0086] alternative examples of the household estimator 210 utilize other forms of machine learning (e.g., neural networks, etc.) to estimate the demographics of the household 102. In such examples, the decision tree trainer 208 and/or another machine learning trainer constructs the corresponding machine learning classifier (e.g., neural networks) utilized to estimate the demographics of the household 102. [0003] collect tuning data from set-top boxes of panelist households. [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. [0022] To enable the AMEs to collect such consumption data, the AMEs typically provide panelist households with meter(s) that monitor media presentation devices (e.g., televisions, stereos, speakers, computers, portable devices, gaming consoles, and/or online media presentation devices, etc.) of the household. Note: Tuning data corresponds to return path data, and a panelist household corresponds to a return path data household); the second set of features associated with the first one of the return path data households ([0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. The first row of the feature matrix corresponds to the first set of features that is reduced into the second set of features), the view blocks of the first set of features ([0049] For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] For example, in the first row of the feature matrix that is associated with the first panelist household. Note: First panelist household corresponds to the first one of the return path data households. First row of the feature matrix corresponds to the first set of features. The tuning event example corresponds to a view block);
and assigning, by executing an instruction with the processor, one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities ([0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male. [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116)).
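Sullivan [0086] mentions neural networks only generally. One common way such a classifier yields "demographic classification probabilities" is a softmax output layer, after which "one or more" categories can be assigned, for example by a probability threshold. The sketch below assumes exactly that; the softmax output, the 0.25 threshold, the fallback rule, and the function names are illustrative assumptions and are not Sullivan's disclosure.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_by_threshold(categories, logits, threshold=0.25):
    """Return every category whose predicted probability meets `threshold`,
    falling back to the single most probable category if none does."""
    probs = softmax(logits)
    chosen = [c for c, p in zip(categories, probs) if p >= threshold]
    if not chosen:
        chosen = [categories[probs.index(max(probs))]]
    return chosen
```

Because each household is labeled independently here, this rule cannot enforce population-level constraints; a constrained formulation would be needed for that.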
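However, Sullivan does not explicitly disclose: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals; the neural network including a time distributed dense layer and a merge layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features, the time distributed dense layer including a set of weights to map the first set of features into the second set of features, the merge layer to combine a total number of tuners known to be included in the first one of the return path data households with a first feature vector based on the second set of features to determine a second merged feature vector.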
Eldering teaches, in an analogous system: to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals ([0112] The viewing characteristics can identify such attributes as channel change rate, dwell time, etc. Moreover, the viewing characteristics may be broken out by day or day part).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the demographic estimation system of Sullivan to incorporate the teachings of Eldering to include a channel change rate that is based on a ratio of (i) a number of channel changes that occurred during the first one of the time intervals to (ii) a duration of the first one of the time intervals. One would have been motivated to do this modification because doing so would give the benefit of further defining the viewing characteristics as taught by Eldering, paragraph [0112].
Morfi teaches, in an analogous system: the neural network including a time distributed dense layer, the time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features ([Page 4, Section 2.2.1] Neural Network Architecture. [Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. Note: Reduce feature-length dimensionality corresponds to reducing the first set of features into a second set of features less in number than the first set of features), the time distributed dense layer including a set of weights to map the first set of features into the second set of features ([Page 5, Paragraph 1] Next we apply time distributed dense layers to reduce feature-length dimensionality. [Page 6, Paragraph 2] Training is performed by updating the network weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a neural network including a time distributed dense layer to reduce the first set of features into a second set of features less in number than the first set of features and the time distributed dense layer including a set of weights to map the first set of features into the second set of features. One would have been motivated to do this modification because doing so would give the benefit of reducing feature-length dimensionality as taught by Morfi [Page 5, Paragraph 1].
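As a hedged illustration of a time distributed dense layer of the kind mapped to Morfi above, the sketch below applies one shared weight matrix and bias to the feature vector at every time step, mapping a first set of four features to a smaller second set of two. The dimensions and weight values are arbitrary assumptions and are not taken from Morfi or the application.

```python
def dense(x, weights, bias):
    # One fully connected mapping: y_j = sum_i x_i * W[i][j] + b_j.
    return [
        sum(x_i * weights[i][j] for i, x_i in enumerate(x)) + bias[j]
        for j in range(len(bias))
    ]


def time_distributed_dense(sequence, weights, bias):
    """Apply the same dense weights independently to each time step.

    sequence: list of T feature vectors, each of length n_in.
    weights:  n_in x n_out matrix shared across all time steps.
    Returns a list of T vectors, each of length n_out.
    """
    return [dense(step, weights, bias) for step in sequence]


# Hypothetical example: 3 time steps, 4 input features reduced to 2.
W = [[1, 0], [0, 1], [1, 0], [0, 1]]
b = [0, 0]
seq = [[1, 2, 3, 4], [0, 1, 0, 1], [5, 0, 5, 0]]
reduced = time_distributed_dense(seq, W, b)
```

Because the weights are shared across time steps, the number of trainable parameters does not grow with the sequence length; only the feature dimension is reduced.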
Zheng teaches, in an analogous system: and a merge layer ([Page 3, Column 2, Section: Architectures, Paragraph 2] the Merge layer);
to combine with a first feature vector based on the second set of features to determine a second merged feature vector ([Page 4, Figure 1, Paragraph 1] we merge these three vectors together as the new feature vector).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, and Morfi to incorporate the teachings of Zheng to use a merge layer and to combine with a first feature vector based on the second set of features to determine a second merged feature vector. One would have been motivated to do this modification because doing so would give the benefit of decreasing the dimensions for this new feature vector as taught by Zheng [Page 4, Figure 1, Paragraph 1].
Harvey teaches, in an analogous system: a total number of tuners known to be included in the first one of the return path data households ([Abstract] Data may also be collected from various types of metering devices. [0064] As applied herein with regard to certain embodiments, the terms "tuning" and "viewing" may be used interchangeably. Also, the terms "viewership," "viewing," and "viewer" can be defined as television usage, for example, as measured by household and DSTB tuning records. [0211] For example, a panelist might agree to employ or participate with a certain threshold number of data measurement devices or identification tools in exchange for receiving the incentive. Note: Metering devices and DSTBs (digital set-top boxes) correspond to tuners. Tuning records correspond to return path data, and a panelist corresponds to a return path data household).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan, Eldering, Morfi, and Zheng to incorporate the teachings of Harvey to use a total number of tuners known to be included in the first one of the return path data households. One would have been motivated to do this modification because doing so would give the benefit of the data being matched for calculating metrics as taught by Harvey [Abstract].
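The merge-layer limitation, as mapped to Zheng and Harvey above, can be illustrated with a minimal concatenation sketch: a household-level scalar (the total number of tuners known to be in the household) is appended to the first feature vector to form the second merged feature vector. Concatenation is assumed here as the merge operation, consistent with Zheng's description of merging vectors into a new feature vector; the function name and values are hypothetical.

```python
def merge_features(first_feature_vector, total_tuners):
    """Concatenate the household's tuner count onto the feature vector.

    Returns a new list; the input vector is left unchanged.
    """
    if total_tuners < 0:
        raise ValueError("tuner count cannot be negative")
    return list(first_feature_vector) + [float(total_tuners)]


# Hypothetical household with 3 known tuners (e.g., set-top boxes).
first_vec = [0.2, 0.7, 0.1]
merged = merge_features(first_vec, 3)
```

Concatenation places no length requirement on its inputs, whereas other merge modes (such as element-wise addition) would require vectors of equal length.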

Regarding claim 19
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The method of claim 15 (as shown above).
Sullivan further teaches: wherein the assigning of the one or more demographic categories to the respective ones of the return path data households includes solving an objective function subject to a set of constraints to assign the one or more demographic categories to the respective ones of the return path data households, the objective function based on the predicted demographic classification probabilities ([0055] For example, the demographic distribution calculated by the example distribution calculator 124 identifies a count or percentage of panelists who consumed the media associated with the tuning event are of demographic constraints of interest (e.g., constraints of an age/gender demographic dimension, a race dimension, an income dimension, and/or an education dimension, etc.). (Note: Identifying a count or percentage corresponds to an objective function). [0056] For example, for a tuning event of the tuning data 108 associated with “Premier League Live” on NBCSports at 7:30 A.M. on Sunday, the distribution calculator 124 collects demographics data associated with panelists who viewed the same channel (i.e., NBCSports) at substantially the same time (e.g., 7:32 A.M. on Sunday) and calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities or likelihoods that a consumer of media (e.g., one of the members 112, 114, 116 of the household 102) matches particular demographic dimensions of interest. For example, a person who views “Premier League Live” on NBCSports at 7:30 A.M. on Sunday is 20% likely to be 18-45 year-old female, 40% likely to be a 18-45 year-old male, 10% likely to be a 46-64 year-old female, 20% likely to be a 46-64 year-old male, 5% likely to be a 65+ year-old female, and 5% likely to be a 65+ year old male).
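To make the claim 19 limitation concrete, the following sketch assigns one demographic category per household by maximizing the sum of predicted classification probabilities, subject to a constraint that each category's assigned count equals a specified total. Exhaustive search is used only because the toy example is tiny; it is an assumption for illustration, not the method of Sullivan or of the application, and a practical system would more likely use an integer-programming solver. All category labels and probabilities are hypothetical.

```python
from itertools import product


def assign_categories(probs, category_totals):
    """Assign one category per household to maximize total probability.

    probs: list of dicts, one per household, mapping category -> probability.
    category_totals: dict mapping category -> required number of households.
    Returns (best_assignment, best_score), or (None, None) if infeasible.
    """
    categories = sorted(category_totals)
    best, best_score = None, None
    for assignment in product(categories, repeat=len(probs)):
        # Constraint: assigned counts must match the specified totals.
        if any(assignment.count(c) != category_totals[c] for c in categories):
            continue
        # Objective: sum of predicted probabilities for the chosen categories.
        score = sum(p.get(c, 0.0) for p, c in zip(probs, assignment))
        if best_score is None or score > best_score:
            best, best_score = assignment, score
    return best, best_score


household_probs = [
    {"F18-45": 0.6, "M18-45": 0.4},
    {"F18-45": 0.7, "M18-45": 0.3},
    {"F18-45": 0.2, "M18-45": 0.8},
]
assignment, score = assign_categories(household_probs, {"F18-45": 1, "M18-45": 2})
```

With exactly one F18-45 slot available, the constrained optimum places it on the second household (total probability 1.9), even though the first household also favors F18-45.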

Claims 2, 3, 9, 10, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sullivan et al (US 20170064358 A1) in view of Eldering et al (US 20030149975 A1), Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), Harvey et al (US 20110288907 A1), and further in view of Zhu et al (US 20180253637 A1).
Regarding claim 2
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The demographic estimation system of claim 1 (as shown above).
Sullivan further teaches: wherein the neural network includes ([0097] and/or another machine learning trainer may construct a machine learning classifier other than a decision tree classifier (e.g., neural networks, support vector machines, a clustering mechanism, Bayesian networks) based on the data of the panelist households):
the predicted demographic classification probabilities associated with the first one of the return path data households ([0056] calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities. [0075] first panelist household. Note: First Panelist household corresponds to first one of the return path data households).
However, the system of Sullivan and Eldering does not explicitly disclose: a recurrent neural network layer to process the second set of features to determine the first feature vector; a hidden layer to process the second merged feature vector; and an output layer in communication with the hidden layer to output. 
Morfi teaches, in an analogous system: a recurrent neural network layer to process the second set of features to determine the first feature vector ([Page 4, Section 2.2.1, Paragraph 1] we use a state-of-the-art stacked convolutional and recurrent neural network architecture. Note: Also see Table 1 showing Time distributed dense layer).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a recurrent neural network layer to process the second set of features to determine the first feature vector. One would have been motivated to do this modification because doing so would give the benefit of sequentially producing the labels as taught by Morfi [Page 4, Last Paragraph].
Zhu teaches, in an analogous system: a hidden layer to process the second merged feature vector ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output at each timestamp, yielding in total 56×128 output. The hidden layers basically the trained layer that applies learned rules to the input data to reach a prediction); 
and an output layer in communication with the hidden layer to output ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output. Only on the final RNN layer, output layer 425 one may take the 128 state output at the final timestamp).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Zhu to use a hidden layer to process the second merged feature vector and an output layer in communication with the hidden layer to output. One would have been motivated to do this modification because doing so would give the benefit of using hidden layers of the RNN as basically the trained layer that applies learned rules to the input data to reach a prediction as taught by Zhu [0034].
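A minimal forward-pass sketch of the layer arrangement recited in claim 2 (a recurrent layer producing the first feature vector, a merge with the tuner count, a hidden layer, and an output layer producing classification probabilities) is shown below. The Elman-style recurrence, the tanh and ReLU activations, the softmax output, and all weight values are assumptions for illustration; they are not drawn from Morfi, Zhu, or the application.

```python
import math


def matvec(W, x):
    # Multiply matrix W (one row per output unit) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rnn_last_state(sequence, W_x, W_h):
    # Elman recurrence h_t = tanh(W_x x_t + W_h h_{t-1}); returns final state.
    h = [0.0] * len(W_h)
    for x in sequence:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
    return h


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(sequence, total_tuners, W_x, W_h, W_hidden, W_out):
    first_vec = rnn_last_state(sequence, W_x, W_h)  # recurrent layer
    merged = first_vec + [float(total_tuners)]  # merge layer
    hidden = [max(0.0, v) for v in matvec(W_hidden, merged)]  # hidden (ReLU)
    return softmax(matvec(W_out, hidden))  # output layer


# Hypothetical sizes: 2 reduced features per step, 2 recurrent units,
# 3 hidden units, 2 demographic classes.
W_x = [[0.5, -0.2], [0.1, 0.3]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
W_hidden = [[0.3, 0.1, 0.2], [-0.4, 0.5, 0.1], [0.2, 0.2, 0.2]]
W_out = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]
probs = forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 2, W_x, W_h, W_hidden, W_out)
```

The softmax output yields one probability per demographic class, summing to one, which is the form of output the rejection maps to the predicted demographic classification probabilities.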

Regarding claim 3
The system of Sullivan, Eldering, Morfi, Zheng, Harvey, and Zhu teaches: The demographic estimation system of claim 2 (as shown above). 
Sullivan further teaches: wherein the first one of the view blocks corresponding to the first one of the time intervals is to identify the first one of the time intervals and media sources tuned by the first one of the set-top boxes during the first one of the time intervals ([0024] Because collecting information from panelist households can be difficult and costly, AMEs and other entities interested in measuring media/audiences have begun to collect information from other sources such as set-top boxes. [...] “tuning data” refers to information pertaining to tuning events (e.g., a STB being turned on or off, channel changes, volume changes, tuning duration times, etc.) of a STB. [0049] The STB 110 of the illustrated example collects and/or records tuning data associated with tuning events of the STB 110 and/or the media presentation device 112 (e.g., turning the STB 110 on or off, changing the channel presented via the media presentation device 112, increasing or lowering the volume, remaining on a channel for a duration of time, etc). For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] the first panelist household. Note: Panelist household corresponds to return path data household. The example corresponds to a view block in which the channel corresponds to the media source tuned and the time-period segment corresponds to the time intervals).
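The mapping of a view block to Sullivan's tuning events (a channel plus a time-period segment) can be illustrated by grouping tuning events into fixed intervals, with each view block recording its interval and the media sources tuned during it. The 15-minute interval length, the (minute-of-day, channel) event format, and the function name are hypothetical assumptions.

```python
from collections import defaultdict


def build_view_blocks(tuning_events, interval_minutes=15):
    """Group (minute_of_day, channel) events into per-interval view blocks.

    Returns a list of dicts sorted by interval start, each holding the
    interval bounds and the set of channels tuned in that interval.
    """
    tuned = defaultdict(set)
    for minute, channel in tuning_events:
        start = (minute // interval_minutes) * interval_minutes
        tuned[start].add(channel)
    return [
        {"start": s, "end": s + interval_minutes, "channels": tuned[s]}
        for s in sorted(tuned)
    ]


# 7:10 A.M. and 7:12 A.M. fall in the 7:00-7:15 block; 7:32 A.M. in 7:30-7:45.
events = [(430, "ABC"), (432, "NBCSports"), (452, "NBCSports")]
blocks = build_view_blocks(events)
```

Each resulting block identifies its time interval (as minutes of the day) together with the channels tuned during it, mirroring the time-period segments (e.g., 7:00-7:15 A.M.) quoted from Sullivan [0049].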

Regarding claim 9
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The computer readable medium of claim 8, wherein the instructions cause the processor to (as shown above):
Sullivan further teaches: the predicted demographic classification probabilities associated with the first one of the return path data households ([0056] calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities. [0075] first panelist household. Note: First Panelist household corresponds to first one of the return path data households).
However, the system of Sullivan and Eldering does not explicitly disclose: implement a recurrent neural network layer to process the second set of features to determine the first feature vector; implement a hidden layer to process the second merged feature vector; and implement an output layer in communication with the hidden layer to output.
Morfi teaches, in an analogous system: implement a recurrent neural network layer to process the second set of features to determine the first feature vector ([Page 4, Section 2.2.1, Paragraph 1] we use a state-of-the-art stacked convolutional and recurrent neural network architecture. Note: Also see Table 1 showing Time distributed dense layer).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a recurrent neural network layer to process the second set of features to determine the first feature vector. One would have been motivated to do this modification because doing so would give the benefit of sequentially producing the labels as taught by Morfi [Page 4, Last Paragraph].
Zhu teaches, in an analogous system: implement a hidden layer to process the second merged feature vector ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output at each timestamp, yielding in total 56×128 output. The hidden layers basically the trained layer that applies learned rules to the input data to reach a prediction); 
and implement an output layer in communication with the hidden layer to output ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output. Only on the final RNN layer, output layer 425 one may take the 128 state output at the final timestamp).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Zhu to use a hidden layer to process the second merged feature vector and an output layer in communication with the hidden layer to output. One would have been motivated to do this modification because doing so would give the benefit of using hidden layers of the RNN as basically the trained layer that applies learned rules to the input data to reach a prediction as taught by Zhu [0034].

Regarding claim 10
The system of Sullivan, Eldering, Morfi, Zheng, Harvey, and Zhu teaches: The computer readable medium of claim 9 (as shown above). 
Sullivan further teaches: wherein the first one of the view blocks corresponding to the first one of the time intervals is to identify the first one of the time intervals and media sources tuned by the first one of the set-top boxes during the first one of the time intervals ([0024] Because collecting information from panelist households can be difficult and costly, AMEs and other entities interested in measuring media/audiences have begun to collect information from other sources such as set-top boxes. [...] “tuning data” refers to information pertaining to tuning events (e.g., a STB being turned on or off, channel changes, volume changes, tuning duration times, etc.) of a STB. [0049] The STB 110 of the illustrated example collects and/or records tuning data associated with tuning events of the STB 110 and/or the media presentation device 112 (e.g., turning the STB 110 on or off, changing the channel presented via the media presentation device 112, increasing or lowering the volume, remaining on a channel for a duration of time, etc). For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] the first panelist household. Note: Panelist household corresponds to return path data household. The example corresponds to a view block in which the channel corresponds to the media source tuned and the time-period segment corresponds to the time intervals).

Regarding claim 16
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The method of claim 15 (as shown above).
Sullivan further teaches: wherein the implementing of the neural network includes ([0097] and/or another machine learning trainer may construct a machine learning classifier other than a decision tree classifier (e.g., neural networks, support vector machines, a clustering mechanism, Bayesian networks) based on the data of the panelist households):
the predicted demographic classification probabilities associated with the first one of the return path data households ([0056] calculates a demographic distribution for those panelists (e.g., 20% are 18-45 year-old females, 40% are 18-45 year-old males, 10% are 46-64 year-old females, 20% are 46-64 year-old males, 5% are 65+ year-old females, and 5% are 65+ year old males). As a result, a demographic distribution represents probabilities. [0075] first panelist household. Note: First Panelist household corresponds to first one of the return path data households).
However, Sullivan does not explicitly disclose: implementing a recurrent neural network layer to process the second set of features to determine the first feature vector; implementing a hidden layer to process the second merged feature vector; and implementing an output layer in communication with the hidden layer to output.
Morfi teaches, in an analogous system: implementing a recurrent neural network layer to process the second set of features to determine the first feature vector ([Page 4, Section 2.2.1, Paragraph 1] we use a state-of-the-art stacked convolutional and recurrent neural network architecture. Note: Also see Table 1 showing Time distributed dense layer).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sullivan and Eldering to incorporate the teachings of Morfi to use a recurrent neural network layer to process the second set of features to determine the first feature vector. One would have been motivated to do this modification because doing so would give the benefit of sequentially producing the labels as taught by Morfi [Page 4, Last Paragraph].
Zhu teaches, in an analogous system: implementing a hidden layer to process the second merged feature vector ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output at each timestamp, yielding in total 56×128 output. The hidden layers basically the trained layer that applies learned rules to the input data to reach a prediction); 
and implementing an output layer in communication with the hidden layer to output ([0034] Each single RNN layer, such as hidden layer 1 at 420 and layer 2 at 422, outputs 128 state output. Only on the final RNN layer, output layer 425 one may take the 128 state output at the final timestamp).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Zhu to use a hidden layer to process the second merged feature vector and an output layer in communication with the hidden layer to output. One would have been motivated to do this modification because doing so would give the benefit of using hidden layers of the RNN as basically the trained layer that applies learned rules to the input data to reach a prediction as taught by Zhu [0034].

Regarding claim 17
The system of Sullivan, Eldering, Morfi, Zheng, Harvey, and Zhu teaches: The method of claim 16 (as shown above). 
Sullivan further teaches: wherein the first one of the view blocks corresponding to the first one of the time intervals is to identify the first one of the time intervals and media sources tuned by the first one of the set-top boxes during the first one of the time intervals ([0024] Because collecting information from panelist households can be difficult and costly, AMEs and other entities interested in measuring media/audiences have begun to collect information from other sources such as set-top boxes. [...] “tuning data” refers to information pertaining to tuning events (e.g., a STB being turned on or off, channel changes, volume changes, tuning duration times, etc.) of a STB. [0049] The STB 110 of the illustrated example collects and/or records tuning data associated with tuning events of the STB 110 and/or the media presentation device 112 (e.g., turning the STB 110 on or off, changing the channel presented via the media presentation device 112, increasing or lowering the volume, remaining on a channel for a duration of time, etc). For example, each tuning event of the tuning data 108 is identified by a channel (e.g., ABC, NBC, USA Network, Comedy Central, NBCSports, HGTV, etc.) and a time (e.g., a particular time such as 7:10 A.M. or 8:31 P.M., a predetermined time-period segment such as 7:00-7:15 A.M. or 8:00-8:30 P.M., etc.) associated with the tuning event. [0075] the first panelist household. Note: Panelist household corresponds to return path data household. The example corresponds to a view block in which the channel corresponds to the media source tuned and the time-period segment corresponds to the time intervals).


Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sullivan et al (US 20170064358 A1) in view of Eldering et al (US 20030149975 A1) and further in view of Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), Harvey et al (US 20110288907 A1), and Trovero et al (US 8112302 B1).
Regarding claim 6
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The demographic estimation system of claim 5 (as shown above).
Sullivan further teaches: wherein: a second one of the constraints is to constrain respective ones of different possible household sizes assigned across the return path data households and of the respective ones of the different possible household sizes specified by the service provider associated with the return path data ([0027] As used herein, groupings within a characteristic (e.g., a household characteristic) are referred to as “household features,” “features” or “predictors.” Example features include demographic constraints, groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature). [0033] In some example methods, estimating the household characteristic includes estimating a number of household members of the household and a demographic of a household member. In some such examples methods, estimating the demographic of the household member includes determining a marginal of a demographic dimension for the household member. The demographic dimension includes the first demographic constraint and the second demographic constraint. Note: Number of household members corresponds to the household size); 
and a third one of the constraints is to constrain respective numbers of demographic categories assigned to the respective ones of the return path data households to correspond to the respective household sizes assigned to the respective ones of the return path data households ([0026] Example household characteristics include a number of household members, demographics of the household members, a number of television sets within the household. [0027] groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature), groupings of a “number of television sets” household characteristic (e.g., a “one-television household” feature, a “two-television household” feature, etc.). [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116). Thus, to measure a size and composition of media audiences, the characteristic estimator 126 of the example AME 104 analyzes the tuning data 108 of the household 102 and the demographics and consumption data of the panelist households to estimate the household characteristic of the household).
However, the system of Sullivan, Eldering, Morfi, Zheng, and Harvey does not explicitly disclose: a first one of the constraints is to constrain respective ones of the demographic categories assigned across the return path data households to produce sums that correspond to respective total estimates for the respective ones of the demographic categories specified by a service provider associated with the return path data; to sum to respective total numbers.
Trovero teaches, in an analogous system: a first one of the constraints is to constrain respective ones of the categories assigned across the data to produce sums that correspond to respective total estimates for the respective ones of the categories specified by a service provider; to produce sums that correspond to respective total numbers ([Column 1, Lines 44-51] As an illustration, the sales of a particular product by a retail company is the sum of the sales of the same product in all stores belonging to the company. However, imposing such constraints during the forecasting process can be difficult or impossible. Therefore, the series are often forecast independently at different levels so that the resulting forecasts do not abide by the constraints binding the original series. Note: The concept of sum of sales in all stores corresponds to sum to total).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Trovero to use the concept of sum of sales in all stores. One would have been motivated to do this modification because doing so would give the benefit of enterprises having their data organized hierarchically as taught by Trovero [Column 1, Lines 44-51].
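The three constraints discussed for claim 6 can be expressed as a feasibility check on a candidate assignment: (1) the assigned demographic categories sum to the provider's totals, (2) the assigned household sizes sum to the provider's totals for each size, and (3) each household receives as many demographic categories as its assigned size. The data layout, the function name, and the totals are hypothetical; this is only a sketch of how such constraints might be tested, not an implementation taken from Sullivan, Trovero, or the application.

```python
from collections import Counter


def satisfies_constraints(assignments, demo_totals, size_totals):
    """Check a candidate assignment against the three constraints.

    assignments: dict household_id -> {"size": int, "demos": list of str}
    demo_totals: dict demographic category -> required total count
    size_totals: dict household size -> required number of households
    """
    demo_counts = Counter(d for a in assignments.values() for d in a["demos"])
    size_counts = Counter(a["size"] for a in assignments.values())
    # First constraint: category counts sum to the specified totals.
    first = set(demo_counts) <= set(demo_totals) and all(
        demo_counts[c] == t for c, t in demo_totals.items()
    )
    # Second constraint: household-size counts sum to the specified totals.
    second = set(size_counts) <= set(size_totals) and all(
        size_counts[s] == t for s, t in size_totals.items()
    )
    # Third constraint: categories per household match its assigned size.
    third = all(len(a["demos"]) == a["size"] for a in assignments.values())
    return first and second and third


candidate = {
    "hh1": {"size": 2, "demos": ["F18-45", "M18-45"]},
    "hh2": {"size": 1, "demos": ["M46-64"]},
}
ok = satisfies_constraints(
    candidate, {"F18-45": 1, "M18-45": 1, "M46-64": 1}, {2: 1, 1: 1}
)
```

In this assumed example all three constraints hold; removing one member's category from "hh1" would violate the third constraint.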

Regarding claim 13
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The computer readable medium of claim 12 (as shown above).
Sullivan further teaches: wherein: a second one of the constraints is to constrain respective ones of different possible household sizes assigned across the return path data households and of the respective ones of the different possible household sizes specified by the service provider associated with the return path data ([0027] As used herein, groupings within a characteristic (e.g., a household characteristic) are referred to as “household features,” “features” or “predictors.” Example features include demographic constraints, groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature). Note: Number of household members corresponds to the household size); 
and a third one of the constraints is to constrain respective numbers of demographic categories assigned to the respective ones of the return path data households to correspond to the respective household sizes assigned to the respective ones of the return path data households ([0026] Example household characteristics include a number of household members, demographics of the household members, a number of television sets within the household. [0027] groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature), groupings of a “number of television sets” household characteristic (e.g., a “one-television household” feature, a “two-television household” feature, etc.). [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116). Thus, to measure a size and composition of media audiences, the characteristic estimator 126 of the example AME 104 analyzes the tuning data 108 of the household 102 and the demographics and consumption data of the panelist households to estimate the household characteristic of the household).
However, the system of Sullivan, Eldering, Morfi, Zheng, and Harvey does not explicitly disclose: a first one of the constraints is to constrain respective ones of the demographic categories assigned across the return path data households to produce sums that correspond to respective total estimates for the respective ones of the demographic categories specified by a service provider associated with the return path data; to produce sums that correspond to respective total numbers.
Trovero teaches, in an analogous system: a first one of the constraints is to constrain respective ones of the categories assigned across the data to produce sums that correspond to respective total estimates for the respective ones of the categories specified by a service provider; to produce sums that correspond to respective total numbers ([Column 1, Lines 44-51] As an illustration, the sales of a particular product by a retail company is the sum of the sales of the same product in all stores belonging to the company. However, imposing such constraints during the forecasting process can be difficult or impossible. Therefore, the series are often forecast independently at different levels so that the resulting forecasts do not abide by the constraints binding the original series. Note: The concept of sum of sales in all stores corresponds to sum to total).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Trovero to use the concept of sum of sales in all stores. One would have been motivated to do this modification because doing so would give the benefit of enterprises having their data organized hierarchically as taught by Trovero [Column 1, Lines 44-51].



Regarding claim 20
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The method of claim 19 (as shown above).
Sullivan further teaches: wherein: a second one of the constraints is to constrain respective ones of different possible household sizes assigned across the return path data households of the respective ones of the different possible household sizes specified by the service provider associated with the return path data ([0027] As used herein, groupings within a characteristic (e.g., a household characteristic) are referred to as “household features,” “features” or “predictors.” Example features include demographic constraints, groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature). Note: Number of household members corresponds to the household size); 
and a third one of the constraints is to constrain respective numbers of demographic categories assigned to the respective ones of the return path data households to correspond to the respective household sizes assigned to the respective ones of the return path data households ([0026] Example household characteristics include a number of household members, demographics of the household members, a number of television sets within the household. [0027] groupings of a “number of household members” household characteristic (e.g., a “one-member household” feature, a “two-member household” feature), groupings of a “number of television sets” household characteristic (e.g., a “one-television household” feature, a “two-television household” feature, etc.). [0058] Based on the tuning data 108, the demographics distributions associated with respective tuning events and/or the average demographics distribution of the panelists, the characteristic estimator 126 estimates household characteristics of the household 102 such as (1) a number of members of the household 102 (e.g., three household members 112, 114, 116) and (2) the demographics of each of the estimated household members (e.g., the demographics of each of the members 112, 114, 116). Thus, to measure a size and composition of media audiences, the characteristic estimator 126 of the example AME 104 analyzes the tuning data 108 of the household 102 and the demographics and consumption data of the panelist households to estimate the household characteristic of the household).
However, the system of Sullivan, Eldering, Morfi, Zheng, and Harvey does not explicitly disclose: a first one of the constraints is to constrain respective ones of the demographic categories assigned across the return path data households to produce sums that correspond to respective total estimates for the respective ones of the demographic categories specified by a service provider associated with the return path data; to produce sums that correspond to respective total numbers.
Trovero teaches, in an analogous system: a first one of the constraints is to constrain respective ones of the categories assigned across the data to produce sums that correspond to respective total estimates for the respective ones of the categories specified by a service provider; to produce sums that correspond to respective total numbers ([Column 1, Lines 44-51] As an illustration, the sales of a particular product by a retail company is the sum of the sales of the same product in all stores belonging to the company. However, imposing such constraints during the forecasting process can be difficult or impossible. Therefore, the series are often forecast independently at different levels so that the resulting forecasts do not abide by the constraints binding the original series. Note: The concept of constraining the sum of sales across all stores to the company-level total corresponds to constraining sums to correspond to respective totals).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Trovero to use the concept of sum of sales in all stores. One would have been motivated to do this modification because doing so would give the benefit of enterprises having their data organized hierarchically as taught by Trovero [Column 1, Lines 44-51].

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Sullivan et al (US 20170064358 A1) in view of Eldering et al (US 20030149975 A1), Morfi et al (Deep Learning on Low-Resource Datasets, 2018), Zheng et al (A Deep Learning Approach for Expert Identification in Question Answering Communities, 2017), Harvey et al (US 20110288907 A1), and further in view of Mezard et al (US 7036720 B2).
Regarding claim 7
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The demographic estimation system of claim 5 (as shown above).
Sullivan further teaches: wherein the demographic categories correspond to respective second sets of demographic categories assigned to the respective return path data households, and the at least one processor is ([0127] FIG. 11 is a block diagram of an example processor) on respective first sets of demographic categories assigned to the respective return path data households to determine the second sets of demographic categories ([0019] As used herein, each grouping of a demographic dimension is referred to as a “demographic marginal” (also referred to herein as a “demographic group” and/or a “demographic bucket”). For example, a “gender” demographic dimension includes a “male” demographic marginal and a “female” demographic marginal. Note: Gender corresponds to the first set of demographic categories; "male" and "female" correspond to the second set of demographic categories).
However, the system of Sullivan, Eldering, Morfi, Zheng, and Harvey does not explicitly disclose: includes: to perform a simulated annealing procedure.
Mezard teaches, in an analogous system: to perform a simulated annealing procedure ([Column 3, lines 13-14] a simulated annealing procedure is initiated).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Mezard to use a simulated annealing procedure. One would have been motivated to do this modification because doing so would give the benefit of initiating a local search procedure in order to resolve the remaining set of unpolarized variables and their constraints as taught by Mezard [Column 3, lines 12-15].

Regarding claim 14
The system of Sullivan, Eldering, Morfi, Zheng, and Harvey teaches: The computer readable medium of claim 12 (as shown above).
Sullivan further teaches: wherein the demographic categories correspond to respective second sets of demographic categories assigned to the respective return path data households, and the instructions cause the processor to ([0127] FIG. 11 is a block diagram of an example processor) on respective first sets of demographic categories assigned to the respective return path data households to determine the second sets of demographic categories ([0019] As used herein, each grouping of a demographic dimension is referred to as a “demographic marginal” (also referred to herein as a “demographic group” and/or a “demographic bucket”). For example, a “gender” demographic dimension includes a “male” demographic marginal and a “female” demographic marginal. Note: Gender corresponds to the first set of demographic categories; "male" and "female" correspond to the second set of demographic categories).
However, the system of Sullivan, Eldering, Morfi, Zheng, and Harvey does not explicitly disclose: perform a simulated annealing procedure.
Mezard teaches, in an analogous system: perform a simulated annealing procedure ([Column 3, lines 13-14] a simulated annealing procedure is initiated).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Sullivan, Eldering, Morfi, Zheng, and Harvey to incorporate the teachings of Mezard to use a simulated annealing procedure. One would have been motivated to do this modification because doing so would give the benefit of initiating a local search procedure in order to resolve the remaining set of unpolarized variables and their constraints as taught by Mezard [Column 3, lines 12-15].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
McMillan (US 20150181269 A1) discloses METHODS AND APPARATUS TO VERIFY AND/OR CORRECT MEDIA LINEUP INFORMATION.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.R.J./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128