DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 8-17, 19-21, 28, and 29 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Sharma et al. (U.S. Patent No. 7,921,036), hereinafter “Sharma”, as cited in the IDS filed 24 February 2020.
Regarding claim 1, Sharma teaches:
A computer-implemented method for viewing verification (See the Abstract.) comprising: 
obtaining a plurality of images of an individual captured concurrently with an electronic display presenting one or more screen images (See Fig. 1 and Col. 11, lines 22-25: “The present invention is directed to the playing of media (images, video, etc.) that is appropriate for the people who are either looking at digital signage 101 or in the view of the camera system or sensor 100.”); 
obtaining a plurality of image classifiers for facial (See a support vector machine (SVM) applied to face images to classify gender in Col. 12, lines 56-64; SVM applied to face images to classify age in Col. 12, lines 65-67 and Col. 13, lines 1-11; neural network face detector that extracts skin color in Col. 13, lines 38-45; neural network face detector that extracts hair color in Col. 13, lines 46-54; see Col. 19, lines 10-14: “The tracked faces are fed to the facial expression recognition 253 step to recognize the changes in facial features, from which the emotional state of each of person in the audience is estimated.”) and head pose analysis (See Col. 19, lines 7-10: “The face detection 225 step finds any human faces from the images, and the face tracking 226 step individually tracks them and estimates the facial poses.”); 
analyzing the plurality of images, using one or more processors, to identify a face of the individual in one of the plurality of images, wherein the one of the plurality of images contains an image of the face captured while the individual is facing the electronic display and wherein the analyzing is accomplished using one or more image classifiers from the plurality of image classifiers (See Col. 14, lines 4-11: “After the features 104 of the detected person are received 135, the PR next determines which person in the person database best matches the newly detected person. The set of features that represent each person is composed into a feature vector. In the exemplary embodiment, a metric can be used (e.g., a dot product or Hausdorff) to determine how well two feature vectors match. The feature vector that provides the best match is chosen.”); and 
calculating a viewing verification metric using the plurality of image classifiers (See Fig. 19, media response score 257 and Col. 19, lines 26-33: “The media response analysis ultimately computes the media response in the form of an overall media response score 257 for the played media content, that summarizes its effectiveness by combining the changes in emotional state 244, the degree of attention, and the viewing duration. For example, if the emotional state changed from neutral to positive or if the degree of attention and the viewing duration increased, the resulting media response score should improve.”) wherein the calculating evaluates a verified viewing duration of the screen images by the individual based on the plurality of images and the analyzing (See Col. 19, lines 10-18: “The tracked faces are fed to the facial expression recognition 253 step to recognize the changes in facial features, from which the emotional state of each of person in the audience is estimated. Both the estimated pose from the face tracking 226 step and the recognized facial expression are used to estimate the person's degree of attention 255 to the played media. The viewing duration 254 can also be derived from the estimated facial pose.”).

Regarding claim 2, Sharma teaches:
The method of claim 1 further comprising determining an engagement score based on the analyzing (See Col. 19, lines 14-17: “Both the estimated pose from the face tracking 226 step and the recognized facial expression are used to estimate the person's degree of attention 255 to the played media.”).

Regarding claim 3, Sharma teaches:
The method of claim 2 further comprising determining an emotional response score based on the analyzing (See Col. 19, lines 10-14: “The tracked faces are fed to the facial expression recognition 253 step to recognize the changes in facial features, from which the emotional state of each of person in the audience is estimated.”).

Regarding claim 4, Sharma teaches:
The method of claim 1 further comprising analyzing an identity of the individual based on the face of the individual (See Figs. 2-3 and Col. 14, lines 4-11: “After the features 104 of the detected person are received 135, the PR next determines which person in the person database best matches the newly detected person. The set of features that represent each person is composed into a feature vector. In the exemplary embodiment, a metric can be used (e.g., a dot product or Hausdorff) to determine how well two feature vectors match. The feature vector that provides the best match is chosen.”).
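
For illustration only, the best-match step quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Sharma's implementation: the function names `cosine_similarity` and `best_match`, the dictionary used as a person database, and the choice of a normalized dot product are all hypothetical. Sharma names a dot product or Hausdorff metric only as examples of a matching metric.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def best_match(detected, person_db):
    """Return (person_id, score) for the stored vector that best matches `detected`.

    `person_db` is a hypothetical mapping of person id -> feature vector.
    Returns (None, -inf) when the database is empty.
    """
    best_id, best_score = None, float("-inf")
    for person_id, stored in person_db.items():
        score = cosine_similarity(detected, stored)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id, best_score
```

In this sketch, a newly detected feature vector is compared against every stored vector and the identifier with the highest score is returned, which mirrors the statement that "the feature vector that provides the best match is chosen."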

Regarding claim 5, Sharma teaches:
The method of claim 4 further comprising tracking viewing by the individual using the identity and updating the viewing verification metric (See Fig. 11 and Col. 16, lines 44-61: “The person classifier continuously receives 135 data from the person detector. After the features of the detected person are received 135, the PR next determines which person in the person database best matches the newly detected person. The set of features that represent each person are composed into a feature vector. In the exemplary embodiment, a metric can be used (e.g., a dot product or Hausdorff) to determine how well two feature vectors match. The feature vector that provides the best match is chosen. If the data is for an existing person, then the data for the existing person in the person DB is updated 138 with the event. If the data is for a new person, then the person is added to the person DB 112, and the event data warehouse 402 is updated accordingly. The statistics of the people who have been influenced or captured by the particular embodiment of the invention can be gathered over a period of time. In either case, the PR notifies the next processing modules, such as the person classifier 105 and the person tracker 300, that a new event has occurred 139.” Then see person detector 103 in Fig. 13 that updates the viewing duration and other scores that comprise the media response score.).

Regarding claim 8, Sharma teaches:
The method of claim 1 wherein the obtaining is in response to tags associated with media rendered on the electronic display (See Col. 17, lines 18-27: “In this example, the set of aggregated values for the gender attribute will consist of both values, e.g., female and male tags, as values for the gender attribute. If the values of the gender attribute show more female members than male members in the group of people, the gender attribute for the group can be represented by "female," since the majority of people are female, and the media can be customized in such a way to serve women, based on the media selection rule for a female audience.”).
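
For illustration only, the majority-based aggregation of attribute tags described in the quoted passage can be sketched as follows. This is a minimal sketch, not Sharma's implementation: the function name `aggregate_attribute` and the list-of-strings input are hypothetical, and the tie-breaking behavior (first tag encountered wins) is an assumption that the reference does not address.

```python
from collections import Counter


def aggregate_attribute(tags):
    """Return the most frequent tag in `tags`, or None for an empty group.

    Ties are resolved in favor of the tag seen first (an assumption of this sketch).
    """
    if not tags:
        return None
    # most_common(1) yields the single (tag, count) pair with the highest count
    tag, _count = Counter(tags).most_common(1)[0]
    return tag


# A group with more female than male members is represented by "female".
group_gender = aggregate_attribute(["female", "male", "female"])
```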

Regarding claim 9, Sharma teaches:
The method of claim 1 wherein the viewing verification metric is used in determining a viewership score (See “media selection rules” that are based on the media response score in Fig. 7 and Col. 15, lines 48-56: “The media feedback processor 400 gathers a media-independent audience profile from the person classifier 105, media response data from the person tracker 300, and play log data from the media player 150, and then associates the three types of data to compute the media selection rules 431 that assign weights to each content, depending on which audience segment positively responded to a specific content. This data is then sent to a database for the media selection rules 431 and to the media feedback warehouse 401.”).

Regarding claim 10, Sharma teaches:
The method of claim 1 wherein the electronic display renders an object and the viewing verification metric includes scoring viewing of the object (See Col. 9, lines 30-36: “1) playing random content on the means for playing content 2) automatically analyzing the response(s) of the people to the randomly played content, 3) scoring the content based on the response analysis according to demographic, behavioral, and emotional attributes of the people, and 4) playing a matching content for the people, based on the score according to the demographic, behavioral, and emotional attributes.”).

Regarding claim 11, Sharma teaches:
The method of claim 1 wherein the viewing verification metric of the individual enables determining viewability of digital media content from the electronic display (See Fig. 19, media response score 257 and Col. 19, lines 26-33: “The media response analysis ultimately computes the media response in the form of an overall media response score 257 for the played media content, that summarizes its effectiveness by combining the changes in emotional state 244, the degree of attention, and the viewing duration. For example, if the emotional state changed from neutral to positive or if the degree of attention and the viewing duration increased, the resulting media response score should improve.”).

Regarding claim 12, Sharma teaches:
The method of claim 11 wherein viewability includes evaluation of presence of the digital media content and whether the digital media content is viewable by the individual (See Fig. 19, media response score 257 and Col. 19, lines 26-33: “The media response analysis ultimately computes the media response in the form of an overall media response score 257 for the played media content, that summarizes its effectiveness by combining the changes in emotional state 244, the degree of attention, and the viewing duration. For example, if the emotional state changed from neutral to positive or if the degree of attention and the viewing duration increased, the resulting media response score should improve.”).

Regarding claim 13, Sharma teaches:
The method of claim 11 further comprising modifying the digital media content based on the viewing verification metric (See Col. 9, lines 34-39: “3) scoring the content based on the response analysis according to demographic, behavioral, and emotional attributes of the people, and 4) playing a matching content for the people, based on the score according to the demographic, behavioral, and emotional attributes.”).

Regarding claim 14, Sharma teaches:
The method of claim 11 wherein the determining includes scoring the digital media content (See Col. 9, lines 34-36: “3) scoring the content based on the response analysis according to demographic, behavioral, and emotional attributes of the people”).

Regarding claim 15, Sharma teaches:
The method of claim 1 wherein the one or more image classifiers is used to evaluate head pose orientation for the individual (See Col. 19, lines 7-10: “The face detection 225 step finds any human faces from the images, and the face tracking 226 step individually tracks them and estimates the facial poses.” The face detection step 225 is based on the image classifiers described in Col. 12, lines 27-46.).

Regarding claim 16, Sharma teaches:
The method of claim 1 further comprising performing eye gaze detection using the plurality of image classifiers (See Col. 8, lines 8-14: “It also extracts the relevant features of each person, including visually perceptible attributes. The features include, but are not limited to, the following: gender, age range, gaze characteristics, height, hair color, skin color, clothing, and time spent in front of the display.”).

Regarding claim 17, Sharma teaches:
The method of claim 1 wherein the analyzing the plurality of images is accomplished without eye tracking (See Fig. 13, where the person tracker module does not use eye tracking. There is no disclosure of eye tracking.).

Regarding claim 19, Sharma teaches:
The method of claim 1 wherein the analyzing is used as part of a viewership determination across a plurality of people (See Figs. 20-23, where the analyzing is used to determine appropriate content that will increase viewership among audiences.).

Regarding claim 20, Sharma teaches:
The method of claim 1 further comprising: obtaining a second plurality of images of a second individual; analyzing the second plurality of images, using the one or more (See Col. 8, lines 57-66: “The present invention captures a plurality of images for an individual or people using a single or a plurality of means for capturing images. A single or a plurality of face images of the individual or the people are detected from the plurality of images. The present invention automatically extracts visually perceptible attributes of the individual or the people from the single or plurality of face images. In the exemplary embodiment, the visually perceptible attributes comprise demographic information, local behavior analysis, and emotional status.”).

Regarding claim 21, Sharma teaches:
The method of claim 20 further comprising combining the viewing verification metric for the individual with the viewing verification metric for the second individual into an aggregated viewing verification metric (See Col. 8, lines 57-66: “The present invention captures a plurality of images for an individual or people using a single or a plurality of means for capturing images. A single or a plurality of face images of the individual or the people are detected from the plurality of images. The present invention automatically extracts visually perceptible attributes of the individual or the people from the single or plurality of face images. In the exemplary embodiment, the visually perceptible attributes comprise demographic information, local behavior analysis, and emotional status.” Since these attributes are used to determine the media response score, the score can be calculated for an individual or multiple people (i.e., an aggregated viewing verification metric).).

Sharma teaches claim 28 for the reasons given in the treatment of claim 1. Sharma further teaches:
A computer program product embodied in a non-transitory computer readable medium for viewing verification, the computer program product comprising code which causes one or more processors to perform operations of: (See the Abstract and Col. 8, lines 52-54, where the CPU implies a computer program product embodied in a non-transitory computer readable medium.)

Sharma teaches claim 29 for the reasons given in the treatment of claim 1. Sharma further teaches:
A computer system for viewing verification comprising: a memory which stores instructions; and one or more processors coupled to the memory, wherein the one or more processors, when executing the instructions which are stored, are configured to: (See the Abstract and Col. 8, lines 52-54, where the CPU implies a memory that stores the instructions.)







Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma (U.S. Patent No. 7,921,036) in view of Mehra (U.S. Pub. No. 2012/0218398).
Claim 6 is met by the combination of Sharma and Mehra, wherein
Sharma teaches:
The method of claim 1 wherein 
Sharma does not appear to disclose the following; however, Mehra teaches:
the calculating the viewing verification metric, using the plurality of image classifiers, evaluates an amount of time the individual looks away from the electronic display while the electronic display shows the one or more screen images (See [0085]: “The present system can be used to disqualify or select an image having a subject whose eyes are closed, and can take multiple images to prevent having no images that lack blinking or to ensure that the person is not just looking away for a very short time such that no action will be initiated without the person looking away for at least a threshold time.”).
Motivation to combine:
Sharma and Mehra together teach the limitations of claim 6. Mehra is directed to a similar field of art (control of an application after determining user inattentiveness to a display). Therefore, Sharma and Mehra are combinable. Modifying the system and method of Sharma by adding the capability of evaluating an amount of time the individual looks away from the electronic display while the electronic display shows the one or more screen images, as taught by Mehra, would yield the expected and predictable result of ensuring the user is inattentive before an action is taken. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Sharma and Mehra in this way.
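
For illustration only, evaluating look-away time subject to a threshold, as discussed in the cited passage of Mehra, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of either reference: the function name `look_away_time`, the `(timestamp, is_looking_away)` sample format, and the rule that a run still open at the last sample ends at that sample are all hypothetical.

```python
def look_away_time(samples, threshold_s):
    """Total seconds spent in look-away runs lasting at least `threshold_s`.

    `samples` is a time-sorted list of (timestamp_seconds, is_looking_away)
    pairs, e.g. one pair per analyzed image. Runs shorter than the threshold
    (such as a brief glance away) are ignored.
    """
    total = 0.0
    run_start = None  # timestamp at which the current look-away run began
    last_t = None
    for t, away in samples:
        if away and run_start is None:
            run_start = t
        elif not away and run_start is not None:
            duration = t - run_start
            if duration >= threshold_s:
                total += duration
            run_start = None
        last_t = t
    # Close a run that is still open at the final sample.
    if run_start is not None and last_t is not None:
        duration = last_t - run_start
        if duration >= threshold_s:
            total += duration
    return total
```

In this sketch, only look-away runs that meet the threshold contribute to the total, which corresponds to the quoted statement that no action is initiated "without the person looking away for at least a threshold time."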

Claim 7 is met by the combination of Sharma and Mehra, wherein
Sharma teaches:
The method of claim 1 wherein 
Sharma does not appear to disclose the following; however, Mehra teaches:
the calculating the viewing verification metric, using the plurality of image classifiers, evaluates an amount of time eyes are closed for the individual while the electronic display shows the one or more screen images (See [0086]-[0087]: “These may be detected when a person looks below a camera position for a somewhat long time appearing as a squint or as a person who has fallen asleep, versus a person looking below the camera for just a second appearing as a blink…The description herein generally refers to handling a scene wherein an object person appears to be blinking, squinting or sleeping (e.g., looking below the camera for different periods of time) or has eyes wide open (e.g., looking above the camera).”).
Motivation to combine:
See the motivation to combine in the treatment of claim 6.



Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Sharma (U.S. Patent No. 7,921,036) in view of Biebesheimer et al. (U.S. Pub. No. 2004/0181457), hereinafter “Biebesheimer”, as cited in the IDS filed 24 February 2020.
Claim 22 is met by the combination of Sharma and Biebesheimer, wherein
Sharma teaches:
The method of claim 1 further comprising 
Sharma does not appear to disclose the following; however, Biebesheimer teaches:
opting in by the individual for collection of the plurality of images (See [0048], [0056], [0072], and [0110].).
Motivation to combine:
Sharma and Biebesheimer together teach the limitations of claim 22. Biebesheimer is directed to a similar field of art (facial expression and response monitoring). Therefore, Sharma and Biebesheimer are combinable. Modifying the system and method of Sharma by adding the capability of opting in by the individual for collection of the plurality of images, as taught by Biebesheimer, would yield the expected and predictable result of ensuring privacy for individuals who do not opt in. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Sharma and Biebesheimer in this way.


Claims 25-27 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma (U.S. Patent No. 7,921,036) in view of Jiang et al. (Learning Visual Attention to Identify People with Autism Spectrum Disorder, 2017, ICCV, Pages 3267-3276), hereinafter “Jiang”.
Claim 25 is met by the combination of Sharma and Jiang, wherein
Sharma teaches:
The method of claim 1 wherein 
Sharma does not appear to disclose the following; however, Jiang teaches:
the calculating is performed using deep learning (See Fig. 1, use of a deep neural network in calculation of attention/viewing behavior metric in the form of Autism Spectrum Disorder (ASD) risk. Then see Fig. 2 and explanation of network architecture, which includes convolutional layers and is accordingly a convolutional neural network, in the paragraph bridging the left and right columns on page 3270. Further see the first paragraph on page 3270, where the neural network is trained using features describing viewing behavior such as fixation duration.).
Motivation to combine:
Sharma and Jiang together teach the limitations of claim 25. Jiang is directed to a similar field of art (calculating a score related to user attention to content presented on a display). Therefore, Sharma and Jiang are combinable. Modifying the system and method of Sharma by adding the capability of performing the calculating [of a viewing verification metric using the plurality of image classifiers wherein the calculating evaluates a verified viewing duration of the screen images by the individual based on the plurality of images and the analyzing] using deep learning, as taught by Jiang, would yield the expected and predictable result of improved performance in evaluation of viewing behavior by taking advantage of the feature learning of convolutional neural networks. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Sharma and Jiang in this way.

Claim 26 is met by the combination of Sharma and Jiang, wherein
The combination of Sharma and Jiang teaches:
The method of claim 25 wherein 
And Jiang further teaches:
the deep learning is performed using a deep neural network (See Fig. 1, use of a deep neural network in calculation of attention/viewing behavior metric in the form of Autism Spectrum Disorder (ASD) risk.).
Motivation to combine:
See the motivation to combine in the treatment of claim 25.
 
Claim 27 is met by the combination of Sharma and Jiang, wherein
The combination of Sharma and Jiang teaches:
The method of claim 25 wherein 
And Jiang further teaches:
the deep learning is performed using a convolutional neural network (See Fig. 1, use of a deep neural network in calculation of attention/viewing behavior metric in the form of Autism Spectrum Disorder (ASD) risk. Then see Fig. 2 and explanation of network architecture, which includes convolutional layers and is accordingly a convolutional neural network, in the paragraph bridging the left and right columns on page 3270. Further see the first paragraph on page 3270, where the neural network is trained using features describing viewing behavior such as fixation duration.).
Motivation to combine:
See the motivation to combine in the treatment of claim 25.




Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN S LEE, whose telephone number is (571) 272-1981. The examiner can normally be reached 11 AM - 7 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jonathan S Lee/
Primary Examiner, Art Unit 2661