DETAILED ACTION
Notice to Applicant

1.	The following is a FINAL office action upon examination of application number 16/238,084. Claims 1-20 are pending in the application and have been examined on the merits discussed below.

2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

3.	In the response filed May 14, 2021, Applicant amended claims 1-3, 7-9, 13-16, and 20, and did not cancel any claims. No new claims were presented for examination. 

4.	Applicant’s amendments to claim 7 to reflect the markings required for identifying changes made to the claim relative to its previous version are hereby acknowledged.

5.	The claim rejections under 35 U.S.C. 101 were previously withdrawn [See Office Action, 03/01/2021].

Response to Arguments

6.	Applicant's arguments filed May 14, 2021, have been fully considered.

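7.	Applicant submits “Yuxin et al. fails to disclose “an authentication component that, based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy” as recited in claim 1, and similarly claims 7 and 14.”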

	In response to Applicant’s argument that “Yuxin et al. fails to disclose “an authentication component that, based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy” as recited in claim 1, and similarly claims 7 and 14,” it is noted that Yuxin is not asserted as teaching the disputed limitation. Applicant's arguments with respect to the §103 rejection of claims 1, 7, and 14 have been considered, but are primarily raised in support of the amendments to independent claims 1, 7, and 14. They are therefore believed to be fully addressed by the new ground of rejection set forth under §103 in the instant office action, which incorporates a new reference to address the amended limitations and to support the §103 rejection of the amended claims.

8.	Applicant submits “The cited paragraphs, as well as the remainder of Dudley et al., also fail to disclose “an authentication component that, based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy” as recited in claim 1, and similarly claims 7 and 14.” [Applicant’s Remarks, 05/14/2021, page 10]



9.	Applicant submits “Sarin et al. also is silent regarding “an authentication component that, based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy” as recited in claim 1, and similarly claims 7 and 14.” [Applicant’s Remarks, 05/14/2021, page 12]

	In response to Applicant’s argument that “Sarin et al. also is silent regarding “an authentication component that, based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, 

10.	Applicant submits “Yuxin et al., Dudley et al., Sarin et al., and Lee et al., alone or in combination, do not disclose, teach or suggest each and every element of the subject claims.” [Applicant’s Remarks, 05/14/2021, page 13]

In response to Applicant’s argument that “Yuxin et al., Dudley et al., Sarin et al., and Lee et al., alone or in combination, do not disclose, teach or suggest each and every element of the subject claims,” it is noted that this argument is a mere allegation of patentability by the Applicant with no supporting rationale or explanation. Merely stating that the references do not teach a feature does not offer any insight as to why the specific sections of the prior art relied upon by the Examiner fail to disclose the claimed features. Applicant's arguments amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. The Examiner further notes that Applicant's arguments with respect to the §103 rejection of claims 1, 7, and 14 have been considered, but are primarily raised in support of the amendments to independent claims 1, 7, and 14, and are therefore believed to be fully addressed by the new ground of rejection set forth under §103 in the instant office action.



Claim Rejections - 35 USC § 103

12.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

13.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

14.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.




16.	Claims 1, 4-5, 7, 10-11, 13-14, 17-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yuxin et al., Pub. No.: US 2010/0207874 A1, [hereinafter Yuxin], in view of Sarin et al., Pub. No.: US 2013/0124207 A1, [hereinafter Sarin], in further view of Soundararajan et al., Pub. No.: US 2015/0189378 A1, [hereinafter Soundararajan].

As per claim 1, Yuxin teaches a system, comprising: a first camera (abstract, discussing an interactive content delivery system includes means for displaying visual content to a group of individuals, means for detecting a collective gesture of the group, and a control system, coupled to the means for displaying visual content and the means for detecting a collective gesture; paragraph 0012, discussing that as shown in FIG. 1, the video camera computer can be interconnected to the display computer, allowing feedback and analysis from the video camera computer to be used by the display computer. The display computer can also provide feedback to the video camera computer regarding camera settings to allow the change of focus, zoom, field of view, and physical orientation of the camera, if the mechanisms to do such are associated with the camera. The camera computer can include an input/output device, having a keyboard, monitor, and other input and output devices for allowing direct input and output of data to and from the camera computer…; paragraph 0029, discussing that once some display output is provided, the system then captures an audience view with the imaging device(s). This step can 

a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory (abstract, discussing an interactive content delivery system includes means for displaying visual content to a group of individuals, means for detecting a collective gesture of the group, and a control system, coupled to the means for displaying visual content and the means for detecting a collective gesture. The control system is configured to modify the content in response to the collective gesture; paragraph 0015, discussing that the controller can be any type of personal computer, portable computer, or workstation computer that includes a processing unit, a system memory, and a system bus that couples the processing unit to the various components of the computer. The processing unit may include one or more processors, each of which may be in the form of any one of various commercially available processors. Generally, each processor receives instructions and data from a read-only memory and/or a random access memory. The controller can also include a hard drive, a floppy drive, and CD ROM drive that are connected to the system bus by respective interfaces…);

wherein the computer executable components comprise: an image capturing component to capture a first image of an audience (paragraph 0010, discussing that the system includes at least one imaging device (e.g. a camera) pointed at an audience (located in an audience area 16 that represents at least a portion of the field of view of the imaging device), and a video camera computer, interconnected to the imaging device and configured to run gesture detection and recognition algorithms. The video camera computer is a video image analysis computing device that is configured to analyze visual images taken by the imaging device. The imaging device can be configured to take video images (i.e. a series of sequential video frames that capture motion) 

an image processing component that processes the first image to determine a survey count, wherein the survey count indicates a number of times a survey image was identified in the first image (paragraph 0017, discussing that the display provides visual and/or audio-visual content to the audience. The content can be in the form of survey questions. This content can include requests, or other types of prompts for audience response in the form of some gesture. The gesture requested can be a certain motion of the body, including relatively subtle motions, such as a facial gesture or nod of the head, to more obvious gestures, such as raising or waving a hand, or moving the entire body in some way; paragraph 0018, discussing that in FIG. 2 the group 14 has been prompted to raise a hand in response to some query provided by the display system…; paragraph 0020, discussing that vision based gesture detection and recognition have been widely studied in the past decade. Vision-based detection captures an audience's view via image-processing techniques in order to detect and classify specific gestures. For example, audience participants positioned near the imaging device can collectively perform some type of individual gesture (e.g. raise a hand) in response to the content, or they can perform a collaborative gesture; paragraph 0021, discussing that using group gesture detection techniques that have been developed, the image analysis that the system performs can function in at least two basic ways. One mode of operation of the system is to measure the level of audience interest as a function of the audience response (e.g. looking for a majority response). For example, the   of a particular option that was then being offered. The displayed content can then be adapted or modified based upon this majority vote to provide content indicated by the audience response; paragraph 0033, discussing that the next step in the method is to recognize the collaborative behavior. This step can include aggregating or correlating the results or detecting similarity in gestures to recognize collaborative gestures/behaviors. In this step the system analyzes the input data and computes scores based on various detectors to rank the probability of the appearance of one or more collaborative gestures/behaviors... One type of collaborative gesture can be the raising of hands in the audience, and detecting this gesture using gesture recognition technologies. This can include the raise of one hand or two hands to deliver feedback from the audience regarding the displayed content, or a finger pointing gesture performed by multiple people at the same time. This scenario can be similar to the "majority vote" approach, in which the number of raised hands is counted and considered as the voting behavior (i.e., the survey count indicates a number of times a survey image was identified in the first image); paragraph 0041, discussing that the proposed interactivity mechanism is deliberately designed to take the response from multiple users in collaboration and/or coordination and rewards the involvement of multiple users. For example, in one embodiment the system detects gestures from multiple individuals regardless of their appearance and identifies the similarities among these gestures present from different individuals. The system counts the number of users that present the collaborative gestures (i.e., survey count) and then delivers an appropriate response based on the number of users; paragraph 0030).

Although Yuxin teaches an image capturing component, it does not explicitly teach that the image capturing component controls the first camera to capture a first image of an audience in response to detecting, via a microphone, an audio phrase that matches a survey phrase; and an authentication component that based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy. However, Sarin, in the analogous art of image capturing applications, teaches:

an image capturing component that, in response to detecting, via a microphone, an audio phrase that matches a survey phrase, controls the first camera to capture a first image of an audience (paragraph 0003, discussing a computing device (e.g., a smart phone, digital camera, or other device with image capture functionality) causes an image capture device to capture one or more digital images based on audio input (e.g., a voice command) received by the computing device. For example, a user speaks a word or phrase, and the user's voice is converted to audio input data by the computing device. The computing device compares (e.g., using an audio matching algorithm) the audio input data to an expected voice command associated with an image capture application, and determines whether the word or phrase spoken by the user matches the expected voice command; paragraph 0030, discussing that the mobile device includes a microphone and speaker, along with two proximity sensors, situated below the surface of the mobile device; paragraph 0038, discussing that a user can generate user input that affects camera control (i.e., control the first camera to capture a first image of an audience) and handling of digital image data. The user input can be, for example, voice input…; paragraph 0042, discussing that a voice command can be any word, phrase, or utterance... For example, in an image capture scenario, a user can speak a command while in front of the camera, behind the camera, beside the camera, wearing the camera, or in any other orientation. A voice command 

Yuxin is directed toward audience measurement systems. Sarin is directed toward tools for voice-controlled image capture operations. Therefore they are deemed to be analogous as 

The Yuxin-Sarin combination does not explicitly teach an authentication component that based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy. However, Soundararajan in the analogous art of audience measurement systems teaches these concepts. Soundararajan teaches:
an authentication component that based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience (paragraph 0001, discussing that the disclosure relates generally to audience measurement, and, more particularly, to methods and apparatus to count people in an audience; paragraph 0024, discussing that  in some instances, the secondary presence data indicates that a person is not present in the media exposure environment of the primary media presentation device even though the person is logged in as an audience member via the people meter. In such instances, examples disclosed update and/or adjust the people meter data by, for example, removing the incorrectly counted audience member 

adjusts the survey count based on the secondary level processing of the information (paragraph 0041, discussing that in some examples, the first portable device 114 of FIG. 1 communicates the secondary data obtained by the first portable device 114 to the example base metering device 110 of FIG. 1…In the illustrated example of FIG. 1, the example base metering device 110 of FIG. 1 compares the primary presence information received from the example primary people meter 112 of FIG. 1 with the secondary presence data obtained via the example portable devices 114, 118 to, for example, confirm the accuracy of the data obtained via the 

in response to the discrepancy not exceeding a threshold, ignoring the discrepancy (paragraph 0083, discussing that the identity estimator 506 generates a probability or confidence level that any particular one of the persons 120, 122 associated with the house 100 corresponds to the person detected as near the portable device 114. In some such examples, where the probability of one of the persons 120, 122 exceeds a certain threshold and/or is larger than the probabilities of the other persons 120, 122, the example identity estimator 506 of FIG. 5 identifies that one particular person 120, 122 as the person detected as near the portable device 114. In some examples, if the probability of correspondence between the detected person and each of the persons 120, 122 is sufficiently low (e.g., below a threshold), the example identity estimator 506 identifies the detected person as likely corresponding to a visitor (i.e., in response to the discrepancy not exceeding a threshold, ignoring the discrepancy). That is, the example identity estimator 506 of FIG. 5 identifies the detected person as a visitor to the house 100, or identifies the person as someone of no consequence to the media exposure environment 102. In some examples, the secondary presence data collected by the portable device 114 indicates that more than one person is in proximity to the portable device 114. In such examples, the identity estimator 506 of FIG. 5 analyzes each such person in a similar manner as described above; paragraph 0087, discussing that the first four scenarios 602, 604, 606, 608 correspond to when a person is 

The Yuxin-Sarin combination is directed toward audience measurement systems and methods. Soundararajan is directed toward methods and apparatus to count people in an audience. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Yuxin-Sarin combination to include an authentication component that based on detection of a discrepancy within the survey count: in response to the discrepancy exceeding a threshold: performs a secondary level processing of information associated with the audience, and adjusts the survey count based on the secondary level processing of the information; and in response to the discrepancy not exceeding a threshold, ignoring the discrepancy, as taught by Soundararajan, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. The combination provides a more comprehensive method by enhancing audience measurement data with additional information about the detected audience members, thereby associating the activity information of the person with other data to augment the data for further research analysis [Soundararajan, paragraph 0090].

As per claim 4, the Yuxin-Sarin-Soundararajan combination teaches the system of claim 1. Yuxin further teaches an audio processing component (paragraph 0010, discussing that the system includes at least one imaging device 12 (e.g. a camera) pointed at an audience, and a video camera computer, interconnected to the imaging device and configured to run gesture detection and recognition algorithms. The video camera computer is a video image analysis computing device that is configured to analyze visual images taken by the imaging device. The imaging device can be configured to take video images at any desired frame rate, or it can take still images; paragraph 0017, discussing that the display and the audio speaker (i.e., audio processing component) provide visual and/or audio-visual content to the audience. The content can be in the form of survey questions, or any other type of content. This content can include requests, or other types of prompts for audience response in the form of some gesture. These requests or prompts are for group responses. The gesture requested can be a certain motion of the body, including relatively subtle motions, such as a facial gesture or nod of the head, to more obvious gestures, such as raising or waving a hand, or moving the entire body in some way; claim 3, discussing an interactive content delivery system in accordance with claim 2, further comprising an audio broadcast device, synchronized with the video display device; paragraphs 0018, 0040).

Yuxin does not explicitly teach an audio processing component that processes one or more audio phrases captured by the microphone to determine if the one or more audio phrases comprises the audio phrase that matches the survey phrase. However, Sarin in the analogous art of image capturing applications teaches this concept (paragraph 0003, discussing a computing device (e.g., a smart phone, digital camera, or other device with image capture functionality) causes an image capture device to capture one or more digital images based on audio input (e.g., a voice command) received by the computing device. For example, a user speaks a word or phrase, and the user's voice is converted to audio input data by the computing device. The computing device compares (e.g., using an audio matching algorithm) the audio input data to an 

Yuxin is directed toward audience measurement analysis systems. Sarin is directed toward tools for voice-controlled image capture operations. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Yuxin to include an audio processing component that processes one or more audio phrases captured by the microphone to determine if the one or more audio phrases comprises the audio phrase that matches the survey phrase, as taught by Sarin, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. The combination provides a more robust method by providing quick access to image-capture functionality based on received audio command input.

As per claim 5, the Yuxin-Sarin-Soundararajan combination teaches the system of claim 1. Yuxin further teaches a report generating component that generates a report indicating the survey count associated with a survey phrase (paragraph 0033, discussing that the next step in the method is to recognize the collaborative behavior. This step can include aggregating or correlating the results or detecting similarity in gestures to recognize collaborative gestures/behaviors. In this step the system analyzes the input data and computes scores based 

Claims 7 and 14 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 1, as discussed above. Further, as per claims 7 and 14, Yuxin teaches a computer implemented method and a computer program product that provides survey information using image processing, the computer program product comprising a computer readable storage medium having program instructions embodied (paragraph 0015, discussing that the controller can be any type of personal computer, portable computer, or workstation computer that includes a processing unit, a system memory, and a system bus that couples the processing unit to the various components of the computer. The processing unit may include one or more processors, each of which may be in the form of any one of various commercially available processors. Generally, each processor receives instructions and data from a read-only memory and/or a random access memory. The controller can also include a hard drive, a floppy drive, and CD ROM drive that are connected to the system bus by respective interfaces…).

Claims 10 and 17 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 4, as discussed above.
Claims 11 and 18 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 5, as discussed above.

As per claim 13, the Yuxin-Sarin-Soundararajan combination teaches the computer implemented method of claim 7. Although not taught by the Yuxin-Sarin combination, Soundararajan teaches wherein the threshold is set at a value where an adjustment of the survey count would impact an outcome of a survey associated with the survey phrase (paragraph 0041, discussing that the example base metering device of FIG. 1 compares the primary presence information received from the example primary people meter of FIG. 1 with the secondary presence data obtained via the example portable devices to, for example, confirm the accuracy of the data obtained via the primary people meter. In some examples, if there is a discrepancy between the primary presence information gathered via the example primary people meter and the secondary presence information gathered via the portable devices, the example base metering device of FIG. 1 adjusts the primary presence information by removing incorrectly 

The Yuxin-Sarin combination is directed toward audience measurement systems and methods. Soundararajan is directed toward methods and apparatus to count people in an audience. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Yuxin-Sarin combination to include a threshold set at a value where an adjustment of the survey count would impact an outcome of a survey associated with the survey phrase, as taught by Soundararajan, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. The combination provides a more comprehensive method by enhancing audience measurement data with additional information about the detected audience members, thereby associating the activity information of the person with other data to augment the data for further research analysis [Soundararajan, paragraph 0090].

Claim 20 recites substantially similar limitations that stand rejected via the art citations and rationale applied to claim 13, as discussed above.

Claims 2, 6, 8, 12, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yuxin in view of Sarin, in view of Soundararajan, in further view of Lee et al., Pub. No.: US 2014/0161305 A1, [hereinafter Lee].

As per claim 2, the Yuxin-Sarin-Soundararajan combination teaches the system of claim 1. Yuxin further teaches wherein the second level processing of information comprises processing an image of the audience (paragraph 0029, discussing that once some display output is provided, the system then captures an audience view with the imaging device(s). This step can involve capturing a single snapshot or a series of frames/video. It can involve capturing a view of the entire camera field of view, or only a portion of the field of view. Additionally, it is to be understood that multiple imaging devices can be used simultaneously to capture video images for processing; paragraph 0031, discussing that a variety of computer vision and image processing techniques may be performed either sequentially or in parallel to analyze the captured image data. For example, low-level detection algorithms such as foreground/silhouette extraction and background subtraction may first be formed to identify the sufficiently large and moving contiguous regions in the scene. The algorithms then proceed to track over time and adjust these regions to conform with the captured data. Other detection algorithms may be present to help identify simpler gestures/behaviors (i.e., second level processing) that together form a much more sophisticated one; paragraph 0010).

Yuxin does not explicitly teach wherein the second level processing of information comprises processing a second image of the audience. However, Lee in the analogous art of audience measurement systems teaches this concept (abstract, discussing methods and apparatus to monitor environments. An example method includes analyzing a plurality of three-dimensional data points having respective depth values representative of distances between a sensor and respective objects of an environment; when a first set of the 

The Yuxin-Sarin-Soundararajan combination is directed toward audience measurement systems. Lee is directed toward methods and apparatus to monitor environments. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Yuxin-Sarin-Soundararajan combination to request the image processing component to conduct a second level processing that comprises processing a second image of the audience in response to detecting the discrepancy, as taught by Lee, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. The combination provides a more comprehensive method by verifying the tally of participants’ responses, thereby helping researchers to obtain an accurate count of responses.

As per claim 6, the Yuxin-Sarin-Soundararajan combination teaches the system of claim 1. Yuxin further teaches wherein the image capturing component controls a second camera to capture a second image of the audience (paragraph 0010, discussing that the system includes at least one imaging device (e.g. a camera) pointed at an audience, and a video camera computer, interconnected to the imaging device and configured to run gesture detection and recognition algorithms. The video camera computer is a video image analysis computing device that is configured to analyze visual images taken by the imaging device. The imaging device can be configured to take video images (i.e. a series of sequential video frames that capture motion) at any desired frame rate, or it can take still images (i.e., capture a second image of the audience); paragraph 0017, discussing that the display (and the audio speaker 22) provides visual and/or audio-visual content to the audience. The content can be in the form of survey questions, or any other type of content. This content can include requests, or other types of prompts for audience response in the form of some gesture. These requests or prompts are for group responses. The gesture requested can be a certain motion of the body, including relatively subtle motions, such as a facial gesture or nod of the head, to more obvious gestures, such as raising or waving a hand, or moving the entire body in some way…; paragraph 0018, discussing examples of certain group gestures that the system can recognize are depicted in FIGS. 2 and 3. In these examples, prompts for particular gestures are used... As one example, in FIG. 2 the group 14 has been prompted to raise a hand in response to some query provided by the display system. Whenever the system prompts a group response, several results are possible; paragraph 0029, discussing that once some display output is provided, the system then captures an audience view with the imaging device(s). This step can involve capturing a single snapshot or a series of frames/video. It can involve capturing a view of the entire camera field of view, or only a portion of the field of view. Additionally, it is to be understood that multiple imaging devices can be used simultaneously to capture video images for processing; paragraph 0040, discussing that multiple interactive display systems with collaborative gesture detection can be provided at different geographical 

wherein the image processing component processes the second image to determine the survey count (paragraph 0017, discussing that the display (and the audio speaker 22) provides visual and/or audio-visual content to the audience. The content can be in the form of survey questions, or any other type of content. This content can include requests, or other types of prompts for audience response in the form of some gesture. These requests or prompts are for group responses. The gesture requested can be a certain motion of the body, including relatively subtle motions, such as a facial gesture or nod of the head, to more obvious gestures, such as raising or waving a hand, or moving the entire body in some way; paragraph 0018, discussing that in FIG. 2 the group 14 has been prompted to raise a hand in response to some query provided by the display system. Whenever the system prompts a group response, several results are possible. Most of the participants are likely to respond in the manner requested and raise one hand, these group members being indicated at 28a. However, some participants may provide an alternate, though potentially qualifying response, such as by raising two hands; paragraph 0020, discussing that vision based gesture detection and recognition have been widely studied in the past decade. Vision-based detection captures an audience's view via image-processing techniques such as background subtraction, silhouette detection, etc., in order to detect and classify specific gestures. For example, audience participants positioned near the imaging device can collectively perform some type of individual gesture (e.g. raise a hand) in response to the content, or they can perform a collaborative gesture; paragraph 0021, discussing that using group gesture detection techniques that have been developed, the image analysis that the system performs can function in at least two basic ways. One mode of operation of the system is to 

Yuxin does not explicitly teach in response to detecting the audio phrase that matches the survey phrase. However, Sarin in the analogous art of image capturing applications teaches this concept (paragraph 0003, discussing a computing device (e.g., a smart phone, digital camera, or other device with image capture functionality) causes an image capture device to capture one or more digital images based on audio input received by the computing device. For example, a user speaks a word or phrase, and the user's voice is converted to audio input data by the computing device. The computing device compares (e.g., using an audio matching algorithm) the audio input data to an expected voice command associated with an image capture application, and determines whether the word or phrase spoken by the user matches the expected voice command (i.e., detecting the audio phrase that matches the survey phrase); paragraph 0038, discussing that a user can generate user input that affects camera control and handling of digital image data. The user input can be, for example, voice input…; paragraph 0042, discussing that a voice command can be any word, phrase, or utterance…; paragraph 0043, discussing that the mobile device can use audio matching algorithms to compare audio input with expected commands. When audio input is recognized as a sufficient match for an expected command by the mobile device, feedback can be provided to users. Such feedback can provide assurance to users that a voice command was recognized and that mobile device will take a photo momentarily (i.e., in response to detecting the audio phrase that matches the survey phrase); paragraph 0051, discussing that if the voice-controlled image capture tool is running on a mobile device in a locked/low-power state, the tool can listen for voice commands and, if a voice command is recognized, the mobile device can generate command events to cause one or more digital images 

Yuxin is directed toward audience measurement systems. Sarin is directed toward tools for voice-controlled image capture operations. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Yuxin to include controlling a second camera to capture a second image of the audience in response to detecting the audio phrase that matches the survey phrase, as taught by Sarin, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. The combination provides a more robust method by providing quick access to image-capture functionality based on audio command input received.

The Yuxin-Sarin-Soundararajan combination does not explicitly teach wherein the image processing component processes the second image to determine the survey count by counting how many times the survey image was matched in the first image and the second image. However, Lee in the analogous art of audience measurement systems teaches this concept (paragraph 0057, discussing that when the example body recognizer 322 of FIG. 3 detects a body of a person in a particular frame of two-dimensional data, an indication of the recognition is conveyed to the example body counter and/or the example frame database. In the illustrated example of FIG. 3, the example body counter combines the body detections provided by the body recognizer 308 of the three-dimensional data analyzer 300 and the body detections provided by the body recognizer 322 of the two-dimensional data analyzer 302 corresponding to a common frame or frames to generate body counts for the respective frames (i.e., the second image is processed to determine the survey count by counting how many times the survey image was matched in the first image and the second image). In some examples, the body counter 316 also executes one or more filters and/or check functions to avoid counting the same person more than once. Further, the example body recognizer 322 calculates and/or stores a location of the detected person in, for example, the frame database in connection with a timestamp. In some examples, the location of the detected person sent by the example body recognizer 322 is an x-y coordinate similar to the example coordinate 402 of FIG. 4; paragraph 0059, discussing that in some examples, the 2D-based recognition analys(es) performed by the example two-dimensional data analyzer 302 are initiated for each frame of data. However, in the illustrated example of FIG. 3, the two-dimensional data analyzer 302 includes an analysis trigger 326 to determine whether one or more of the 2D-based recognition analyses (e.g., the analysis performed by the body recognizer 322 and/or the analysis performed by the face recognizer 324) are to be performed on a certain frame of data and/or what type of 2D-based recognition analysis is to be performed. In particular, the example analysis trigger 326 of FIG. 3 triggers 2D-based recognition analyses in 

The Yuxin-Sarin-Soundararajan combination is directed toward audience measurement systems. Lee is directed toward methods and apparatus to monitor environments. Therefore they are deemed to be analogous as they both are directed towards audience measurement systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Yuxin-Sarin-Soundararajan combination to include processing the second image to determine the survey count by counting 

Claims 8 and 15 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 2, as discussed above.
Claims 12 and 19 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 6, as discussed above.

18.	Claims 3, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yuxin in view of Sarin, in view of Soundararajan, in further view of Liu, Pub. No.: US 2017/0255820 A1, [hereinafter Liu].

As per claim 3, the Yuxin-Sarin-Soundararajan combination teaches the system of claim 1. Yuxin further teaches wherein the information comprises secondary image information from the first image, wherein the secondary image information is not the survey image and is selected from a group consisting of facial data (paragraph 0033, discussing that the next step in the method is to recognize the collaborative behavior. This step can include aggregating or correlating the results or detecting similarity in gestures to recognize collaborative gestures/behaviors (i.e., information comprises secondary image information from the first image). In this step the system analyzes the input data and computes scores based on various detectors to rank the probability of the appearance of one or more collaborative gestures/behaviors. There are many examples of such collaborative gestures. One type of collaborative gesture can be the raising of hands in the 

The Yuxin-Sarin-Soundararajan combination does not explicitly teach wherein the secondary image information is selected from a group consisting of clothing data, and jewelry data. However, Liu in the analogous art of image analysis systems teaches this concept (paragraph 0014, discussing that embodiments of the invention use facial recognition technology or software, and some embodiments use appearance recognition technology (ART) or software.  As the term is used herein, ART uses not only traditional facial features (relatively unchanging 

The Yuxin-Sarin-Soundararajan combination is directed toward audience measurement systems. Liu is directed toward systems and methods for identification of individuals in images. Therefore they are deemed to be analogous as they both are directed towards image analysis systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Yuxin-Sarin-Soundararajan combination to include secondary image information selected from a group consisting of clothing data and jewelry data, as taught by Liu, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same functions as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were 

Claims 9 and 16 recite substantially similar limitations that stand rejected via the art citations and rationale applied to claim 3, as discussed above.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
A.	Luff, Pub. No.: US 2010/0195865 A1 – describes methods and apparatus to count persons in a monitored environment.
B.	Pattikonda, Pub. No.: US 2008/0008360 A1 – describes a system and method for counting people.
C.	Ramaswamy et al., Pub. No.: US 2006/0062429 A1 – describes methods and apparatus to count people appearing in an image.
D.	Malik et al., Pub. No.: US 2009/0217315 A1 – describes a method and system for audience measurement.
E.	Srinivasan et al., Pub. No.: US 2014/0254876 A1 – relates generally to audience measurement and, more particularly, to methods and apparatus to count people in images.
F.	Gall et al., Pub. No.: US 2017/0223947 A1 – describes that if particular data collected and analyzed in real-time does not meet a certain stipulated threshold or fall within a specified range, or meet other stipulated criteria, the uploaded itinerary instructions can instruct the system to ignore such data, or modify/adjust such data, or collect one or more types of additional or different data.
 THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARLENE GARCIA-GUERRA whose telephone number is (571) 270-3339. The examiner can normally be reached on M-F 7:30 a.m.-5:00 p.m. EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian M. Epstein, can be reached on (571) 270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like claim 
/Darlene Garcia-Guerra/
Examiner, Art Unit 3683

/SUSANNA M. DIAZ/
Primary Examiner, Art Unit 3683