Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
	This final office action is in response to the arguments/amendments, filed 2/28/2022. Claims 1-4, 9, 13, 22, 26-27, 30, 34-35, 37, 42-45, 64, 66 and 69 have been amended. Claims 1-4, 9, 13, 22, 26-27, 30, 34-35, 37, 42-45, 64, 66 and 69 are currently pending and have been examined below. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 45, 48, and 66 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”). 
Claim 1
As per claim 1, Sandholm teaches a computer-vision system ([0033] “video stream module . . . of computing device” and [0034] “video stream processing module”) that: 
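generates from a pixel stream a digital representation of a person ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing.” Examiner interprets face images generated from a video stream as a digital representation of a person generated from a pixel stream.); 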
determines attributes or characteristics of the person from that digital representation ([0034] “face images may be extracted from the video stream and then provided to cloud server whenever a face is detected. Face detection module may analyze the video stream to detect the faces of potential receiving users for video stream processing module. For example, an object-class detection algorithm specifically configured to detect facial features may be used to detect faces in the video stream.” Examiner interprets facial features as attributes or characteristics of a person.); 
based on those attributes or characteristics, outputs data to a cloud-based analytics system that enables that analytics system to identify and also to authenticate the person ([0019] “The detected face images may be provided to the cloud service in a request for face recognition processing, where the results of the face recognition processing are received by temporary token receiving instructions.” And, [0026] “requesting cloud recognition processing from cloud server. Face images receiving instructions may perform face recognition on the face images to identify a matching face profile of a registered receiving user.” And, [0068] “providing ad-hoc, face-recognition-driven authentication by a cloud server.”). 
Sandholm teaches a computer vision system but does not explicitly teach the following feature taught by Kim: 
the computer-vision system including a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person ([0096] “training of a person recognition neural network.” And, [0068] “When the image recognizer receives an input image, the image recognizer recognizes a person given as a recognition target in the input image by using the convolutional neural network . . . the recognition target is not limited to a person, but the recognition target may be a traffic sign or the like.” And, [0074] “the image recognizer 80 performs a two-dimensional recognition process to determine whether a recognition target such as a person exists in the input image. If a recognition target exists in the input image, the image recognizer 80 outputs information indicating that a person exists in the input image.”). 
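Therefore, it would have been obvious to modify Sandholm to include the computer-vision system including a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person as taught by Kim because “by using the convolutional neural network (fully convolutional neural network) . . . it is possible to perform an image recognition process on a real-time basis” (Kim [0006]) allowing for faster and more accurate user identity recognition. 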

Claim 45
As per claim 45, Sandholm further teaches: 
(i) where the digital representation is created locally at a computer-vision system, or at a hub, or in the cloud, or distributed across computer-vision systems and one or more hubs and the cloud ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing.” Examiner interprets face images generated from a video stream as a digital representation of a person generated from a pixel stream and notes the digital representation is created locally at a computer-vision system.); or (ii) where the digital representation is included in a 'track record' that uses the reformatting of real-time metadata into a per-object (e.g. per-person) record of one or more of their trajectory, pose and identity of that object. 

Claim 48
As per claim 48, Sandholm further teaches: 
	(i) where the digital representation includes an estimate or measurement of depth or distance from the sensor of a person or object or part of the environment; or (ii) where the digital representation includes an estimate or measurement of depth or distance from the sensor of a person or object or part of the environment, and where depth sensing uses a calibration object of approximately known size, or stereoscopic cameras or structured light; or (iii) where the digital representation includes facial recognition data ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing” where “the detected face images may be provided to the cloud service in a request for face recognition processing, where the results of the face recognition processing are received by temporary token receiving instructions”); or (iv) where sensor metadata is fed into a hub, gateway or controller that pushes events to smart devices in a network as specific commands, and differentiates the events created on a per service basis to allow each service to receive different data that is relevant to their service from the group of sensors as a single intelligent sensor; or (v) where event streams are sent to cloud analytics apps; or (vi) where event streams are sent to cloud analytics apps, and where an event subscription service, to which a system controller subscribes, receives event notifications and data from the devices or sensors. 

Claim 66
As per claim 66, Sandholm teaches an appliance that includes a sensor ([0017] “Capture device 118 may include one or more image sensors for capturing images that are stored on the computing device 100. For example, capture device 118 may be an embedded camera device, a web camera, an Internet protocol (IP) camera.”) that in turn includes an embedded computer-vision engine ([0033] “video stream module . . . of computing device” and [0034] “video stream processing module”) that: 
generates from a pixel stream a digital representation of a person ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing.” Examiner interprets face images generated from a video stream as a digital representation of a person generated from a pixel stream.); 
determines attributes or characteristics of the person from that digital representation ([0034] “face images may be extracted from the video stream and then provided to cloud server whenever a face is detected. Face detection module may analyze the video stream to detect the faces of potential receiving users for video stream processing module. For example, an object-class detection algorithm specifically configured to detect facial features may be used to detect faces in the video stream.” Examiner interprets facial features as attributes or characteristics of a person.); 
based on those attributes or characteristics, outputs data to a cloud-based analytics system that enables that analytics system to identify and also to authenticate the person ([0019] “The detected face images may be provided to the cloud service in a request for face recognition processing, where the results of the face recognition processing are received by temporary token receiving instructions.” And, [0026] “requesting cloud recognition processing from cloud server. Face images receiving instructions may perform face recognition on the face images to identify a matching face profile of a registered receiving user.” And, [0068] “providing ad-hoc, face-recognition-driven authentication by a cloud server.”). 
Sandholm teaches a computer vision system but does not explicitly teach the following feature taught by Kim: 
a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person ([0096] “training of a person recognition neural network.” And, [0068] “When the image recognizer receives an input image, the image recognizer recognizes a person given as a recognition target in the input image by using the convolutional neural network . . . the recognition target is not limited to a person, but the recognition target may be a traffic sign or the like.” And, [0074] “the image recognizer 80 performs a two-dimensional recognition process to determine whether a recognition target such as a person exists in the input image. If a recognition target exists in the input image, the image recognizer 80 outputs information indicating that a person exists in the input image.”). 
Therefore, it would have been obvious to modify Sandholm to include the computer-vision system including a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person as taught by Kim because “by using the convolutional neural network (fully convolutional neural network) . . . it is possible to perform an image recognition process on a real-time basis” (Kim [0006]) allowing for faster and more accurate user identity recognition. 

Claims 2 and 3 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20150161435 (“Jung”). 
Claim 2
	As per claim 2, Sandholm teaches extracting a facial image for use by the cloud-based analytics system to identify and authenticate the person ([0019]) but does not explicitly teach the following feature taught by Jung: 
the attributes or characteristics of the person include their pose, and the system analyses that pose to extract a facial image from the pixel stream that is the best facial image for use by the cloud-based analytics system to identify and authenticate the person ([0104] “The frontal face detection apparatus 100 using a facial pose according to the present invention repeatedly detects a final frontal face per frame of a video, selects each final face image having the lowest facial pose score.” And, [0080] “as the facial pose score is lower, an optimal facial pose image easy for face recognition is generated.” And, [0009] “for a frontal face image suitable for recognition in video images, an image theoretically suitable for recognition is basically regarded as an image, in which a pose angle corresponding to the roll, pitch and yaw of the face is close to 0°.” And, [0011] “recognition is performed based on the detected face image.” And, [0014] “extract an optimal frontal face image easy for recognition.”). 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include the attributes or characteristics of the person include their pose, and the system analyses that pose to extract a facial image from the pixel stream that is the best facial image for use by the cloud-based analytics system to identify and authenticate the person as taught by Jung because “detecting a frontal face is an important factor in face recognition” (Jung [0007]) and “us[ing] a frontal face image obtained by precisely measuring a facial pose” (Jung [0012]) allows “an optimal facial pose image easy for face recognition [to be] generated” (Jung [0080]). 



Claim 3
As per claim 3, Sandholm further teaches: 
the computer-vision system outputs the facial image to the cloud-based analytics system, but does not output the full-frame real-time video to the cloud-based analytics system ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing.” Examiner notes that the extracted face image(s) are transmitted to the cloud-based analytics system, not the full video). 

Claim 4 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20150026708 (“Ahmed”). 
Claim 4
As per claim 4, Sandholm does not explicitly teach but Ahmed teaches: 
(i) in which the data output is indexed in real-time by the cloud-based analytics system with the identity of persons; or (ii) in which the computer-vision system is implanted in an ASIC or SoC located in a camera, such as a security camera; or (iii) in which the computer-vision system operates in real-time, directly processing raw image sensor data; or (iv) in which the cloud-based analytics system controls a digital advertising system or digital signage to provide one-to-one real time marketing to individuals whom it has recognized ([0066] “cloud-based authentication approach.” And, [0203] “detecting presence of, or identifying/authenticating, the user might include, without limitation, analyzing captured images or video segments using one or more of facial recognition software.” And, [0020] “determining, with a second computer, at least one advertisement based at least in part on profile information of the identified user and presenting the at least one advertisement to the user.” And, [0066] “With a PDD, an ICD, and/or a video calling device, when a user enters the room, and the camera sensors detect that user's facial features (or other biometric features) and authenticates the individual.” And, [0052] “presence detection can be local and/or cloud based.” And, [0092] “the PDD may also interface with (or have integrated therein) a camera or other video/image capture device.”). 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) in which the data output from the engine is indexed in real-time by the cloud-based analytics system with the identity of persons; or (ii) in which the computer-vision system is implanted in an ASIC or SoC located in a camera, such as a security camera; or (iii) in which the computer-vision system operates in real-time, directly processing raw image sensor data; or (iv) in which the cloud-based analytics system controls a digital advertising system or digital signage to provide one-to-one real time marketing to individuals whom it has recognized as taught by Ahmed in order to “better tailor or otherwise more effectively determine advertisements relevant to the user” (Ahmed [0089]). 

Claims 9 and 34 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20100134619 (“Hampapur”). 
Claim 9
As per claim 9, Sandholm does not explicitly teach but Hampapur teaches: 
(i) in which the computer-vision system dynamically preserves resolution of selective areas (or region of interest) within each frame; or (ii) in which the computer-vision system dynamically adjusts the effective frame rate based on content and manages streaming level based on user control; or (iii) that outputs a real-time stream of metadata from the pixel stream, the metadata describing the instantaneous attributes or characteristics of each object in the scene it has been trained to search for ([0017] “sensor data, including video metadata generated by processing unit, as well as rules against which the metadata is compared to identify objects and attributes of objects present within region of interest.” And, [0019] “Evaluation component processes visual media from sensor devices in real-time, identifying events, objects, and attributes of objects that are detected in region of interest.” And, [0015] “image data representing visual attributes of objects (e.g., people, products, vehicles etc.) within region of interest.”); or (iv) that outputs a real-time stream of metadata from the pixel stream, the metadata describing the instantaneous attributes or characteristics of each object in the scene it has been trained to search for, and that sends or uses those attributes or characteristics to enable one or more networked devices or sensors to be controlled. 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) in which the computer-vision system dynamically preserves resolution of selective areas (or region of interest) within each frame; or (ii) in which the computer-vision system dynamically adjusts the effective frame rate based on content and manages streaming level based on user control; or (iii) that outputs a real-time stream of metadata from the pixel stream, the metadata describing the instantaneous attributes or characteristics of each object in the scene it has been trained to search for; or (iv) that outputs a real-time stream of metadata from the pixel stream, the metadata describing the instantaneous attributes or characteristics of each object in the scene it has been trained to search for, and that sends or uses those attributes or characteristics to enable one or more networked devices or sensors to be controlled as taught by Hampapur in order to “dynamically create a variety of modified alerts for increased effectiveness of the monitoring system” (Hampapur [0025]). 

Claim 34
	As per claim 34, Sandholm further teaches: 
	outputs continuous or streaming video ([0013] “receiving users in the current video stream (i.e., current physical context).” And, [0019] “process a video stream obtained by capture device.”). 
Sandholm does not explicitly teach but Hampapur teaches: 
metadata that defines various attributes of individual persons ([0017] “sensor data, including video metadata generated by processing unit, as well as rules against which the metadata is compared to identify objects and attributes of objects present within region of interest.” And, [0019] “Evaluation component processes visual media from sensor devices in real-time, identifying events, objects, and attributes of objects that are detected in region of interest.” And, [0015] “image data representing visual attributes of objects (e.g., people, products, vehicles etc.) within region of interest.”). 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include metadata that defines various attributes of individual persons as taught by Hampapur in order to “dynamically create a variety of modified alerts for increased effectiveness of the monitoring system” (Hampapur [0025]). 

Claim 13 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20130054377 (“Krahnstoever”). 
Claim 13
As per claim 13, Sandholm does not explicitly teach but Krahnstoever teaches: 
(i) that can detect multiple people in a scene and continuously track or detect one or more of their: trajectory, pose, gesture, identity; or (ii) that can infer or describe a person's behaviour or intent by analyzing one or more of the trajectory, pose, gesture, identity of that person ([0005] “determine gaze directions and body pose directions for the potential customers, and to determine interest levels of the potential customers in the advertising content based on the determined gaze directions and body pose directions.” And, [0029] “processing in real-time. . . Such tracking data may include . . . one or more of gaze direction, body pose direction, direction of motion, position.” And, [0061] “directly infer if a group of people are together interacting with the advertising station (e.g., Is someone currently discussing with peers (revealing mutual gazes), asking them to participate, or inquiring parent's support of purchase?”); or (iii) that performs real-time virtualization of a scene, extracting objects from the scene and grouping their virtualized representations together; or (iv) that applies feature extraction and classification to find objects of known characteristics in each video frame or applies a convolutional or recurrent neural network or another object detection algorithm to do so; or (v) that detects people by extracting independent characteristics including one or more of the following: the head, head & shoulders, hands and full body, each in different orientations, to enable an individual's head orientation, shoulder orientation and full body orientation to be independently evaluated for reliable people tracking; or (vi) that continuously monitors the motion of individuals in the scene and predicts their next location to enable reliable tracking even when the subject is temporarily lost or passes behind another object; or (vii) that contextualizes individual local representations to construct a global representation of each person as they move through an environment of multiple sensors in multiple locations; or (viii) uses data from multiple sensors, each capturing different parts of an environment, to track and show an object moving through that environment and to form a global representation that is not limited to the object when imaged from a single sensor; or (ix) where the approximate location of an object in 3D is reconstructed using depth/distance estimation to assist accuracy of tracking and construction of the global representation from multiple sensors. 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) that can detect multiple people in a scene and continuously track or detect one or more of their: trajectory, pose, gesture, identity; or (ii) that can infer or describe a person's behaviour or intent by analyzing one or more of the trajectory, pose, gesture, identity of that person; or (iii) that performs real-time virtualization of a scene, extracting objects from the scene and grouping their virtualized representations together; or (iv) that applies feature extraction and classification to find objects of known characteristics in each video frame or applies a convolutional or recurrent neural network or another object detection algorithm to do so; or (v) that detects people by extracting independent characteristics including one or more of the following: the head, head & shoulders, hands and full body, each in different orientations, to enable an individual's head orientation, shoulder orientation and full body orientation to be independently evaluated for reliable people tracking; or (vi) that continuously monitors the motion of individuals in the scene and predicts their next location to enable reliable tracking even when the subject is temporarily lost or passes behind another object; or (vii) that contextualizes individual local representations to construct a global representation of each person as they move through an environment of multiple sensors in multiple locations; or (viii) uses data from multiple sensors, each capturing different parts of an environment, to track and show an object moving through that environment and to form a global representation that is not limited to the object when imaged from a single sensor; or (ix) where the approximate location of an object in 3D is reconstructed using depth/distance estimation to assist accuracy of tracking and construction of the global representation from multiple sensors as taught by Krahnstoever in order to “improve[] localization and data association for tracking in crowded environments” (Krahnstoever [0034]) and “improve overall tracking performance in crowded conditions” (Krahnstoever [0037]). 

Claim 22 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20100298957 (“Rocha”). 
Claim 22
As per claim 22, Sandholm does not explicitly teach but Rocha teaches: 
(i) that operates as an interface to enable control of multiple networked computer-enabled sensors and devices in the smart home or office ([0007] “a multi-function sensor includes a network interface and at least one sensor interface operatively coupled to the network interface and configured to receive digital or analog signals from at least one external sensor.” And, [0002] “a multi-function sensor includes a plurality of sensors configured to sense corresponding parameters in a room.” And, [0008] “a multi-function sensor for home automation.”); or (ii) where the digital representation conforms to an API; or (iii) where the digital representation includes feature vectors that define the appearance of a generalized person; or (iv) where the digital representation is used to display a person as a standardized shape. 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) that operates as an interface to enable control of multiple networked computer-enabled sensors and devices in the smart home or office; or (ii) where the digital representation conforms to an API; or (iii) where the digital representation includes feature vectors that define the appearance of a generalized person; or (iv) where the digital representation is used to display a person as a standardized shape as taught by Rocha in order to “allow various third party sensors to be easily integrated into the home automation system” (Rocha [0028]). 

Claims 26 and 27 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20140267544 (“Li”). 
Claim 26
As per claim 26, Sandholm does not explicitly teach but Li teaches: 
where the digital representation is used to display a person as a graphical symbolic or simplified representation of a person ([0032] “the server 106 may receive captured video of the user's facial expression from the computing device 102, extract the facial parameters and generate the avatar video.” And, [0025] “render the avatar on the display 138 of the computing device 102. In some embodiments, the video module 202 displays the real-time video of the user on the display 138 and may also indicate on the display 138 whether the facial parameters are being extracted in real-time (e.g., by marking the boundary of the user's facial expression on the display 138).”).
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include where the digital representation is used to display a person as a graphical symbolic or simplified representation of a person as taught by Li in order to “protect[] a person's privacy” (Li [0001]). 

Claim 27
As per claim 27, Sandholm does not explicitly teach but Li teaches: 
(i) where the symbolic or simplified representation is a flat or 2-dimensional shape including head, body, arms and legs; or (ii) where the symbolic or simplified representations of different people are distinguished using different colours; or (iii) where the symbolic or simplified representation is an avatar ([0032] “the server 106 may receive captured video of the user's facial expression from the computing device 102, extract the facial parameters and generate the avatar video.” And, [0025] “render the avatar on the display 138 of the computing device 102. In some embodiments, the video module 202 displays the real-time video of the user on the display 138 and may also indicate on the display 138 whether the facial parameters are being extracted in real-time (e.g., by marking the boundary of the user's facial expression on the display 138).”).
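Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) where the symbolic or simplified representation is a flat or 2-dimensional shape including head, body, arms and legs; or (ii) where the symbolic or simplified representations of different people are distinguished using different colours; or (iii) where the symbolic or simplified representation is an avatar as taught by Li in order to “protect[] a person's privacy” (Li [0001]).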

Claim 30 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20140375886 (“Galleguillos”). 
Claim 30
	As per claim 30, Sandholm does not explicitly teach but Galleguillos teaches: 
	(i) where the digital representation includes feature vectors that define the appearance of a specific person ([0038] “extracting feature vectors for faces from a frame.” And, [0041] “Each face is represented by a high dimensional feature vector 414 generated during face feature extraction 412. The face feature vector 414 captures information to uniquely represent the appearance of a face.” And, claim 5 “generating, for a set of frames from the video, a feature vector to uniquely represent the appearance of each of one or more faces identified in the frame.”); or (ii) where the digital representation of a person is used to analyse, or enable the analysis of one or more of trajectory, pose, gesture and identity of that person and smart home devices can respond to and predict the person's intent and/or needs based on that analysis; or (iii) where the digital representation is not an image and does not enable an image of a person to be created from which that person can be recognised; or (iv) that does not output continuous or streaming video but instead metadata that defines various attributes of individual persons. 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include (i) where the digital representation includes feature vectors that define the appearance of a specific person; or (ii) where the digital representation of a person is used to analyse, or enable the analysis of one or more of trajectory, pose, gesture and identity of that person and smart home devices can respond to and predict the person's intent and/or needs based on that analysis; or (iii) where the digital representation is not an image and does not enable an image of a person to be created from which that person can be recognised; or (iv) that does not output continuous or streaming video but instead metadata that defines various attributes of individual persons as taught by Galleguillos because using “a high dimensional vector (descriptor) . . . for representing a face enables the system to accurately establish a similarity metric between faces” (Galleguillos [0079]) resulting in improved identity verification accuracy. 
Claim 35 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) in view of US Patent Application Publication Number 20100134619 (“Hampapur”) as applied to claim 34 above, and in further view of US Patent Publication Number 20130054377 (“Krahnstoever”). 
Claim 35
	As per claim 35, Sandholm does not explicitly teach but Krahnstoever teaches: 
(i) where the characteristics or attributes include one or more of trajectory, pose, gesture, identity; or (ii) where the characteristics or attributes include each of trajectory, pose, gesture, and identity ([0005] “determine gaze directions and body pose directions for the potential customers, and to determine interest levels of the potential customers in the advertising content based on the determined gaze directions and body pose directions.” And, [0029] “processing in real-time. . . Such tracking data may include . . . one or more of gaze direction, body pose direction, direction of motion, position.” And, [0061] “directly infer if a group of people are together interacting with the advertising station (e.g., Is someone currently discussing with peers (revealing mutual gazes), asking them to participate, or inquiring parent's support of purchase?”).
Therefore, it would have been obvious to modify the combination of Sandholm, Kim, and Hampapur to include (i) where the characteristics or attributes include one or more of trajectory, pose, gesture, identity; or (ii) where the characteristics or attributes include each of trajectory, pose, gesture, and identity as taught by Krahnstoever in order to “improve[] localization and data association for tracking in crowded environments” (Krahnstoever [0034]) and improve overall tracking performance in crowded conditions (Krahnstoever [0037]). 

Claim 37 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Publication Number 20140333776 (“Dedeoglu”). 
Claim 37
	As per claim 37, Sandholm does not explicitly teach but Dedeoglu teaches: 
(i) that works with standard images sensors working with chip-level systems that generate real-time data that enables a digital representation of people or other objects to be created; or (ii) that works with IP cameras to form a real-time metadata stream to accompany the output video stream providing an index of video content frame by frame ([0025] “IP cameras” and “cameras may also transmit a stream of metadata in association with the video stream that includes information regarding types of events detected in frames.” And, [0026] “supports the streaming of metadata associated with frames in the video stream. This metadata provides key features of the video stream to enable additional VA in a surveillance center receiving the video stream and metadata.” And, [0037] “The image signal processing component 404 divides the incoming digital signal into frames of pixels and processes each frame . . . The processed frames are provided to the video encoder component 408, the video analytics component 412, and the tampering detection component (206).” And, [0031] “provide video streams in which events have been detected (either by VA software of the VMS or video analytics of a camera) to the summary view computer system 318 along with metadata regarding the events.” And, [0058]) or (iii) that works with smart sensors that use visual information, but never form imagery or video at a hardware level; or (iv) that builds a virtualized digital representation of each individual in the home, comprising each individual's: Trajectory around the home, including for example the actions of standing and sitting; Pose, for example in which direction the person is facing, and/or in which direction they are looking; Gesture, for example motions made by the person's hands; and Identity, namely the ability to differentiate between people and assign a unique identity (e.g. name) to each person; or (v) that is programmed to understand a wide range of behaviours from the set: counting the number of people in the room, understanding people's pose, identifying persons using facial recognition data, determining where people are moving from/to, extracting specific gestures by an identified individual.

Claim 42 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Publication Number 20070217765 (“Itoh”). 
Claim 42
	As per claim 42, Sandholm does not explicitly teach but Itoh teaches: 
	where the data rate of the data sent from the computer-vision system is throttled up or based on event-triggering ([0014] “detecting an event set in advance from the image data acquired from each of the cameras; and a frame rate changing unit for changing the frame rate of the image data from which the event is detected; the video recorder and player comprising a first unit for changing only the frame rate of the image data from which the event is detected, when the event is detected.” And [0069] “the change result of the frame rate by the frame rate changing unit 303 can be transmitted to the image compression unit 102 for image processing, too, and the compression ratio of the images can be changed in accordance with the change of the frame rate.” And, [0070] “the compression ratio of the images may be lowered when the event is detected and in this way, the drop of resolution of the images due to image data compression can be suppressed and image processing can be acquired with higher accuracy.”). 
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include where the data rate of the data sent from the computer-vision system is throttled up or based on event-triggering as taught by Itoh so that “event analysis having higher accuracy can be acquired with limited processing resources and the monitor result having high reliability can be acquired while effective utilization of the processing resources and the recording medium is sufficiently achieved” (Itoh [0068]).

Claim 43 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Publication Number 20160026253 (“Bradski”). 
Claim 43
	As per claim 43, Sandholm does not explicitly teach but Bradski teaches: 
	the computer-vision system including a hub and multiple computer-vision systems with shared fields of view or with different fields of view, which send their data to the hub that stores and analyses that data and enables a digital representation of a person to be constructed from the computer-vision systems with the shared fields of view or with the differing fields of view, tracking that person and also recognizing that person ([0556] “hub, central, or distributed, server computer systems and one or more individual AR systems communicatively coupled.” And, [0794] “a camera (or cameras) associated with the users' individual AR system captures multiple images, a large number of points are collected and transmitted to the cloud.” And, [1212] “a wide field of view camera from a pixel count image quality perspective, but with overlapping or non-overlapping fields of view. A plurality of two or three element wafer level of cameras can replace a specific wide field of view lens that has five or six elements, while still achieving the same field of view as the wide field of view camera.” And, [0617] “the AR system can represent two images captured by respective cameras of a part of the same scene.” And, [0709] “The AR system may render virtual representations of users or other entities, referred to as avatars, as described in some detail above. The AR system may render an avatar of a user in the user's own virtual spaces, and/or in the virtual spaces of other user's.” And, [01613] “if a user walks toward a kiosk, the kiosk may be equipped with eye-trackers that are able to determine what the user's eyes are focusing on. Based on this information, a digital human, or video representation of a human at the kiosk (e.g., a video at the kiosk) may be able to look into the user's eyes while interacting with the user.” And, [1709] “iris identification may be used to identify the user.” And, [1730] “the user can be authenticated based on one or more dynamically measured retinal signatures.” And, [1123] “visual tracking of the user's hand and finger.” And, [0373] “increase the field of view of the display, because each engine is being used to scan a different portion of the field of view.”). 

Claim 44 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) in view of US Patent Application Publication Number 20160026253 (“Bradski”) as applied to claim 43 above, and in further view of US Patent Application Publication Number 20150111539 (“Shim”). 
Claim 44
	As per claim 44, Sandholm does not explicitly teach but Shim teaches:
where the hub exposes an open, person-level digital representation API, enabling various appliances to use and to be controlled in dependence on the data encoded in the API ([0258] “If the home appliance is connected, the notify server 500c transmits a TCP relay including a remote control command to the home appliance 200, e.g., an agent 270 of the home appliance 200 (Sk3). The agent 270a calls the API via the controller 270 of the home appliance 200 (Sk4). The controller 270 of the home appliance 200 transmits an API response result (the remote control result according to the remote control command) to the agent 270a (Sk5).” And, [0203] “The Open API communication method may be used upon authentication processing or when the file data stored in the server 500 is transmitted to the home appliance 200.” And, [0233] “data further transmitted to the mobile terminal 300 may not include a device type, unlike FIG. 9A. That is, a device list of all manageable home appliances corresponding to the user ID may be further transmitted from the server.”). 
Therefore, it would have been obvious to modify the combination of Sandholm, Kim, and Bradski to include where the hub exposes an open, person-level digital representation API, enabling various appliances to use and to be controlled in dependence on the data encoded in the API as taught by Shim for the purpose of “increasing user convenience in terms of communication with a home appliance” (Shim [0005]).

Claim 64 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20170083796 (“Kim”) as applied to claim 1 above, and in further view of US Patent Application Publication Number 20140365518 (“Calo”). 
Claim 64
	As per claim 64, Sandholm does not explicitly teach but Calo teaches:
computer vision system or engine that is localised in one or more of the following: (a) an edge layer that processes raw sensor data; (b) an aggregation layer that provides high level analytics by aggregating and processing data from the edge layer in the temporal and spatial domains; (c) a service layer that handles all connectivity to one or more system controllers and to the end customers for configuration of their home systems and the collection and analysis of the data produced ([0015] “processing on the raw data throughout the data capture system, for example at an edge layer, to extract higher level information.” And, [0040] “physical sensors, and the raw data are data obtained from these physical sensors.” And, [0040] “obtaining raw data sufficient to satisfy the data collection requirements are identified at the edge layer in the data capture system 208.” And, [0037] “Suitable data generating networked devices include . . . surveillance cameras.”).
Therefore, it would have been obvious to modify the combination of Sandholm and Kim to include computer vision system or engine that is localised in one or more of the following: (a) an edge layer that processes raw sensor data; (b) an aggregation layer that provides high level analytics by aggregating and processing data from the edge layer in the temporal and spatial domains; (c) a service layer that handles all connectivity to one or more system controllers and to the end customers for configuration of their home systems and the collection and analysis of the data produced as taught by Calo because “Intelligent and efficient processing is provided at the network edge . . . The resulting . . . data has much less volume, and carries higher level semantics that are more easily consumable by the application” (Calo [0056]). 

Claim 69 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over WO 2014178853 (“Sandholm”) in view of US Patent Application Publication Number 20110043625 (“Cobb”) in view of US Patent Application Publication Number 20170083796 (“Kim”).
Claim 69
As per claim 69, Sandholm teaches a computer-vision engine ([0033] “video stream module . . . of computing device” and [0034] “video stream processing module”) that: 
generates from a pixel stream a digital representation of a person ([0019] “process a video stream obtained by capture device” and “video stream processing instructions may detect faces of potential receiving users in the video stream and then extract face images from the video stream to send to a cloud service for processing.” Examiner interprets face images generated from a video stream as a digital representation of a person generated from a pixel stream.); 
determines attributes or characteristics of the person from that digital representation ([0034] “face images may be extracted from the video stream and then provided to cloud server whenever a face is detected. Face detection module may analyze the video stream to detect the faces of potential receiving users for video stream processing module. For example, an object-class detection algorithm specifically configured to detect facial features may be used to detect faces in the video stream.” Examiner interprets facial features as attributes or characteristics of a person.); 
based on those attributes or characteristics, outputs data to a cloud-based analytics system that enables that analytics system to identify and also to authenticate the person ([0019] “The detected face images may be provided to the cloud service in a request for face recognition processing, where the results of the face recognition processing are received by temporary token receiving instructions.” And, [0026] “requesting cloud recognition processing from cloud server. Face images receiving instructions may perform face recognition on the face images to identify a matching face profile of a registered receiving user.” And, [0068] “providing ad-hoc, face-recognition-driven authentication by a cloud server.”). 
Sandholm teaches a computer-vision engine ([0033]) but does not explicitly teach that the computer vision engine is provided by chip-level firmware embodied on a non-transitory storage medium as taught by Cobb ([0036] “various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine-learning engine 140 for analysis.” And, [0030] “contained on a variety of computer-readable storage media.”).
Therefore, it would have been obvious to modify Sandholm to include that the computer vision engine is provided by chip-level firmware as taught by Cobb in order to “improve[] the ability of the system to quickly classify objects and behaviors and learn from previously observed patterns to identify normal and/or abnormal events” (Cobb [0072]). 
Sandholm teaches a computer vision system but does not explicitly teach the following feature taught by Kim: 
a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person ([0096] “training of a person recognition neural network.” And, [0068] “When the image recognizer receives an input image, the image recognizer recognizes a person given as a recognition target in the input image by using the convolutional neural network . . . the recognition target is not limited to a person, but the recognition target may be a traffic sign or the like.” And, [0074] “the image recognizer 80 performs a two-dimensional recognition process to determine whether a recognition target such as a person exists in the input image. If a recognition target exists in the input image, the image recognizer 80 outputs information indicating that a person exists in the input image.”). 
Therefore, it would have been obvious to modify the combination of Sandholm and Cobb to include a neural network that has been trained to recognize objects, the objects recognizable by the neural network including a person, wherein the neural network recognizes an object as a person as taught by Kim because “by using the convolutional neural network (fully convolutional neural network) . . . it is possible to perform an image recognition process on a real-time basis” (Kim [0006]) allowing for faster and more accurate user identity recognition. 

Response to Arguments 
35 U.S.C. 112(b) – Withdrawn
Applicant has amended claims 37 and 48 to remove the phrase “for example,” obviating the 35 U.S.C. 112(b) rejections. Therefore, the rejections have been withdrawn. 

35 U.S.C. 101 - Withdrawn
	Applicant has amended claim 1 to replace “a computer-vision system or engine” with “a computer-vision system,” obviating the software per se rejection, which was based on the broadest reasonable interpretation, in view of Applicant’s specification, of the term “engine” as referring to software. Therefore, the 35 U.S.C. 101 rejections of independent claim 1 and dependent claims 4, 9, 13, 22, 26-27, 30, 34-35, 37, 42-45, and 64 have been withdrawn.

Applicant has amended claim 69 to recite a non-transitory storage medium embodying the firmware that includes a computer vision engine, obviating the software per se rejection. Therefore, the 35 U.S.C. 101 rejection has been withdrawn. 

35 U.S.C. 103
Applicant's arguments, see pages 13-18, filed 02/28/2022, with respect to the rejection(s) of claims 1-4, 9, 13, 22, 26-27, 30, 34-35, 37, 42-45, 64, 66 and 69 under 35 U.S.C. 102 and 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Sandholm and Kim under 35 U.S.C. 103(a).

 


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US Patent Application Publication Number 20090031381 (“Cohen”) discloses on-board video analytics that process the video to augment, or in some cases replace, a continuous stream of still or moving image frames from a particular network video source with inferred or extracted metadata.
US Patent Application Publication Number 20150264296 (“Devaux”) teaches a method of creating or enhancing metadata for a video sequence using non-video data. 
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLAN J WOODWORTH, II whose telephone number is (571)272-6904. The examiner can normally be reached Mon-Fri 9:00-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ilana Spar, can be reached on 571-270-7537. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To 



/ALLAN J WOODWORTH, II/Examiner, Art Unit 3622         

/ILANA L SPAR/Supervisory Patent Examiner, Art Unit 3622