DETAILED ACTIONS
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-13 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Eledath et al. (US 20160378861 A1), hereinafter referred to as Eledath, in view of Broggi (US 20210142055 A1), hereinafter referred to as Broggi.

Regarding claim 1, Eledath teaches a computer-implemented method (para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), comprising: 
detecting one or more vehicle objects (Fig. 6 and para. 0105, a gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”) and one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”) in a received image (Fig. 6, para. 0105, image 606), using a detection function comprising an artificial intelligence (AI) model (para. 0137, “the system 110 analyzes video depicting a real world scene, extracts semantic elements from the visual scene, and generates a semantic understanding of the visual scene. To do this, the system 110 executes one or more computer vision algorithms, including object detection algorithms”, para. 0094, “illustrative platform 132 executes artificial intelligence technologies including computer vision”), each of the one or more vehicle objects (Fig. 6, vehicle 608, gray van) and one or more human objects (Fig. 6, person 612, Jim Jones, is detected in image 606) corresponding to a portion of the image (Fig. 6, the van and the person each occupy a portion of image 606); 
for each of the one or more vehicle objects (Fig. 6 and para. 0105, a gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”), processing the corresponding portion of the image (Fig. 6, image 606, vehicle 608) to determine a plurality of properties of the vehicle object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the vehicle object (Fig. 6, the vehicle is annotated with its properties, such as its owner and its color); 
for each of the one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”), processing the corresponding portion of the image (Fig. 6, image 606, person 612) to determine a plurality of properties of the human object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the human object (Fig. 6, the person is annotated with properties such as the person's name and employment); and 
transmitting the received image (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”), with the annotations of the one or more vehicle objects and one or more human objects (Fig. 6, the image has annotations for the vehicle object and the human object, para. 0141, “the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., applications/services 134), by one or more display services 250, for example. In block 334, the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., messaging, mapping, travel, social media), by one or more collaboration services 258, for example.”), to a service or application (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”) that utilizes the annotated image to perform a function of the service or application (para. 0103, “Portions of the platform 132 can act as a “front-end” to a number of applications/services 134, in some embodiments. The applications/services 134 may include, for example, a search engine, a messaging service, a social media application, a navigation tool, geographic mapping software, etc.”).  

Eledath does not expressly disclose detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model.
	However, Broggi discloses detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model (para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module determines the likelihood that pixels in an image correspond to a person or a vehicle, para. 0082, “The processors 106a-106n may detect objects in the captured video signal of the exterior of a vehicle (e.g., automobiles, bicycles, pedestrians, animals, parking spaces, etc.) and/or of an interior of a vehicle (e.g., the driver 202, other occupants, physical characteristics of people in the vehicle, facial expressions of people in the vehicle, fields of view of the people in the vehicle, etc.).”).
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Eledath to incorporate the teachings of Broggi of detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that implementing the CNN module 150 as a dedicated hardware module of the processors 106a-106n may enable the apparatus to perform the computer vision operations locally (e.g., on-chip) without relying on processing capabilities of a remote device (e.g., communicating data to a cloud computing service) (Broggi, para. 0050).
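For illustration only, the sequence of steps recited in claim 1, as mapped above, can be sketched in Python. This is a minimal sketch under stated assumptions and is not drawn from Eledath, Broggi, or the application: the `Detection` record, the `(x0, y0, x1, y1)` box format, and the helper names `annotate_image`, `transmit`, and `stub_detector` are hypothetical, the image is a plain list of pixel rows, and the single multi-class AI model is replaced by a stub that returns fixed detections.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]           # hypothetical: rows of pixel values
Box = Tuple[int, int, int, int]   # hypothetical: (x0, y0, x1, y1)


@dataclass
class Detection:
    label: str                    # "vehicle" or "human"
    box: Box
    properties: Dict[str, str] = field(default_factory=dict)


def annotate_image(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    vehicle_props: Callable[[Image], Dict[str, str]],
    human_props: Callable[[Image], Dict[str, str]],
) -> List[Detection]:
    """One detection pass, then per-object property extraction."""
    detections = detect(image)    # single detection function, single model
    for det in detections:
        x0, y0, x1, y1 = det.box
        portion = [row[x0:x1] for row in image[y0:y1]]  # the object's portion
        if det.label == "vehicle":
            det.properties = vehicle_props(portion)
        elif det.label == "human":
            det.properties = human_props(portion)
    return detections


def transmit(image: Image, detections: List[Detection],
             service: Callable[[dict], None]) -> None:
    """Hand the image and its annotations to a downstream service."""
    service({"image": image, "annotations": detections})


def stub_detector(image: Image) -> List[Detection]:
    # Stand-in for a single multi-class model; returns fixed boxes.
    return [Detection("vehicle", (0, 0, 2, 2)), Detection("human", (2, 0, 4, 2))]


received: List[dict] = []         # plays the role of the service's inbox
image = [[0, 1, 2, 3], [4, 5, 6, 7]]
dets = annotate_image(
    image,
    stub_detector,
    vehicle_props=lambda portion: {"type": "van", "color": "gray"},
    human_props=lambda portion: {"hair_color": "dark"},
)
transmit(image, dets, received.append)
print([d.properties for d in received[0]["annotations"]])
# → [{'type': 'van', 'color': 'gray'}, {'hair_color': 'dark'}]
```

In this sketch a single call to the detection function yields both the vehicle object and the human object, and each object's image portion is then passed to a type-specific property function before the annotated image is handed to the service.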

Regarding claim 2, the combination of Eledath in view of Broggi teaches the method of claim 1 (Eledath, para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), wherein the single AI model (Broggi, Fig. 1, CNN 150) has a substantially same number of nodes and layers (Broggi, para. 0118, “To perform the training and/or the computer vision operations, the CNN module 150′ may generate a number of layers 360a-360n. On each one of the layers 360a-360n, the CNN module 150′ may apply a feature detection window 362. In an example, the feature detection window 362 is shown on a portion of the layer 360a. A convolution operation may be applied by the CNN module 150′ on each of the layers 360a-360n using the feature detection window 362.”, para. 0120, “The layers 360a-360n may comprise convolution layers, pooling layers, non-linear layers and/or fully connected layers.”) as an AI model that detects only vehicle objects or detects only human objects (Broggi's CNN, which detects both persons and vehicles, is built from the same layers as a conventional CNN that detects only persons or only vehicles).  
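As a hedged illustration of the claim 2 limitation, the following sketch counts layers and nodes for a hypothetical fully connected network that shares its hidden layers and differs only in the width of its output layer (one node per class). The hidden-layer widths are assumed values; neither Broggi para. 0118 nor para. 0120 specifies layer widths, and the sketch does not represent Broggi's architecture.

```python
from typing import List, Tuple


def layer_and_node_counts(hidden: List[int], num_classes: int) -> Tuple[int, int]:
    """(layer count, node count) for a hypothetical dense network whose
    hidden widths are `hidden` and whose output layer has one node per class."""
    widths = hidden + [num_classes]
    return len(widths), sum(widths)


backbone = [256, 128, 64]                  # assumed hidden-layer widths
vehicle_only = layer_and_node_counts(backbone, num_classes=1)
human_only = layer_and_node_counts(backbone, num_classes=1)
vehicle_and_human = layer_and_node_counts(backbone, num_classes=2)
print(vehicle_only, vehicle_and_human)     # → (4, 449) (4, 450)
```

Under these assumptions, adding a second output class leaves the layer count unchanged and adds one node out of several hundred.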

Regarding claim 3, the combination of Eledath in view of Broggi teaches the method of claim 1 (Eledath, para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), wherein the one or more human objects (Eledath, Fig. 6, person 612) and the one or more vehicle objects (Eledath, Fig. 6, vehicle 608) are detected and processed using a single copy of the captured image (Eledath, para. 0105, “the system 110 analyzes an image 606 of a real world scene 600 viewed through an AR device 604 of the user 602”, the single copy is image 606). 
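The "single copy" limitation of claim 3 can be illustrated, as a sketch under assumptions, by taking each object's image portion as a view into one shared pixel buffer rather than as a copied sub-image. The row-major 8-bit grayscale layout and the `crop_views` helper are hypothetical; the example uses only Python's built-in `bytearray` and `memoryview`.

```python
from typing import List, Tuple


def crop_views(buffer: bytearray, width: int,
               box: Tuple[int, int, int, int]) -> List[memoryview]:
    """Return one memoryview per row of the box; no pixels are copied."""
    x0, y0, x1, y1 = box
    mv = memoryview(buffer)
    return [mv[y * width + x0 : y * width + x1] for y in range(y0, y1)]


width, height = 4, 3
frame = bytearray(range(width * height))   # the single copy of the image
vehicle_crop = crop_views(frame, width, (0, 0, 2, 2))
human_crop = crop_views(frame, width, (2, 1, 4, 3))

frame[0] = 99                              # write to the shared buffer
print(vehicle_crop[0].tolist())            # → [99, 1]
print(human_crop[0].tolist())              # → [6, 7]
```

Because the crops are views, the value written to the shared buffer is visible through the vehicle crop without any pixel data having been duplicated.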
 
Regarding claim 4, the combination of Eledath in view of Broggi teaches the method of claim 1 (Eledath, para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), wherein the detection function and single AI model (Broggi, para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module determines the likelihood that pixels in an image correspond to a person or a vehicle) remain cached across iterations of the method of claim 1 (Broggi, para. 0046, “the frame memory and/or buffer 144a may store (e.g., provide temporary storage and/or cache) one or more of the video frames (e.g., the video signal)”, Broggi teaches a cache for the memory, para. 0063, “The computer vision pipeline portion 162 may be configured to implement a computer vision algorithm in dedicated hardware. The computer vision pipeline portion 162 may implement a number of sub-modules designed to perform various calculations used to perform feature detection in images (e.g., video frames). Implementing sub-modules may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency. For example, the sub-modules may implement a number of relatively simple operations that are used frequently in computer vision operations that, together, may enable the computer vision algorithm to be performed in real-time.”, the vehicle and human detection process is iterated in a pipelined manner; Broggi also teaches identifying properties of the detected objects, such as a license number (Broggi, para. 0138) and the age of a person (Broggi, para. 0087)).  
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Eledath to incorporate the teachings of Broggi wherein the detection function and single AI model remain cached across iterations of the method of claim 1. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that doing so may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency (Broggi, para. 0063).
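The claim 4 limitation that the detection function and model remain cached across iterations can be sketched, under assumptions, with the standard library's `functools.lru_cache`: the model is constructed on the first frame and reused for later frames. The loader `load_detector`, the path `detector.weights`, and the stand-in model are hypothetical, and nothing is read from disk. This is a generic software caching pattern, not the frame buffer 144a or the pipeline portion 162 of Broggi.

```python
import functools
from typing import Callable, List

load_calls: List[str] = []                 # records each real model construction


@functools.lru_cache(maxsize=1)
def load_detector(weights_path: str) -> Callable[[str], list]:
    """Hypothetical loader; the body runs only on a cache miss."""
    load_calls.append(weights_path)
    # Stand-in for building one multi-class model from stored weights.
    return lambda frame: [("vehicle", frame), ("human", frame)]


def process_stream(frames: List[str],
                   weights_path: str = "detector.weights") -> list:
    """Run detection on each frame, reusing the cached model."""
    results = []
    for frame in frames:
        detect = load_detector(weights_path)   # cache hit after the first frame
        results.append(detect(frame))
    return results


out = process_stream(["frame0", "frame1", "frame2"])
print(len(load_calls), len(out))           # → 1 3
```

The model is built once and served from the cache on the second and third iterations.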

Regarding claim 5, the combination of Eledath in view of Broggi teaches the method of claim 1 (Eledath, para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), wherein the vehicle object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”) include a license number (Eledath, “optical character recognition technology to read the car's license plate”), a type of vehicle (Eledath, Fig. 6, the vehicle type, a van, is identified), and a color of vehicle (Eledath, Fig. 6, the color of the van, gray, is identified) detected in the vehicle object (Eledath, Fig. 6, vehicle 608, gray van).  

Regarding claim 6, the combination of Eledath in view of Broggi teaches the method of claim 1 (Eledath, para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”), wherein the human object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”) include an approximate age (Eledath, para. 0133, “The visual feature extraction technology tags the image associated with the query text “this vehicle” with the label “red sedan” and tags the image associated with the query text “this person” with the label “young male” and tags the image associated with the query text “this location” with the label “outdoor city.””, labeling the person a young male indicates an approximate age), hair color (Eledath, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car”), and face landmarks (Eledath, para. 0038, “the system 110 might be able to detect (e.g., in a later frame of a video) the driver of the red car and may be able to determine the identity of the driver through facial recognition”, facial recognition relies on facial landmarks) of the human detected in the human object (Eledath, Fig. 6, person 612).  
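For illustration, the vehicle properties recited in claim 5 and the human properties recited in claim 6 could be carried as annotation records and serialized for transmission to a downstream service. This is a sketch under assumptions: the record classes, field names, JSON layout, and sample values are hypothetical and are not taken from Eledath, Broggi, or the application.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]            # hypothetical: (x0, y0, x1, y1)


@dataclass
class VehicleAnnotation:
    box: Box
    license_number: Optional[str]          # None if the plate was not read
    vehicle_type: str
    color: str


@dataclass
class HumanAnnotation:
    box: Box
    approximate_age: str                   # coarse label, e.g. "young adult"
    hair_color: str
    face_landmarks: List[Tuple[int, int]]  # (x, y) pixel coordinates


def to_payload(vehicles: List[VehicleAnnotation],
               humans: List[HumanAnnotation]) -> str:
    """Serialize annotations as JSON for a downstream service."""
    return json.dumps({
        "vehicles": [asdict(v) for v in vehicles],
        "humans": [asdict(h) for h in humans],
    })


payload = to_payload(
    [VehicleAnnotation((10, 20, 110, 80), None, "van", "gray")],
    [HumanAnnotation((120, 15, 160, 95), "young adult", "dark",
                     [(135, 30), (145, 30)])],
)
print(json.loads(payload)["vehicles"][0]["color"])  # → gray
```

JSON has no tuple type, so the boxes and landmark pairs are emitted as arrays.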

Regarding claim 7, Eledath teaches a non-transitory machine-readable medium (para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”) having instructions stored therein (para. 0161, “Embodiments may also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors”) which when executed by a processor (para. 0161, “Embodiments may also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors”), cause the processor to perform operations (para. 0161, “Embodiments may also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors”), the operations (para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”) comprising: 
detecting one or more vehicle objects (Fig. 6 and para. 0105, a gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”) and one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”) in a received image (Fig. 6, para. 0105, image 606), using a detection function comprising an artificial intelligence (AI) model (para. 0137, “the system 110 analyzes video depicting a real world scene, extracts semantic elements from the visual scene, and generates a semantic understanding of the visual scene. To do this, the system 110 executes one or more computer vision algorithms, including object detection algorithms”, para. 0094, “illustrative platform 132 executes artificial intelligence technologies including computer vision”), each of the one or more vehicle objects (Fig. 6, vehicle 608, gray van) and one or more human objects (Fig. 6, person 612, Jim Jones, is detected in image 606) corresponding to a portion of the image (Fig. 6, the van and the person each occupy a portion of image 606); 
for each of the one or more vehicle objects (Fig. 6 and para. 0105, a gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”), processing the corresponding portion of the image (Fig. 6, image 606, vehicle 608) to determine a plurality of properties of the vehicle object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the vehicle object (Fig. 6, the vehicle is annotated with its properties, such as its owner and its color); 
for each of the one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”), processing the corresponding portion of the image (Fig. 6, image 606, person 612) to determine a plurality of properties of the human object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the human object (Fig. 6, the person is annotated with properties such as the person's name and employment); and 
transmitting the received image (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”), with the annotations of the one or more vehicle objects and one or more human objects (Fig. 6, the image has annotations for the vehicle object and the human object, para. 0141, “the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., applications/services 134), by one or more display services 250, for example. In block 334, the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., messaging, mapping, travel, social media), by one or more collaboration services 258, for example.”), to a service or application (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”) that utilizes the annotated image to perform a function of the service or application (para. 0103, “Portions of the platform 132 can act as a “front-end” to a number of applications/services 134, in some embodiments. The applications/services 134 may include, for example, a search engine, a messaging service, a social media application, a navigation tool, geographic mapping software, etc.”).  

Eledath does not expressly disclose detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model.
	However, Broggi discloses detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model (para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module determines the likelihood that pixels in an image correspond to a person or a vehicle, para. 0082, “The processors 106a-106n may detect objects in the captured video signal of the exterior of a vehicle (e.g., automobiles, bicycles, pedestrians, animals, parking spaces, etc.) and/or of an interior of a vehicle (e.g., the driver 202, other occupants, physical characteristics of people in the vehicle, facial expressions of people in the vehicle, fields of view of the people in the vehicle, etc.).”).
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the medium as taught by Eledath to incorporate the teachings of Broggi of detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that implementing the CNN module 150 as a dedicated hardware module of the processors 106a-106n may enable the apparatus to perform the computer vision operations locally (e.g., on-chip) without relying on processing capabilities of a remote device (e.g., communicating data to a cloud computing service) (Broggi, para. 0050).

Regarding claim 8, the combination of Eledath in view of Broggi teaches the medium of claim 7 (Eledath, para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”), wherein the single AI model (Broggi, Fig. 1, CNN 150) has a substantially same number of nodes and layers (Broggi, para. 0118, “To perform the training and/or the computer vision operations, the CNN module 150′ may generate a number of layers 360a-360n. On each one of the layers 360a-360n, the CNN module 150′ may apply a feature detection window 362. In an example, the feature detection window 362 is shown on a portion of the layer 360a. A convolution operation may be applied by the CNN module 150′ on each of the layers 360a-360n using the feature detection window 362.”, para. 0120, “The layers 360a-360n may comprise convolution layers, pooling layers, non-linear layers and/or fully connected layers.”) as an AI model that detects only vehicle objects or detects only human objects (Broggi's CNN, which detects both persons and vehicles, is built from the same layers as a conventional CNN that detects only persons or only vehicles).  

Regarding claim 9, the combination of Eledath in view of Broggi teaches the medium of claim 7 (Eledath, para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”), wherein the one or more human objects (Eledath, Fig. 6, person 612) and the one or more vehicle objects (Eledath, Fig. 6, vehicle 608) are detected and processed using a single copy of the captured image (Eledath, para. 0105, “the system 110 analyzes an image 606 of a real world scene 600 viewed through an AR device 604 of the user 602”, the single copy is image 606).

Regarding claim 10, the combination of Eledath in view of Broggi teaches the medium of claim 7 (Eledath, para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”), wherein the detection function and single AI model (Broggi, para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module determines the likelihood that pixels in an image correspond to a person or a vehicle) remain cached across iterations of the operations of claim 7 (Broggi, para. 0046, “the frame memory and/or buffer 144a may store (e.g., provide temporary storage and/or cache) one or more of the video frames (e.g., the video signal)”, Broggi teaches a cache for the memory, para. 0063, “The computer vision pipeline portion 162 may be configured to implement a computer vision algorithm in dedicated hardware. The computer vision pipeline portion 162 may implement a number of sub-modules designed to perform various calculations used to perform feature detection in images (e.g., video frames). Implementing sub-modules may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency. For example, the sub-modules may implement a number of relatively simple operations that are used frequently in computer vision operations that, together, may enable the computer vision algorithm to be performed in real-time.”, the vehicle and human detection process is iterated in a pipelined manner; Broggi also teaches identifying properties of the detected objects, such as a license number (Broggi, para. 0138) and the age of a person (Broggi, para. 0087)).  
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the medium as taught by Eledath to incorporate the teachings of Broggi wherein the detection function and single AI model remain cached across iterations of the operations of claim 7. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that doing so may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency (Broggi, para. 0063).

Regarding claim 11, the combination of Eledath in view of Broggi teaches the medium of claim 7 (Eledath, para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”), wherein the vehicle object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”) include a license number (Eledath, “optical character recognition technology to read the car's license plate”), a type of vehicle (Eledath, Fig. 6 shows that the vehicle is a van), and a color of vehicle (Eledath, Fig. 6, the color of the van is determined to be gray) detected in the vehicle object (Eledath, Fig. 6, vehicle 608, gray van).  

Regarding claim 12, the combination of Eledath in view of Broggi teaches the medium of claim 7 (Eledath, para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”), wherein the human object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”) include an approximate age (Eledath, para. 0133, “The visual feature extraction technology tags the image associated with the query text “this vehicle” with the label “red sedan” and tags the image associated with the query text “this person” with the label “young male” and tags the image associated with the query text “this location” with the label “outdoor city.””, identifying the person as a young male indicates an approximate age), hair color (Eledath, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car”), and face landmarks (Eledath, para. 0038, “the system 110 might be able to detect (e.g., in a later frame of a video) the driver of the red car and may be able to determine the identity of the driver through facial recognition”, facial recognition uses facial landmarks) of the human detected in the human object (Eledath, Fig. 6, person 612).  
Regarding claim 13, Eledath teaches a data processing system (Fig. 1, computing system 110), comprising: 
a processor (Fig. 4, processor 412); and 
a memory (para. 0116, “a non-transitory computer accessible medium such as memory, data storage, and/or processor hardware.”) coupled to the processor to store instructions, which when executed by the processor (para. 0161, “Embodiments may also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors”), cause the processor (Fig. 4, processor 412) to perform operations, the operations (para. 0001, “a method and/or apparatus for implementing a surveillance camera system looking at passing cars”) including: 
detecting one or more vehicle objects (Fig. 6 and para. 0105, gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”) and one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”) in a received image (Fig. 6, para. 0105, image 606), using a detection function comprising an artificial intelligence (AI) model (para. 0137, “the system 110 analyzes video depicting a real world scene, extracts semantic elements from the visual scene, and generates a semantic understanding of the visual scene. To do this, the system 110 executes one or more computer vision algorithms, including object detection algorithms”, para. 0094, “illustrative platform 132 executes artificial intelligence technologies including computer vision”), each of the one or more vehicle objects (Fig. 6, vehicle 608, gray van) and one or more human objects (Fig. 6, 612, Jim Jones is detected in the image 606) corresponding to a portion of the image (Fig. 6, both the van and the person are portions of image 606); 
for each of the one or more vehicle objects (Fig. 6 and para. 0105, gray van is detected in image 606, para. 0109, “the system 110 is able to detect and extract the vehicle from the image”), processing the corresponding portion of the image (Fig. 6, image 606, vehicle 608) to determine a plurality of properties of the vehicle object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the vehicle object (Fig. 6, the image is annotated with properties of the vehicle, such as the owner and the color of the vehicle); 
for each of the one or more human objects (Fig. 6 and para. 0105, a person is detected in image 606, para. 0110, “general persons detected may be tagged with green overlays, but persons of interest may be tagged with red overlays”), processing the corresponding portion of the image (Fig. 6, image 606, person 612) to determine a plurality of properties of the human object (para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”), and to generate annotations of the corresponding portion of the image with the plurality of properties of the human object (Fig. 6, the image is annotated with properties of the person, such as the person's name and employment); and 
transmitting the received image (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”), with the annotations of the one or more vehicle objects and one or more human objects (Fig. 6, the image has annotations for the vehicle objects and human objects, para. 0141, “the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., applications/services 134), by one or more display services 250, for example. In block 334, the system 110 may provide output (e.g., virtual element overlays and/or NL output) to one or more other applications/services (e.g., messaging, mapping, travel, social media), by one or more collaboration services 258, for example.”), to a service or application (Fig. 1, the input image is processed in the vision-based user interface platform 132 and transmitted to the applications/services 134, para. 0036, “systems 110 that can be used to provide a dynamic, interactive, vision-based user interface to other applications or services of the computing system”) that utilizes the annotated image to perform a function of the service or application (para. 0103, “Portions of the platform 132 can act as a “front-end” to a number of applications/services 134, in some embodiments. The applications/services 134 may include, for example, a search engine, a messaging service, a social media application, a navigation tool, geographic mapping software, etc.”).  

Eledath does not expressly disclose detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model.
	However, Broggi discloses detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model (para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module is capable of detecting the likelihood of a person or a car in an image, para. 0082, “The processors 106a-106n may detect objects in the captured video signal of the exterior of a vehicle (e.g., automobiles, bicycles, pedestrians, animals, parking spaces, etc.) and/or of an interior of a vehicle (e.g., the driver 202, other occupants, physical characteristics of people in the vehicle, facial expressions of people in the vehicle, fields of view of the people in the vehicle, etc.).”).
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data processing system as taught by Eledath to incorporate the teachings of Broggi of detecting one or more vehicle objects and one or more human objects in a received image using a single detection function comprising a single artificial intelligence (AI) model. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that implementing the CNN module 150 as a dedicated hardware module of the processors 106a-106n may enable the apparatus to perform the computer vision operations locally (e.g., on-chip) without relying on processing capabilities of a remote device (e.g., communicating data to a cloud computing service) (Broggi, para. 0050).

Regarding claim 16, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110), wherein the single AI model (Broggi, Fig. 1, CNN 150) has a substantially same number of nodes and layers (Broggi, para. 0118, “To perform the training and/or the computer vision operations, the CNN module 150′ may generate a number of layers 360a-360n. On each one of the layers 360a-360n, the CNN module 150′ may apply a feature detection window 362. In an example, the feature detection window 362 is shown on a portion of the layer 360a. A convolution operation may be applied by the CNN module 150′ on each of the layers 360a-360n using the feature detection window 362.”, para. 0120, “The layers 360a-360n may comprise convolution layers, pooling layers, non-linear layers and/or fully connected layers.”) as an AI model that detects only vehicle objects or detects only human objects (Broggi, the CNN of Broggi, which can detect both persons and vehicles, has the same layers as a conventional CNN used to detect only persons or only vehicles).  

Regarding claim 17, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110), wherein the one or more human objects (Eledath, Fig. 6, person 612) and the one or more vehicle objects (Eledath, Fig. 6, vehicle 608) are detected and processed using a single copy of the captured image (Eledath, para. 0105, “the system 110 analyzes an image 606 of a real world scene 600 viewed through an AR device 604 of the user 602”, the single copy of the image is the image 606).

Regarding claim 18, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110), wherein the detection function and single AI model (Broggi, para. 0038, “The block 150 may implement a convolutional neural network (CNN) module”, the single AI model is the CNN module, para. 0050, “The CNN module 150 may determine a likelihood that pixels in the video frames belong to a particular object and/or objects in response to the descriptors. For example, using the descriptors, the CNN module 150 may determine a likelihood that pixels correspond to a particular object (e.g., a person, a vehicle, a car seat, a tree, etc.) and/or characteristics of the object (e.g., a mouth of a person, a hand of a person, headlights of a vehicle, a branch of a tree, a seatbelt of a seat, etc.)”, the CNN module is capable of detecting the likelihood of a person or a car in an image) remain cached across iterations of the operations of claim 13 (Broggi, para. 0046, “the frame memory and/or buffer 144a may store (e.g., provide temporary storage and/or cache) one or more of the video frames (e.g., the video signal)”, Broggi teaches a cache for the memory, para. 0063, “The computer vision pipeline portion 162 may be configured to implement a computer vision algorithm in dedicated hardware. The computer vision pipeline portion 162 may implement a number of sub-modules designed to perform various calculations used to perform feature detection in images (e.g., video frames). Implementing sub-modules may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency. For example, the sub-modules may implement a number of relatively simple operations that are used frequently in computer vision operations that, together, may enable the computer vision algorithm to be performed in real-time.”, the iterations of the process of detecting the vehicle and the human are performed in a pipelined manner; Broggi also teaches identifying properties of the detected objects, such as a license number (Broggi, para. 0138) and the age of the person (Broggi, para. 0087)).  
Eledath and Broggi are both considered to be analogous to the claimed invention because they are in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Eledath to incorporate the teachings of Broggi wherein the detection function and single AI model remain cached across iterations of the operations of claim 13. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that it may enable the hardware used to perform each type of calculation to be optimized for speed and/or efficiency (Broggi, para. 0063).

Regarding claim 19, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110), wherein the vehicle object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image.”, para. 0113, “the vehicle graphical overlay 1206 on the real world scene 1202 identifies a vehicle in the scene (from which the user can view certain characteristics of the vehicle, such as color or make/model) as well as it's spatial location within the scene 1202, including surrounding people and objects.”) include a license number (Eledath, “optical character recognition technology to read the car's license plate”), a type of vehicle (Eledath, Fig. 6 shows that the vehicle is a van), and a color of vehicle (Eledath, Fig. 6, the color of the van is determined to be gray) detected in the vehicle object (Eledath, Fig. 6, vehicle 608, gray van).  

Regarding claim 20, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110), wherein the human object properties (Eledath, para. 0105, “system 110 extracts visual features 608 and 612 and performs information retrieval based on semantic elements that the system 110 associates with the extracted visual features 608, 612. Based on the system 110's semantic understanding of the feature 608, the system 110 generates and displays virtual element 610, which identifies retrieved information about the vehicle depicted in the image. Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person.”, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car. The system 110 may use facial recognition to identify the person”) include an approximate age (Eledath, para. 0133, “The visual feature extraction technology tags the image associated with the query text “this vehicle” with the label “red sedan” and tags the image associated with the query text “this person” with the label “young male” and tags the image associated with the query text “this location” with the label “outdoor city.””, identifying the person as a young male indicates an approximate age), hair color (Eledath, para. 0038, “the system might determine that a current real world scene includes a person with dark hair getting into a red car”), and face landmarks (Eledath, para. 0038, “the system 110 might be able to detect (e.g., in a later frame of a video) the driver of the red car and may be able to determine the identity of the driver through facial recognition”, facial recognition uses facial landmarks) of the human detected in the human object (Eledath, Fig. 6, person 612).  

Claims 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Eledath in view of Broggi and in further view of Selinger et al. (US 20180307912 A1), hereinafter referred to as Selinger.

Regarding claim 14, the combination of Eledath in view of Broggi teaches the system of claim 13 (Eledath, Fig. 1, computing system 110).

The combination of Eledath in view of Broggi does not expressly disclose an AI accelerator that includes a central processing unit (CPU) and at least one of a graphics processing unit (GPU) or a visual processing unit (VPU). 
	However, Selinger teaches an AI accelerator (para. 0072, “AI accelerators that may function as a VPU include, for example, IBM TrueNorth (neuromorphic processor aimed at sensor data pattern recognition and intelligence tasks including video), and Qualcomm Zeroth Neural processing unit (a sensor/AI oriented chip). Other useful processors with VPU functionality include Adapteva Epiphany (a manycore processor with similar emphasis on on-chip dataflow, focused on 32 bit floating point performance), CELL (a multicore processor with features consistent with vision processing units, incl. SIMD instructions & datatypes suitable for video, and on-chip DMA between scratchpad memories), Digital signal processors (designed to work with real-time data streams), OpenCL framework for parallel computing, Multiprocessor system-on-chip (MPSoC), Coprocessors to supplement the CPU in graphics and related operations, Physics processing unit (complements CPU and GPU with a high throughput accelerator)”) that includes a central processing unit (CPU) and at least one of a graphics processing unit (GPU) or a visual processing unit (VPU) (para. 0067, “computing device can generally be comprised of a Central processing Unit (CPU, 301) with one or more vision processing unit (VPU, 302), or alternatively a functionally equivalent image processing “accelerator” (e.g. video processing unit, integrated or dedicated graphics processing unit (GPU) or similar, optimized for image processing speed),”). 
Selinger is considered to be analogous to the claimed invention because it is in the same field of object detection using artificial intelligence. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data processing system as taught by the combination of Eledath in view of Broggi to incorporate the teachings of Selinger of an AI accelerator that includes a central processing unit (CPU) and at least one of a graphics processing unit (GPU) or a visual processing unit (VPU). Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to optimize image processing speed (Selinger, para. 0067).

Regarding claim 15, the combination of Eledath in view of Broggi in further view of Selinger teaches the system of claim 14 (Eledath, Fig. 1, computing system 110), wherein the AI accelerator (Selinger, para. 0072, “AI accelerators that may function as a VPU include, for example, IBM TrueNorth (neuromorphic processor aimed at sensor data pattern recognition and intelligence tasks including video), and Qualcomm Zeroth Neural processing unit (a sensor/AI oriented chip). Other useful processors with VPU functionality include Adapteva Epiphany (a manycore processor with similar emphasis on on-chip dataflow, focused on 32 bit floating point performance), CELL (a multicore processor with features consistent with vision processing units, incl. SIMD instructions & datatypes suitable for video, and on-chip DMA between scratchpad memories), Digital signal processors (designed to work with real-time data streams), OpenCL framework for parallel computing, Multiprocessor system-on-chip (MPSoC), Coprocessors to supplement the CPU in graphics and related operations, Physics processing unit (complements CPU and GPU with a high throughput accelerator)”) further comprises an image capture device (Selinger, para. 0067, “Useful examples include, but are not limited to, personal computers, smart cameras/vision sensors”, Selinger’s AI accelerator includes smart cameras/vision sensors, which are image capture devices).  

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang can be reached on 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/DENISE G ALFONSO/Examiner, Art Unit 2663                                                                                                                                                                                                        
/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663