DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/28/2022 has been entered.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 15-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gordon (US 20170323158) in view of Osotio et al. (US 20190096105).
Regarding claim 1, Gordon discloses a method comprising:
receiving, at a server over a network from a remote device (¶58-59 & Fig. 5, the computing device 502 may communicate via the one or more networks 504 with an electronic device 506 associated with an individual 508), an image and spatial information about the image (¶51 & ¶101 the process 600 includes receiving input data including at least one of audible input, visual input, or sensor input and at 610, the process 600 includes determining that the input data corresponds to a request to identify the object; ¶156-159 the computing device architecture 1400 is applicable to any of the clients shown in FIGS. 1, 2, 5, 12, and 13; the processor 1402 may additionally or alternatively comprise a holographic processing unit (HPU) which is designed specifically to process and integrate data from multiple sensors of a head mounted computing device and to handle tasks such as spatial mapping, gesture recognition, and voice and speech recognition);
determining, by the server, a context of the image (¶77 a context of the object to be identified);
detecting, by the server using an object detection algorithm and the spatial information, one or more objects within the image (¶77 a description of other objects proximate to the object to be identified, one or more images of the object to be identified, one or more images of a scene including the object to be identified, or combinations thereof; ¶156-159 process and integrate data from multiple sensors of a head mounted computing device and to handle tasks such as spatial mapping, gesture recognition, and voice and speech recognition);
comparing, by the server, each of the one or more objects with the image context (¶78 the additional features included in the image may be used to identify a person or may be inappropriate for some individuals to view).
Gordon fails to specifically teach selectively modifying, by the server, each of the one or more objects in the image that does not relate to the image context by replacing each of the one or more objects with a generic version of each of the one or more objects, each of the generic versions having a size and perspective comparable to their respective replaced objects.
Osotio teaches selectively modifying, by the server, each of the one or more objects in the image that does not relate to the image context by replacing each of the one or more objects with a generic version of each of the one or more objects (¶87-88 a placeholder object can be presented, that lets the user know that augmented content is available and as the user focuses on the placeholder object, the augmented content is presented, at a selected rendering fidelity), each of the generic versions having a size and perspective comparable to their respective replaced objects (¶87-89 Selecting a rendering fidelity thus comprises selecting one or more characteristics such as transparency, color, resolution, how much of the content is rendered, size, and so forth that will be used to render the object. Which combination can be based on user preference, for example, the user may specify that they prefer resolution adjustment for initial presentation of augmented content or that they prefer transparency adjustment for initial presentation of augmented content).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of modifying, by the server, each of the one or more objects in the image that does not relate to the image context by replacing each of the one or more objects with a generic version of each of the one or more objects, each of the generic versions having a size and perspective comparable to their respective replaced objects from Osotio into the method as disclosed by Gordon. The motivation for doing this is to improve the user experience, thus improving the efficiency and effectiveness of the system.
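
For illustration only, the selective-replacement limitation recited in claim 1 can be sketched as follows. This is a minimal sketch of the claim language as recited, not an implementation taken from Gordon, Osotio, or the application. The names `DetectedObject`, `relates_to_context`, and `replace_unrelated`, the representation of the image context as a set of labels, and the bounding-box and yaw fields standing in for size and perspective are all hypothetical assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical record for one object detected in the image."""
    label: str                      # e.g. "whiteboard"
    box: Tuple[int, int, int, int]  # x, y, width, height (stand-in for size)
    yaw_degrees: float              # stand-in for perspective
    generic: bool = False           # True once swapped for a generic version


def relates_to_context(obj: DetectedObject, context: Set[str]) -> bool:
    # Assumption: the image context is a set of labels; an object "relates"
    # to the context when its label appears in that set.
    return obj.label in context


def replace_unrelated(objects: List[DetectedObject],
                      context: Set[str]) -> List[DetectedObject]:
    # Objects that relate to the context pass through unchanged. Every other
    # object is swapped for a generic version that keeps the same box and yaw,
    # so its size and perspective stay comparable to the replaced object.
    return [
        obj if relates_to_context(obj, context) else replace(obj, generic=True)
        for obj in objects
    ]
```

Under these assumptions, only objects whose labels fall outside the context are marked generic, and their box and yaw values are carried over unchanged.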

Regarding claim 8 (drawn to a CRM):
The proposed combination of Gordon and Osotio, explained in the rejection of method claim 1, renders obvious the steps of the computer readable medium of claim 8 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 1 are equally applicable to claim 8. See further Gordon ¶61.

Regarding claim 15, Gordon discloses an apparatus, comprising: 
an object detector (Fig. 5 processor 510);
a context determiner (Fig. 5 processor 510); and 
an object replacer (Fig. 5 processor 510), 
wherein: 
the object detector is to detect, using an object detection algorithm, one or more objects from a video and spatial information about the video (¶21 the system may capture visual input, such as an image or video; ¶51 & ¶101 the process 600 includes receiving input data including at least one of audible input, visual input, or sensor input and at 610, the process 600 includes determining that the input data corresponds to a request to identify the object; ¶156-159 the computing device architecture 1400 is applicable to any of the clients shown in FIGS. 1, 2, 5, 12, and 13; the processor 1402 may additionally or alternatively comprise a holographic processing unit (HPU) which is designed specifically to process and integrate data from multiple sensors of a head mounted computing device and to handle tasks such as spatial mapping, gesture recognition, and voice and speech recognition), 
the context determiner is to determine a context of the video from an associated audio (¶77 a context of the object to be identified), and 
the object replacer is to compare each of the one or more objects with the video context (¶78 the additional features included in the image may be used to identify a person or may be inappropriate for some individuals to view).
Gordon fails to specifically teach that the object replacer is to selectively modify each of the one or more objects in the video that does not relate to the video context by replacing each of the one or more objects with a generic version of each of the one or more objects, each of the generic versions having a size and perspective comparable to their respective replaced objects.
Osotio teaches selectively modifying each of the one or more objects in the video that does not relate to the video context by replacing each of the one or more objects with a generic version of each of the one or more objects (¶87-88 a placeholder object can be presented, that lets the user know that augmented content is available and as the user focuses on the placeholder object, the augmented content is presented, at a selected rendering fidelity), each of the generic versions having a size and perspective comparable to their respective replaced objects (¶87-89 Selecting a rendering fidelity thus comprises selecting one or more characteristics such as transparency, color, resolution, how much of the content is rendered, size, and so forth that will be used to render the object. Which combination can be based on user preference, for example, the user may specify that they prefer resolution adjustment for initial presentation of augmented content or that they prefer transparency adjustment for initial presentation of augmented content).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of selectively modifying each of the one or more objects in the video that does not relate to the video context by replacing each of the one or more objects with a generic version of each of the one or more objects, each of the generic versions having a size and perspective comparable to their respective replaced objects from Osotio into the apparatus as disclosed by Gordon. The motivation for doing this is to improve the user experience, thus improving the efficiency and effectiveness of the system.

Regarding claim 16, Gordon discloses the apparatus of claim 15, wherein the apparatus is a mobile device (¶58-59). 

Regarding claim 20, Gordon discloses the apparatus of claim 15, wherein the object detector is to detect one or more objects from the video with reference to an object library (¶23 compare the characteristics of the object in the scene to characteristics of objects in the database). 

Claims 3-4 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gordon and Osotio as applied to claims 1, 8 and 15 above, and further in view of Shoemake et al. (US 20150070516).
Regarding claim 3, the combination of Gordon and Osotio discloses the method of claim 1, but fails to teach capturing audio with the image and wherein determining the context of the image comprises extracting, from the audio, one or more keywords.
Shoemake teaches capturing audio with the image (¶149 the presence detection device might comprise a video input interface to receive video input from the local content source, an audio input interface to receive audio input from the local content source), and wherein determining the context of the image comprises extracting, from the audio, one or more keywords (¶158 analyzing the first media content to identify specific video content, image content, game content, audio content, etc. that are indicated in a database as being potentially objectionable). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of capturing audio with the image and wherein determining the context of the image comprises extracting, from the audio, one or more keywords from Shoemake into the method as disclosed by the combination of Gordon and Osotio. The motivation for doing this is to improve the effectiveness of future content filtering.

Regarding claim 4, the combination of Gordon, Osotio and Shoemake discloses the method of claim 3, wherein comparing each of the one or more objects with the image context comprises identifying each of the one or more objects, and comparing each of the one or more objects with each of the one or more keywords (Shoemake ¶158 analyzing the first media content to identify specific video content, image content, game content, audio content, etc. that are indicated in a database as being potentially objectionable). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein comparing each of the one or more objects with the image context comprises identifying each of the one or more objects, and comparing each of the one or more objects with each of the one or more keywords from Shoemake into the method as disclosed by the combination of Gordon and Osotio. The motivation for doing this is to improve the effectiveness of future content filtering.

Regarding claims 10-11 (drawn to a CRM):
The proposed combination of Gordon, Osotio and Shoemake, explained in the rejection of method claims 3-4, renders obvious the steps of the computer readable medium of claims 10-11 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 3-4 are equally applicable to claims 10-11. See further Gordon ¶61.

Claims 5-6 & 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gordon, Osotio and Shoemake as applied to claims 4 and 11 above, and further in view of Delaney et al. (US 20190139576).
Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gordon and Osotio as applied to claim 15 above, and further in view of Delaney et al. (US 20190139576).
Regarding claim 5, the combination of Gordon, Osotio and Shoemake discloses the method of claim 4, but fails to teach assigning a weight to each of the one or more keywords, and wherein comparing each of the one or more objects with the image context comprises assigning a weight to each of the one or more objects based upon the weight of each of the one or more keywords that are relevant to each of the one or more objects. 
Delaney teaches assigning a weight to each of the one or more keywords (¶33 the present system uses a confidence score between NLU processing and image recognition processing to assign a tag to video content. For example, as the video content progresses, the video content tagging device continually generates an audio and video confidence score), and wherein comparing each of the one or more objects with the image context comprises assigning a weight to each of the one or more objects based upon the weight of each of the one or more keywords that are relevant to each of the one or more objects (¶33-34 the present system uses a confidence score between NLU processing and image recognition processing to assign a tag to video content. For example, as the video content progresses, the video content tagging device continually generates an audio and video confidence score. Once the audio and video confidence score cross their respective threshold values, the video content tagging device assigns the tag to the video content. The video content tagging device de-assigns the tag to the video content when the audio and video confidence score falls below their respective threshold values.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of assigning a weight to each of the one or more keywords, and wherein comparing each of the one or more objects with the image context comprises assigning a weight to each of the one or more objects based upon the weight of each of the one or more keywords that are relevant to each of the one or more objects from Delaney into the method as disclosed by the combination of Gordon, Osotio and Shoemake. The motivation for doing this is to improve corroborating video data with audio data to automatically tag video content.

Regarding claim 6, the combination of Gordon, Osotio, Shoemake and Delaney discloses the method of claim 5, wherein selectively modifying each of the one or more objects that does not relate to the image context comprises selectively modifying each of the one or more objects that has a weight that falls below a predetermined threshold (Delaney ¶33-35 The video content tagging device de-assigns the tag to the video content when the audio and video confidence score falls below their respective threshold values; de-assigning may refer to modifying the image of the car in the video image by removing the generic highlight, such as the box, around the image of the car).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein selectively modifying each of the one or more objects that does not relate to the image context comprises selectively modifying each of the one or more objects that has a weight that falls below a predetermined threshold from Delaney into the method as disclosed by the combination of Gordon, Osotio and Shoemake. The motivation for doing this is to improve corroborating video data with audio data to automatically tag video content.
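
For illustration only, the weighting and threshold limitations recited in claims 5-6 can be sketched as follows. This is a minimal sketch of the claim language as recited, not an implementation taken from Delaney or the application. The function names, the dictionary representations, and the choice to sum keyword weights are hypothetical assumptions; the claims require only that an object's weight be "based upon" the weights of its relevant keywords.

```python
from typing import Dict, List, Set


def object_weights(keyword_weights: Dict[str, float],
                   relevance: Dict[str, Set[str]]) -> Dict[str, float]:
    # Assumption: an object's weight is the sum of the weights of the
    # keywords relevant to it. Unknown keywords contribute zero.
    return {
        obj: sum(keyword_weights.get(keyword, 0.0) for keyword in keywords)
        for obj, keywords in relevance.items()
    }


def objects_to_modify(weights: Dict[str, float],
                      threshold: float) -> List[str]:
    # Objects whose weight falls below the predetermined threshold are the
    # ones selected for modification; sorted only for a stable result.
    return sorted(obj for obj, weight in weights.items() if weight < threshold)
```

Under these assumptions, an object with no relevant keywords receives a weight of zero and is selected for modification by any positive threshold.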

Regarding claims 12-13 (drawn to a CRM):
The proposed combination of Gordon, Osotio, Shoemake and Delaney, explained in the rejection of method claims 5-6, renders obvious the steps of the computer readable medium of claims 12-13 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 5-6 are equally applicable to claims 12-13. See further Shoemake ¶124.

Regarding claim 18, the combination of Gordon and Osotio discloses the apparatus of claim 15, but fails to teach wherein the context determiner is to determine the context of the video from the associated audio with an automated speech recognition routine. 
Delaney teaches wherein the context determiner is to determine the context of the video from the associated audio with an automated speech recognition routine (¶42 Based on receiving the video stream 92, the audio analyzing module 72 analyzes the audio data 94 in the video stream 92 using NLU processing. The audio analyzing module 72 determines a candidate audio tag in the video stream 92 based on NLU processing of the audio data 94). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the context determiner is to determine the context of the video from the associated audio with an automated speech recognition routine from Delaney into the apparatus as disclosed by the combination of Gordon and Osotio. The motivation for doing this is to improve corroborating video data with audio data to automatically tag video content.

Regarding claim 19, the combination of Gordon, Osotio and Delaney discloses the apparatus of claim 18, wherein the context determiner is to further determine the context of the video from the associated audio with a non-speech recognition routine (Gordon ¶34 the individual 104 may provide one or more sounds, one or more words, one or more gestures, or combinations thereof to indicate a request to identify an object within the scene 102. The computing device 112 may analyze the input from the individual 104 and determine that a request is being provided by the individual 104 to identify one or more objects within the scene). 

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3-6, 8, 10-13, 15-16, 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648. The examiner can normally be reached Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEVIN KY/
Primary Examiner, Art Unit 2669