Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/29/2022 has been entered.
 
Response to Arguments
Applicant’s arguments submitted on 4/29/2022 have been fully considered.  Applicant argues that the prior art does not disclose the newly added amendments to the independent claims.  Examiner cites new prior art herein below in response to Applicant’s amendments.
Previously cited Tu discloses a computer-implemented method (see Tu Figs. 1-8, and paras. 0069-0077, where a computer and memory executing program instructions are disclosed), the method comprising: receiving live frames of media content recorded by a camera system of a media source (see Tu Figs. 1-8, and paras. 0026-0028 and 0075, where training is performed on input images for various devices, including, for example, an immersive augmented reality system and a camera system); for each live frame of media content, identifying a presence of one or more unknown objects in the live frame using a classifier that is unable to identify a type of each unknown object (see Tu Figs. 1-8, and paras. 0029-0031, where new objects from unknown classes are detected, localized, and discovered); for each present unknown object: responsive to identifying the unknown object, accessing media content recorded by the media source; generating a novel detector based on the features of the unknown object corresponding to the portions of the live frames of media content, the detector configured to output a confidence score indicating a likelihood that the unknown object is present within a frame of media content (see Tu Figs. 1-6, and paras. 0018, 0039, 0040, 0047-0050, where object probabilities and likelihoods are calculated for images based on visual saliency and other visual image features; see also paras. 0022 and 0023, where saliency windows 208 and 210 are used to identify the visual saliency features for training).
Newly cited DeAngelus discloses receiving, via a user interface in which the live frames of media content are displayed, a selection of portions of the live frames of media content that correspond to features of the unknown object and selected portions (see DeAngelus Fig. 6, and paras. 0040, 0062, and 0063, corresponding to pgs. 9 and 17 of the provisional application, where the algorithm can operate in “real-time” and “. . . the jump back process 600 may receive a user input corresponding to a user drawing a bounding box 607 around or partially around an image corresponding to object 605 in image chip 609”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the user interface of DeAngelus to organize and improve the training data of Tu by confirming the location in the image of the most salient features of the known and unknown objects, because it is predictable that providing supervised learning would improve the accuracy of the object detector by providing additional data for discriminating the features for each object detector.  Any errors made by the salient feature detector can be corrected immediately by the user, thereby also providing a predictable time savings.
Previously cited Sivic discloses accessing media content previously recorded by the camera system of the media source (see Sivic pg. 595, where all frames and descriptors are previously stored); applying the novel detector to the accessed media content to identify previous appearances of the unknown object in frames of the accessed media content; and generating, for presentation to a user, a user interface identifying one or more frames of the accessed media content in which at least one of the unknown objects is present and a location of each unknown object present in each identified frame (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where a user interface presents video frames from a video to a user and alerts the user to instances of a detected object in those video frames by highlighting them).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the user interface of Sivic to apply the detector of Tu, as modified by DeAngelus, to all the previously stored image data of the immersive augmented reality system and/or camera system and thereby analyze the results, because it is predictable that doing so would improve the performance of Tu’s object detector by identifying any errors in the detection on the stored data and thereby determining any necessary adjustments to the object detector.
It would also have been obvious to one of ordinary skill in the art at the time of filing to apply the training of Tu’s algorithm to the live images of the various image capturing systems of Tu, including the immersive augmented reality system and/or camera system, because it is predictable that the user would benefit from being able to immediately choose and/or create the specific training data that is most relevant for the user’s immediate application, thereby predictably improving the object detector’s performance.
 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1, 2, 4, 5, 7, 10, 11, 13, 14, 16, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tu et al., US 2014/0140610 A1 (hereinafter referred to as “Tu”) in view of DeAngelus et al., US 2020/0374491 A1 (hereinafter referred to as “DeAngelus”) and Sivic, Josef, and Andrew Zisserman. “Efficient visual search of videos cast as text retrieval.” IEEE Transactions on Pattern Analysis and Machine Intelligence 31.4 (2008): 591-606 (hereinafter referred to as “Sivic”).

Regarding claim 1, Tu discloses a computer-implemented method (see Tu Figs. 1-8, and paras. 0069-0077, where a computer and memory executing program instructions are disclosed), the method comprising: receiving live frames of media content recorded by a camera system of a media source (see Tu Figs. 1-8, and paras. 0026-0028 and 0075, where training is performed on input images for various devices, including, for example, an immersive augmented reality system and a camera system); for each live frame of media content, identifying a presence of one or more unknown objects in the live frame using a classifier that is unable to identify a type of each unknown object (see Tu Figs. 1-8, and paras. 0029-0031, where new objects from unknown classes are detected, localized, and discovered); for each present unknown object: responsive to identifying the unknown object, accessing media content recorded by the media source; generating a novel detector based on the features of the unknown object corresponding to the portions of the live frames of media content, the detector configured to output a confidence score indicating a likelihood that the unknown object is present within a frame of media content (see Tu Figs. 1-6, and paras. 0018, 0039, 0040, 0047-0050, where object probabilities and likelihoods are calculated for images based on visual saliency and other visual image features; see also paras. 0022 and 0023, where saliency windows 208 and 210 are used to identify the visual saliency features for training).
Tu does not explicitly disclose receiving, via a user interface in which the live frames of media content are displayed, a selection of portions of the live frames of media content that correspond to features of the unknown object and selected portions; accessing media content previously recorded by the camera system of the media source; applying the novel detector to the accessed media content to identify previous appearances of the unknown object in frames of the accessed media content; and generating, for presentation to a user, a user interface identifying one or more frames of the accessed media content in which at least one of the unknown objects is present and a location of each unknown object present in each identified frame.
However, DeAngelus discloses receiving, via a user interface in which the live frames of media content are displayed, a selection of portions of the live frames of media content that correspond to features of the unknown object and selected portions (see DeAngelus Fig. 6, and paras. 0040, 0062, and 0063, corresponding to pgs. 9 and 17 of the provisional application, where the algorithm can operate in “real-time” and “. . . the jump back process 600 may receive a user input corresponding to a user drawing a bounding box 607 around or partially around an image corresponding to object 605 in image chip 609”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the user interface of DeAngelus to organize and improve the training data of Tu by confirming the location in the image of the most salient features of the known and unknown objects, because it is predictable that providing supervised learning would improve the accuracy of the object detector by providing additional data for discriminating the features for each object detector.  Any errors made by the salient feature detector can be corrected immediately by the user, thereby also providing a predictable time savings.
Furthermore, Sivic discloses accessing media content previously recorded by the camera system of the media source (see Sivic pg. 595, where all frames and descriptors are previously stored); applying the novel detector to the accessed media content to identify previous appearances of the unknown object in frames of the accessed media content; and generating, for presentation to a user, a user interface identifying one or more frames of the accessed media content in which at least one of the unknown objects is present and a location of each unknown object present in each identified frame (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where a user interface presents video frames from a video to a user and alerts the user to instances of a detected object in those video frames by highlighting them).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the user interface of Sivic to apply the detector of Tu, as modified by DeAngelus, to all the previously stored image data of the immersive augmented reality system and/or camera system and thereby analyze the results, because it is predictable that doing so would improve the performance of Tu’s object detector by identifying any errors in the detection on the stored data and thereby determining any necessary adjustments to the object detector.
It would also have been obvious to one of ordinary skill in the art at the time of filing to apply the training of Tu’s algorithm to the live images of the various image capturing systems of Tu, including the immersive augmented reality system and/or camera system, because it is predictable that the user would benefit from being able to immediately choose and/or create the specific training data that is most relevant for the user’s immediate application, thereby predictably improving the object detector’s performance.

Claims 11 and 20 are rejected under the same analysis as claim 1 above.

Regarding claim 2, Tu does not explicitly disclose wherein the live frames of media content are assigned a source label identifying the media source and a history of media content is assigned the same source label.
However, Sivic discloses wherein the live frames of media content are assigned a source label identifying the media source and a history of media content is assigned the same source label (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where all the frames of the video have the same source label “Groundhog Day”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the labeling technique of Sivic to label the video frames of Tu, because it is predictable that doing so would make Tu’s data better organized, so that related video frames can be found by sorting on a common label.

Regarding claim 4, Tu discloses wherein identifying the presence of one or more unknown objects in the frames of media content comprises: receiving, from the user, a request to search the live frames of media content for preferred content, the preferred content representing visual features of interest to the user; applying one or more detectors to the live frames of media content to search for the preferred content in the live frames of media content, each detector of the one or more detectors configured to identify preferred content in the live frames of media content; and identifying, by the one or more detectors, a presence of one or more unknown objects in the frames of media content, wherein each unknown object represents a type of content the one or more detectors is unable to identify (see Tu Figs. 1-8, and paras. 0014-0017, 0026-0028 and 0075, where training is performed on input images for various devices, including, for example, an immersive augmented reality system, and the user may specify a specific “object of interest” to search for, such as a “book”).

Claim 13 is rejected under the same analysis as claim 4 above.

Regarding claim 5, Tu discloses further comprising: for each frame of the live frames of media content in which an unknown object is present, determining a location of identified preferred content in the frame; and determining a location of the unknown object in the frame (see Tu paras. 0017 and 0030-0032, where all objects are located in the image frame).
Sivic also discloses further comprising: for each frame of the live frames of media content in which an unknown object is present, determining a location of identified preferred content in the frame; and determining a location of the unknown object in the frame (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where all objects are located in the image frame).

Claim 14 is rejected under the same analysis as claim 5 above.

Regarding claim 7, Tu discloses wherein generating the novel detector based on visual features of the unknown object comprises: for each identified unknown object, inputting the live frame in which the unknown object is present to a machine learned model to extract one or more visual features of the unknown object from the live frame of media content; and training the novel detector to classify the unknown object based on the extracted visual features, the classification based on the confidence score (see Tu Figs. 1-6, and paras. 0018, 0039, 0040, 0047-0050, where object probabilities and likelihoods are calculated for images based on visual saliency and other visual image features).

Claim 16 is rejected under the same analysis as claim 7 above.

Regarding claim 10, Tu does not explicitly disclose further comprising: determining a set of analytics based on a number of appearances of the unknown object in the live frames and a history of media content; and updating the user interface with an analytics interface element listing the determined set of analytics.
However, Sivic discloses further comprising: determining a set of analytics based on a number of appearances of the unknown object in the live frames and a history of media content; and updating the user interface with an analytics interface element listing the determined set of analytics (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where a user interface presents video frames with their relevance analytics to a user and alerts the user to instances of a detected object in those video frames by highlighting them with a yellow box).

Claim 19 is rejected under the same analysis as claim 10 above.

Claim(s) 3 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tu in view of DeAngelus and Sivic as applied to claims 1 and 11 above, and in further view of Zheng et al., US 2019/0080172 A1 (hereinafter referred to as “Zheng”). 

Regarding claim 3, Tu does not explicitly disclose further comprising: responsive to identifying the unknown object, generating an alert, wherein the alert describes that the unknown object is present in the live frames of media content and identifies the media source and a timestamp at which the unknown object was identified; and presenting the alert to the user via the user interface.
However, Sivic discloses further comprising: responsive to identifying the unknown object, generating an alert, wherein the alert describes that the unknown object is present in the live frames of media content and identifies the media source and a frame number at which the unknown object was identified; and presenting the alert to the user via the user interface (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where a user interface presents video frames from a video, along with their frame numbers, to a user and alerts the user to instances of a detected object in those video frames by highlighting them with a yellow box).
Furthermore, Zheng discloses live frames of media content and a timestamp (see Zheng para. 0076, where “[t]he camera platform manager module 116, for instance, may receive digital images 114 from a digital camera 112 from a live feed in real time, from a storage device, and so on.  The digital images 114 may have associated metadata that describes "when" respective digital images are captured, e.g., a timestamp”).
It would have been obvious to one of ordinary skill in the art at the time of filing to apply the timestamp of Zheng to the images of Tu as modified by DeAngelus and Sivic, because it is predictable that the addition of precise timestamps would assist users in identifying the unknown object and/or person by providing the user a time window during which the object and/or person was present at a location near where the images are captured, thereby narrowing down the available possibilities of the object and/or person’s identity.

Claim 12 is rejected under the same analysis as claim 3 above.

Claim(s) 6 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tu in view of DeAngelus and Sivic as applied to claims 4 and 13 above, and in further view of Chadha et al., US 2021/0150282 A1 (hereinafter referred to as “Chadha”). 

Regarding claim 6, Tu discloses wherein each detector of the one or more detectors is configured to output a confidence score indicating a likelihood that a type of content is present within a frame of the live frames of media content (see Tu Figs. 1-6, and paras. 0018, 0039, 0040, 0047-0050, where object probabilities and likelihoods are calculated for images based on visual saliency and other visual image features).
Tu does not explicitly disclose the method further comprising: classifying objects assigned greater than a threshold score as preferred content present in the frame; and classifying objects assigned confidence scores less than the threshold score as unknown objects present in the frame.
However, Chadha discloses the method further comprising: classifying objects assigned greater than a threshold score as preferred content present in the frame; and classifying objects assigned confidence scores less than the threshold score as unknown objects present in the frame (see Chadha para. 0054, where “[t]he object identification component 445 may identify, for each confidence score from the set of confidence scores identified as satisfying the confidence score threshold, the corresponding object as a detected object from the input image”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the thresholding technique of Chadha on the object detector of Tu as modified by DeAngelus and Sivic, because it is predictable that doing so would improve the accuracy of the results by ensuring that only objects with sufficient likelihood and/or confidence are identified as known objects, while objects without sufficient likelihood and/or confidence are labeled unknown and subjected to the new object clustering and discovery.

Claim 15 is rejected under the same analysis as claim 6 above.

Claim(s) 9 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tu in view of DeAngelus and Sivic as applied to claims 1 and 11 above, and in further view of Hazanovich et al., US 2016/0349930 A1 (hereinafter referred to as “Hazanovich”). 

Regarding claim 9, Tu does not explicitly disclose wherein the user interface presented to the user comprises: a progress bar interface element segmented into frames of the live media content and a history of media content, wherein each frame in which an unknown object is present is a marked with an alert interface element; and a display interface element for presenting a frame with highlighted markings of locations of unknown objects present in the frame, wherein the frame is presented at the display interface element in response to a selection of the frame from the progress bar interface element.
However, Sivic discloses wherein the user interface presented to the user comprises: a progress bar interface element segmented into frames of the live media content and a history of media content, wherein each frame in which an unknown object is present is a marked with an alert interface element; and a display interface element for presenting a frame with highlighted markings of locations of unknown objects present in the frame (see Sivic Abstract, and pgs. 595 and 599, and Figs. 6 and 11, where a user interface presents a timeline progress bar with video frames and their relevance analytics to a user and alerts the user to instances of a detected object in those video frames by highlighting them with a yellow box).
Furthermore, Hazanovich discloses wherein the frame is presented at the display interface element in response to a selection of the frame from the progress bar interface element (see Hazanovich paras. 0009, 0015, and 0021, where a selection is used to display frames from the video). 
It would have been obvious to one of ordinary skill in the art at the time of filing to use the selection interface technique of Hazanovich to select the video frames of Tu as modified by DeAngelus and Sivic, because it is predictable that users would benefit from being able to select and modify the video frames in order to analyze additional video frames around the detected relevant frames and thereby confirm and/or find the most relevant frame for the unknown object.

Claim 18 is rejected under the same analysis as claim 9 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW M MOYER whose telephone number is (571)272-9523. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang can be reached on (571)270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW M MOYER/             Primary Examiner, Art Unit 2663