DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-4, and 6-13 remain pending in the application.
Claim 13 is newly added.
Claim 8 is objected to.
Claims 1-4, and 6-13 are rejected under 35 U.S.C. 112(a).
Claims 1-4, and 6-13 are rejected under 35 U.S.C. 112(b).
Claims 1-4, and 7-13 are rejected under 35 U.S.C. 103.

Response to Amendment and Arguments
The amendment filed 6/27/2022 has been entered. 
Applicant’s arguments with respect to the rejection of claim 8 under 35 U.S.C. 112(a) and (b) as failing to disclose any hardware for performing the claimed functional limitations, have been fully considered, but they are not persuasive. In response to Applicant’s argument that the following portion of the disclosure “The information database is hosted, in one embodiment, in Azure SQL DB, while the rest of the components are hosted in virtual machines in the cloud”, recites sufficient hardware for performing the claimed functional limitations, Examiner respectfully disagrees. The broadest reasonable interpretation of virtual machines in the cloud encompasses software. Therefore, it is maintained that the disclosure fails to disclose any hardware for performing the claimed functional limitations.

Applicant’s arguments with respect to the rejection of claims 1-4, and 6-12 under 35 U.S.C. 112(b) have been fully considered and are persuasive in part. In response to Applicant’s arguments that the amendments resolve the indefiniteness concerns for the following limitations found in claims 8 and 10: “search for one of the identified extracted frames”, “search for one of the identified extracted frames having a defined environment and/or location that appears identical to the environment and/or location of the given unidentified frame,” and the following limitation found in claim 10: “wherein said image recognition database is trained through deep learning techniques”, Examiner agrees. However, the amendments introduce new indefiniteness issues in the claims (see below). Therefore, the 35 U.S.C. 112(b) rejection of claims 1-4, and 6-12 is maintained.

Applicant’s arguments with respect to the rejection of claim 6 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.

Applicant’s arguments with respect to the rejection of claims 1-4, and 7-12 under 35 U.S.C. 103 have been fully considered, but are not persuasive.
In response to Applicant’s argument that the combination of Tseng and Galligan does not teach claim 10, Examiner respectfully disagrees. In response to Applicant's argument that the references fail to show certain features of Applicant’s invention, it is noted that the features upon which Applicant relies (i.e. “identifying an ‘environment’ within a frame lacking any ‘object of interest,’ based on an identification of that environment within a different frame within a video having an ‘object of interest’”; “method of assigning locations to individual frames, based on the identification of those same locations as ‘context’ for an ‘object of interest’ within a different frame, and assignment of the same location to frames containing the same contextual scene but without the object of interest”; “identifying a location of frames that lack an object of interest”; “When, in a subsequent frame, the environment appears without the location of interest”) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

In response to Applicant’s argument that the combination of Tseng, Galligan, Benjamin et al., and Zhao et al. does not teach claim 11, Examiner respectfully disagrees. In response to Applicant's argument that the references fail to show certain features of Applicant’s invention, it is noted that the features upon which Applicant relies (i.e. “using a completion algorithm with respect to all images in a video, including images lacking identifiable landmarks”; “Using the added metadata, the database may be searched in order to create a trip plan”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

In response to Applicant’s argument that the combination of Tseng and Galligan does not teach claim 4, Examiner respectfully disagrees. Tseng teaches that updating of the photographic location database may include obtaining images from third party sites such as Yelp! or GoogleStreetView (¶ [0035]). To further elaborate on how the database is updated, Tseng explains that the images for which the location API identifies a location may be stored in the photographic location database (¶ [0043]).

In response to Applicant’s argument that the combination of Tseng, Galligan, and Tyagi does not teach claim 12, Examiner respectfully disagrees. Tyagi teaches that panoramic video data may include video data having a field of view beyond 180 degrees. For example, the image capture device 110 may capture a field of view of 360 degrees using a plurality of cameras, where the image capture device 110 may capture the panoramic video data using the one or more camera(s) 115 (Col. 4 lines 6-15 and Col. 6 lines 13-43). Also, Examiner notes that the claim does not require that the 360 degree video is constructed with images taken by a single camera. Applicant is reminded that although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Furthermore, in response to Applicant's arguments against the Tyagi reference individually for not identifying a location within the video, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). To elaborate, the combination of Tseng and Galligan teaches a known method of identifying a location within a video, which is modified by Tyagi’s use of 360 degree videos. One would be motivated to use 360 degree videos in the location identifying method in order to have a greater view of the region of interest.

In response to Applicant’s argument that the combination of Tseng and Galligan does not teach new claim 13, Examiner respectfully disagrees. In response to Applicant's argument that the references fail to show certain features of Applicant’s invention, it is noted that the features upon which Applicant relies (i.e. “the identification of the unidentified frame is determined positively”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

If it is Applicant’s intent to claim the subject matter argued above, specifically the identification of a frame including an environment and lacking the object of interest, Examiner suggests Applicant clarify that within the claim. Examiner also suggests Applicant clarify what is meant by an environment “near” an object of interest. Finally, Examiner recommends Applicant show how the disclosure supports the newly claimed subject matter of applying an algorithm that determines a location of images taken in a camera sweep to the identification of a location of an environment near an object of interest in a video stream, as is applied in the claims.

Claim Objections
Claim 8 is objected to because of the following informalities:  
a) “including at lest” should read “including at least”
b) “breaking said at least one stream” should read “breaking said at least one video stream”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph: 
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. The limitations in this application that use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, include: “processing means for processing each of said extracted frames”. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. 
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “a video parsing element adapted to extract from said at least one video stream a plurality of disunited frames”, “a location identification engine trained through deep learning techniques which is configured to identify a location that is visible in a given one of said extracted frames…”, and “a completion algorithm module which is invoked if one or more of said disunited frames is unidentified by said location identification engine and is configured to identify an environment that is visible in a given one of said unidentified frames…” in claim 8.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. 
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-4, and 6-13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. 

The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The amended independent claims 10 and 8 include subject matter for which support was indicated on pages 4 and 5 of the Applicant’s remarks. The portions pointed to by Applicant broadly suggest creating context for objects in a video; however, they do not clearly disclose the algorithm described in the claims as performing such a process (e.g. using a video stream as opposed to a video pan and the specifics of steps a-f). Examiner suggests that Applicant clarify this and point to support that clearly links the claimed algorithm to the determination of a location of an environment near an object of interest, and explain any implicit support.


Claims 1-4, 6-7, 9, and 11-13 are dependent from claim 10. Therefore, they inherit the defects of their respective parent claims and are rejected accordingly.

Additionally, claim 8 recites “processing means for processing each of said extracted frames”, “a video parsing element adapted to extract from said at least one video stream a plurality of disunited frames”, “a location identification engine trained through deep learning techniques which is configured to identify a location that is visible in a given one of said extracted frames…”, and “a completion algorithm module which is invoked if one or more of said disunited frames is unidentified by said location identification engine and is configured to identify an environment that is visible in a given one of said unidentified frames…”. For computer implemented functional claims, the specification must disclose the computer and the algorithm that perform the claimed function, see MPEP § 2161.01(I). The specification fails to disclose any hardware for performing the claimed functional limitations.

Claims 1-4, and 6-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claims 10 and 8 recite “an environment near said object of interest.” The term “near” is a relative term which renders the claim indefinite. The term “near” is not defined by the claim, nor does the specification provide a standard for ascertaining the requisite degree. Therefore, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purposes of examination, Examiner is interpreting the limitation as each frame of the first plurality of frames including an object of interest and an environment. 

Claim 13 recites “adding said frames.” However, there are multiple recitations of frames including at least: “disunited frames”, “a first plurality of frames”, and “unidentified frames”. It is not clear which frames are being referred to in this limitation. Therefore, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purposes of examination, Examiner is interpreting the limitation as the “unidentified frames.”

Claim 2 recites “if not identified by the image recognition database.” It is not clear what is meant by this limitation as prior limitations have recited “a location identification engine” as identifying a location in the sent frame. Therefore, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purposes of examination, Examiner is interpreting the limitation as “if not identified by the location identification engine”.

Claims 1-4, 6-7, 9, and 11-13 are dependent from claim 10. Therefore, they inherit the defects of their respective parent claims and are rejected accordingly.

The following claim limitations of claim 8 invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: “processing means for processing each of said extracted frames”, “a video parsing element adapted to extract from said at least one video stream a plurality of disunited frames”, “a location identification engine trained through deep learning techniques which is configured to identify a location that is visible in a given one of said extracted frames…”, and “a completion algorithm module which is invoked if one or more of said disunited frames is unidentified by said location identification engine and is configured to identify an environment that is visible in a given one of said unidentified frames…”. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The disclosure is devoid of structure that is clearly linked or associated to the claimed functions. For computer implemented means-plus-function limitations, the limitation must be supported by both the algorithm and the computer or microprocessor programmed with the algorithm, see MPEP § 2181(II)B. Therefore, claim 8 is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may: 
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 10, 1, 4, 7, 9, 8, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (US 2012/0310968) and Galligan (US 8,942,535).

Regarding claim 10, Tseng, in the analogous field of location determination, teaches A method for identifying a location of an unidentified environment depicted in a video stream, comprising: a) receiving at least one video stream, said video stream including at lest a first plurality of frames each including an image of an object of interest and an environment near said object of interest (Tseng: multimedia objects for upload may be single image files or video file, ¶ [0037]; an image for upload includes a scene which includes object 203 and objects 204, ¶ [0036] and see FIG. 2 which depicts such a scene that includes the Eiffel tower object 203 (i.e. object of interest) and surrounding scenery such as a row of trees 204 (i.e. an environment); one or more individual frames from the video file are extracted, ¶ [0038]; Note one or more individual frames is being interpreted as a first plurality of frames of the video.); 
b) breaking said at least one stream into disunited frames (Tseng: one or more individual frames from the video file are extracted, ¶ [0038]); 
c) sending each of said disunited frames to a location identification engine which is configured to identify a location that is visible in the sent frame, wherein said location identification engine is trained through deep learning techniques to compare images within each disunited frame with images stored in an image recognition database (Tseng: photographic location database is updated by use of submitted photos, ¶ [0035]; the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the meta data "Statute of Liberty", ¶ [0024]; user-uploaded photos whose location was determined within a threshold precision are added to the database, ¶ [0043]; the location API uses neural network techniques to recognize and match objects in the images as part of the location determination process, ¶ [0020]); 
d) for each of the first plurality of frames, identifying a location of the object of interest on a basis of the comparison performed with the location identification engine, and further identifying a location of the environment near the object of interest (Tseng: the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; an image for upload includes a scene which includes object 203 and objects 204, ¶ [0036] and see FIG. 2 which depicts such a scene that includes the Eiffel tower object 203 (i.e. object of interest) and surrounding scenery such as a row of trees 204 (i.e. an environment); In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the meta data "Statute of Liberty", ¶ [0024]; Note that it is interpreted that “one or more objects” includes at least two objects that are matched, such as object 203 (i.e. object of interest) and objects 204 (i.e. environment)).
However, Tseng does not teach if one or more of said disunited frames is unidentified by said location identification engine, invoking a completion algorithm to identify an environment that is visible in a given one of said unidentified frames, whereby said completion algorithm is operable to identify one or more frames within the first plurality of frames with an identified location and being associated with an identified environment, to determine that an environment shown in said unidentified frame is the same as the environment with the identified location, and to set a location of the given unidentified frame with the location of the environment with the identified location.
Galligan, in the analogous field of location determination, teaches if one or more of said disunited frames is unidentified by said location identification engine, invoking a completion algorithm to identify an environment that is visible in a given one of said unidentified frames, whereby said completion algorithm is operable to identify one or more frames within the first plurality of frames with an identified location and being associated with an identified environment, to determine that an environment shown in said unidentified frame is the same as the environment with the identified location, and to set a location of the given unidentified frame with the location of the environment with the identified location (Galligan: the implicit geolocation information for a frame may be inferred from the geolocation information identified for one or more other frames. For example, implicit geolocation information may be unidentifiable or may be identifiable with low confidence based on the content of the third frame, and implicit geolocation information identified for the first frame, the second frame, or both may be used to interpolate the geolocation information of the third frame. A most likely geographic location for the third frame may be selected from the candidate geographic locations based on a degree of similarity between the candidate geographic locations identified for the third frame and the geographic location identified for the first frame and the geographic location identified for the second frame, Col. 9 line 45 – Col. 10 line 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Tseng with that of Galligan and to apply the known technique of using information identified for one or more other frames to determine the location of an unidentified frame, to the known photographic image location determination process to yield the predictable result of resolving the location of the unidentified frame.
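By way of non-limiting illustration only, the general technique discussed above (resolving the location of an unidentified frame from other frames of the same video whose locations have been identified) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Tseng, Galligan, or the application: the Frame structure, the per-frame feature vector, the cosine similarity measure, and the 0.9 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence


@dataclass
class Frame:
    # Hypothetical environment descriptor for the frame (assumption).
    features: Sequence[float]
    # None means the location identification step did not identify the frame.
    location: Optional[str] = None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_locations(frames: List[Frame], threshold: float = 0.9) -> List[Frame]:
    """Assign each unidentified frame the location of its most similar
    identified frame, provided the similarity meets the (assumed) threshold."""
    # Snapshot the identified frames first so that locations assigned in this
    # pass are not themselves used as references.
    identified = [f for f in frames if f.location is not None]
    for frame in frames:
        if frame.location is not None:
            continue
        best = max(
            identified,
            key=lambda ref: cosine(frame.features, ref.features),
            default=None,
        )
        if best is not None and cosine(frame.features, best.features) >= threshold:
            frame.location = best.location
    return frames
```

In this sketch, a frame whose descriptor closely matches that of an identified frame receives that frame's location, while a frame with no sufficiently similar identified counterpart remains unidentified.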

Regarding claim 1, the combination of Tseng and Galligan further teaches wherein the frames broken from the at least one video stream are a plurality of sequential frames (Tseng: one or more individual frames from the video file are extracted, ¶ [0038]).

Regarding claim 9, the combination further teaches wherein the plurality of sequential frames broken from the at least one video stream include all the frames in a sequence (Tseng: one or more individual frames from the video file are extracted, ¶ [0038]).

Regarding claim 4, the combination further teaches wherein the video stream is acquired using a crawler (Tseng: updating of the photographic location database may include obtaining images from third party sites such as Yelp! or GoogleStreetView, ¶ [0035]).

Regarding claim 7, the combination further teaches wherein additional data acquired by the crawler is related to one or more of the groups consisting of text, photos, audio, maps reviews, descriptions and ratings (Tseng: Yelp! and GoogleStreetView, ¶ [0035]).

Regarding claim 13, the combination further teaches further comprising, following setting of the location of each of the unidentified disunited frames, adding said frames and accompanying metadata to the image recognition database (Tseng: photographic location database is updated by use of submitted photos, ¶ [0035]; the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the metadata "Statute of Liberty", ¶ [0024]).

Regarding claim 8, Tseng, in the analogous field of location determination, teaches A system for identifying a location of an unidentified environment within a video stream, comprising: a) circuitry adapted to receive at least one video stream, said video stream including at least a first plurality of frames each including an image of an object of interest and an environment near said object of interest (Tseng: multimedia objects for upload may be single image files or video file, ¶ [0037]; an image for upload includes a scene which includes object 203 and objects 204, ¶ [0036] and see FIG. 2 which depicts such a scene that includes the Eiffel tower object 203 (i.e. object of interest) and surrounding scenery such as a row of trees 204 (i.e. an environment); one or more individual frames from the video file are extracted, ¶ [0038]; Note one or more individual frames is being interpreted as a first plurality of frames of the video; processors, electrical components, circuits, ¶ [0056] and FIG. 1); 
b) a video parsing element adapted to extract from said at least one video stream a plurality of disunited frames (Tseng: one or more individual frames from the video file are extracted, ¶ [0038]); and 
c) processing means for processing each of said extracted frames, said processing means comprising: i) a location identification engine trained through deep learning techniques (Tseng: photographic location database is updated by use of submitted photos, ¶ [0035]; the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the meta data "Statute of Liberty", ¶ [0024]; user-uploaded photos whose location was determined within a threshold precision are added to the database, ¶ [0043]; the location API uses neural network techniques to recognize and match objects in the images as part of the location determination process, ¶ [0020]); which is configured to identify a location that is visible in a given one of said extracted frames, by comparing images within each disunited frame with images stored in an image recognition database, and which is further configured, for each of the first plurality of frames, to identify a location of the object of interest on a basis of the comparison performed with the location identification engine, and to further identify a location of the environment near the object of interest (Tseng: the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; an image for upload includes a scene which includes object 203 and objects 204, ¶ [0036] and see FIG. 2 which depicts such a scene that includes the Eiffel tower object 203 (i.e. object of interest) and surrounding scenery such as a row of trees 204 (i.e. an environment); In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the meta data "Statute of Liberty", ¶ [0024]; Note that it is interpreted that “one or more objects” includes at least two objects that are matched, such as object 203 (i.e. object of interest) and objects 204 (i.e. environment)); and []
However, Tseng does not teach a completion algorithm module which is invoked if one or more of said disunited frames is unidentified by said location identification engine and is configured to identify an environment that is visible in a given one of said unidentified frames, search for an environment within the first plurality of frames with an identified location, determine that an environment shown in said unidentified frame is the same as the environment with the identified location, and set a location of the given unidentified frame with the location of the environment with the identified location.
Galligan, in the analogous field of location determination, teaches a completion algorithm module which is invoked if one or more of said disunited frames is unidentified by said location identification engine and is configured to identify an environment that is visible in a given one of said unidentified frames, search for an environment within the first plurality of frames with an identified location, determine that an environment shown in said unidentified frame is the same as the environment with the identified location, and set a location of the given unidentified frame with the location of the environment with the identified location (Galligan: the implicit geolocation information for a frame may be inferred from the geolocation information identified for one or more other frames. For example, implicit geolocation information may be unidentifiable or may be identifiable with low confidence, based on the content of the third frame, and implicit geolocation information identified for the first frame, the second frame, or both may be used to interpolate the geolocation information of the third frame. A most likely geographic location for the third frame may be selected from the candidate geographic locations based on a degree of similarity between the candidate geographic locations identified for the third frame and the geographic location identified for the first frame and the geographic location identified for the second frame, Col. 9 line 45 – Col. 10 line 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Tseng with that of Galligan and to apply the known technique of using information identified for one or more other frames to determine the location of an unidentified frame, to the known photographic image location determination process to yield the predictable result of resolving the location of the unidentified frame.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Tseng (US 2012/0310968), Galligan (US 8,942,535), and Benjamin et al. (US 2011/0167357).

Regarding claim 2, the combination of Tseng and Galligan teaches the method of claim 10, as shown above. 
However, the combination does not teach identifying a location that is visible in the sent frame, if not identified by the image recognition database according to keywords derived from an audio transcript of the sent frame. 
Benjamin et al., in the analogous field of multimedia processing, teaches identifying a location that is visible in the sent frame, if not identified by the image recognition database according to keywords derived from an audio transcript of the sent frame (Benjamin et al.: Locations may be mentioned in conversations, and extracted from the audio recordings, ¶ [0052]; a number of descriptive labels can be associated with the location as well… These descriptive labels can be derived from keyword analysis of the audio transcripts, ¶ [0075]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Tseng and Galligan with that of Benjamin et al. and to apply the known technique of using audio transcripts in order to determine the location of a frame, to the known photographic image location determination process to yield the predictable result of resolving the location of the frame.

Claims 11 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (US 2012/0310968), Galligan (US 8,942,535), Benjamin et al. (US 2011/0167357), and Zhao et al. (US 2014/0310319). 

Regarding claim 11, the combination of Tseng, Galligan, and Benjamin et al. teaches the method according to claim 2, as shown above. The combination further teaches ii) manually approving said identification; iii) adding the identified location [] to the image recognition database after such approval (Tseng: photographic location database is updated by use of a photo submitted in connection with a check-in, ¶ [0035]; location API may not automatically generate a check-in, but assist in the check-in by displaying possible locations that the user then selects from, ¶ [0045]); and iv) invoking the completion algorithm if a location that is visible in the sent frame is not identified [] (Galligan: the implicit geolocation information for a frame may be inferred from the geolocation information identified for one or more other frames. For example, implicit geolocation information may be unidentifiable or may be identifiable with low confidence, based on the content of the third frame, and implicit geolocation information identified for the first frame, the second frame, or both may be used to interpolate the geolocation information of the third frame. A most likely geographic location for the third frame may be selected from the candidate geographic locations based on a degree of similarity between the candidate geographic locations identified for the third frame and the geographic location identified for the first frame and the geographic location identified for the second frame, Col. 9 line 45 – Col. 10 line 5).
However, the combination does not teach i) using a public API database to identify a location that is visible in the sent frame, if not identified according to an audio of the sent frame. 
Zhao et al., in the analogous field of scene recognition, teaches i) using a public API database to identify a location that is visible in the sent frame, if not identified according to an audio of the sent frame (Zhao et al.: a GOOGLE geocoding service may be used to determine the geolocations, ¶ [0058]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Tseng, Galligan and Benjamin et al. with that of Zhao et al. and to apply the known technique of using a public API database in order to determine the location of a frame, to the known photographic image location determination process to yield the predictable result of resolving the location of the frame.

Regarding claim 3, the combination of Tseng, Galligan, Benjamin et al., and Zhao et al. further teaches adding metadata of each of the disunited frames associated with the identified location to the image recognition database (Tseng: photographic location database is updated by use of submitted photos, ¶ [0035]; the location of the submission is determined by matching one or more objects in the images to objects contained in images in the photographic location database, ¶ [0023]; In particular embodiments, the object itself may also be tagged as a result of the object image recognition algorithm. For example, if a photo includes the Statue of Liberty, the social networking system may tag the object in the photo with the metadata "Statute of Liberty", ¶ [0024]).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Tseng (US 2012/0310968), Galligan (US 8,942,535), and Tyagi et al. (US 9,818,451). 
Regarding claim 12, the combination of Tseng and Galligan teaches the method of claim 10, as shown above. 
However, the combination does not teach wherein the received at least one video stream is a 360-degree video whereby images of an object are taken at different angles and are connected into a panoramic rounding frame, and the rounding frames of the 360-degree video are sent to the image recognition database. 
Tyagi et al., in the analogous field of video processing, teaches wherein the received at least one video stream is a 360-degree video whereby images of an object are taken at different angles and are connected into a panoramic rounding frame, and the rounding frames of the 360-degree video are sent to the image recognition database (Tyagi et al.: panoramic video data may include video data having a field of view beyond 180 degrees. The image capture device 110 may capture the panoramic video data using the one or more camera(s) 115. For example, the image capture device 110 may capture a field of view of 360 degrees using a plurality of cameras, Col. 4 lines 6-15 and Col. 6 lines 13-43). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Tseng and Galligan with that of Tyagi et al. and to receive a 360-degree video in order to have a greater view of the region of interest.

Allowable Subject Matter
Claim 6 would be allowable if rewritten to overcome the rejections under 35 U.S.C. 112(a) and 35 U.S.C. 112(b), as set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LANA ALAGIC whose telephone number is (571)270-1624. The examiner can normally be reached Monday-Friday 8:00 am-4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA T KYLE, can be reached on (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.A./Examiner, Art Unit 2156, 08/20/2022

/William B Partridge/Primary Examiner, Art Unit 2183