DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendment
The amendment filed on March 22nd, 2021 has been entered.
The amendment of claims 1, 3-7, 12-14, and 20 has been acknowledged.

Response to Arguments
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, all of the prior art references are directed to image processing methods and systems for detecting and tracking an object of interest. It would have been obvious to modify MacMillan with Zhou to provide more mobility to the user by allowing the user to move while viewing the scene, and to further modify the combination to present the POI reset to adjust the FOV so that the user can continue to monitor and track the POI.
In addition, Applicant’s arguments with respect to the pending claims have been fully considered but are moot because the arguments rely on newly added and/or amended claim limitations. The examiner has revised the rejections to match the new claim limitations.

Claim Rejections - 35 USC § 112
Claim(s) 3 and 22 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites the limitation “the recording.” There is no antecedent basis for this limitation in the claim. For the purpose of further examination, the limitation “wherein the recording includes” has been interpreted as “further comprising.”
Claim 22 recites the limitation “the graphical indicia.” There is no antecedent basis for this limitation in the claim. For the purpose of further examination, the limitation “wherein the graphical indicia include” has been interpreted as “further comprising.”

Claim Rejections - 35 USC § 103
Claim(s) 1, 2, 4-9, 11-14, and 16-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over MacMillan et al. (US 2015/0256746 A1), in view of Zhou et al. (US 2017/0364752 A1), and further in view of Glatt (US 2008/0218587 A1), hereinafter referred to as MacMillan, Zhou, and Glatt, respectively.
Regarding claim 1, MacMillan teaches a computer implemented method, comprising: 
under control of one or more processors configured with executable instructions (MacMillan ¶0037: “one or more processors and a non-transitory computer-readable storage medium storing instructions therein that when executed cause the processor to carry out the functions attributed to the respective devices described herein”), 
obtaining a panoramic video for a scene, the panoramic video having a coordinate system (MacMillan ¶0019: “A spherical content capture system captures spherical video content”; MacMillan Fig. 1; MacMillan ¶0041: “provides GPS coordinates”); 
identifying a point of interest (POI) from the scene within the panoramic video (MacMillan Abstract: “relevant sub-frames having a reduced field of view may be extracted from each frame of spherical video to generate an output video that tracks a particular individual or object of interest”); 
calculating directional sound information related to a sound origin of the POI (MacMillan ¶0067: “an audio analysis is performed on audio received from a microphone array to detect a direction associated with the sound source. The direction of the sound source can then be correlated to a particular spatial position within the spherical video (using, for example, a known orientation of the camera determined based on sensor data or visual cues). The position of the sound source can then be identified and tracked”); 
tracking a position of the POI within the panoramic video (MacMillan Abstract discussed above; MacMillan ¶0020: “sub-frames may be selected to generate an output video that track a particular individual, object, scene, or activity of interest”); 
playing back of the panoramic video while displaying a field-of-view (FOV) segment (MacMillan ¶0059: “content manipulation is performed on the video server 240 with edits and playback using only the original source content”) comprising: 
during automatic tracking, automatically changing the FOV segment, from the panoramic video, to maintain the POI in the FOV segment based on the directional sound information (MacMillan ¶0032: “automatically locate a sequence of sub-frames from one or more of the spherical videos that depict the skier and follow his path through the resort” – the object of interest is moving in a path and tracked, indicating a positional change; MacMillan ¶0067 discussed above – the position of the sound source is identified and tracked); and 
during manual navigation, changing the FOV segment, based on a user input (MacMillan ¶0020: “The output video thus reduces the captured spherical content to a standard field of view video having the content of interest while eliminating extraneous data outside the targeted field of view”; MacMillan ¶0060: “the user interface also provides an interactive viewer that enables the user to pan around within the spherical content being viewed. This will allow the user to search for significant moments to incorporate into the output video and manually edit the automatically generated video”).
However, MacMillan does not appear to explicitly teach that the FOV segment is navigated away from the POI such that the POI is no longer in the FOV during manual navigation and presenting a POI reset so that the FOV segment is automatically adjusted in response to selecting the POI reset.
Pertaining to the same field of endeavor, Zhou teaches manually navigating the FOV segment away from the POI such that the POI is no longer in the FOV segment (Zhou ¶0096: “In response to determining that the viewer start looking and/or moving away from a salient object (e.g., a tiger lurching nearby, etc.), the media rendering system may make local visual adjustments and/or local sound adjustments related to the salient object”).
MacMillan and Zhou are considered to be analogous art because they are directed to image processing for tracking objects of interest. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method and system for automatic audio/visual analysis (as taught by MacMillan) to manually 
MacMillan, in view of Zhou, does not appear to explicitly teach presenting a POI reset so that the FOV segment is automatically adjusted in response to selecting the POI reset.
Pertaining to the same field of endeavor, Glatt teaches presenting a POI reset and automatically adjusting the FOV segment to include the POI in connection with selection of the POI reset, the POI reset provided as a shortcut to allow resetting the FOV segment to include the POI (Note that no patentable distinction is made by intended use or result limitations unless some structural difference is imposed by the use or result on the structure or material recited in the claim. Glatt Fig. 7: see “Image Reset” 789 as described in Glatt ¶0049: “perform similar horizontal manipulation to place the area of interest in the center of the view window”).
MacMillan, in view of Zhou, and Glatt are considered to be analogous art because they are directed to image processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method and system for automatic audio/visual analysis (as taught by MacMillan, in view of Zhou) to present a POI reset to automatically adjust the FOV (as taught by Glatt) because the combination allows the POI to remain in the scene so that the user can continue tracking the POI.

Regarding claim 2, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, wherein the POI position data includes a view angle between the POI and a reference view direction of the coordinate system (Zhou ¶0030-¶0031: “track positions and motions of media elements represented by sound objects and/or video objects in any type of coordinate system such as a spherical coordinate system (e.g., on a unit spherical surface, in a spherical volume, etc.), a Cartesian coordinate system, a projection-based coordinate system, an absolute coordinate system (e.g., the World coordinate), a relative coordinate system (e.g., stationary to a camera system, etc.), etc. … A spatial position/angle of an audio or video object at a given Zhou ¶0055: “a position is given in a set of (x, y) coordinate values. Such a position may be an absolute position (e.g., represented in the World coordinate, etc.) or a relative position (e.g., represented in a relative coordinate system stationary to the camera system (106), etc.).”).

Regarding claim 4, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, wherein, during the automatic tracking, stepping through at least a portion of frames in the panoramic video based on the sound directional information (MacMillan ¶0067: “Other techniques for identifying relevant sub-frames do not necessarily depend on location data associated with the target and instead identify sub-frames relevant to a particular target based on the spherical media content (e.g., visual and/or audio content) itself. FIG. 8 illustrates an embodiment of a process for generating an output video relevant to a particular target based on audio/video processing … The direction of the sound source can then be correlated to a particular spatial position within the spherical video (using, for example, a known orientation of the camera determined based on sensor data or visual cues). The position of the sound source can then be identified and tracked”).

Regarding claim 5, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, wherein, during the automatic tracking, changing the FOV segment based on movement indicated by the sound directional information, the panoramic video includes a series of frames that are 360° views of the scene over a time period (MacMillan Abstract: “A spherical content capture system captures spherical video content”; MacMillan Fig. 1; MacMillan ¶0005: “In a spherical video capture system, a video camera system (which may include multiple video cameras) captures video in a 360 degree field of view along a horizontal axis and 180 degree 

Regarding claim 6, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, further comprising, while playing back the panoramic video, displaying a POI tracker based on the sound directional information to guide a user input for adjusting the FOV segment (Zhou ¶0105: “Media metadata as described herein can be used to provide directional guidance and aid for users in a VR space … Example perceptual cues/hints may include, but are not necessarily limited to only, any of: visual cues/hints, acoustic cues/hints, haptic cues/hints, non-visual non-acoustic cues/hints (e.g., mechanical vibration, etc.)”; note that the media metadata may be generated based on sound source, as described in Zhou ¶0034). 

Regarding claim 7, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 6, further comprising displaying a POI tracker indicative of the position of the POI within the panoramic video, the POI tracker indicating a direction to move the FOV segment to track a location of a sound origin from the POI (Zhou ¶0034 & ¶0105 discussed above).  

Regarding claim 8, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 7, wherein the POI tracker is displayed after the FOV segment navigates away from the POI (Zhou ¶0105 discussed above – the visual POI indicator is presented when the POI is outside the viewport of the user).  

Regarding claim 9, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, further comprising, while playing back the panoramic video, automatically adjusting the field of view (FOV) based on the POI position data to maintain the POI in the FOV (MacMillan ¶0032 & ¶0067 discussed above).  

Regarding claim 11, MacMillan, in view of Zhou, and Glatt, further teaches a device, comprising a processor and a memory storing instructions accessible by the processor to perform the method described in claim 1. MacMillan, in view of Zhou, and Glatt, further teaches that the FOV segment is displayed from the panoramic video (MacMillan Fig. 1; MacMillan Fig. 3: System Control 320 and System Memory 330; MacMillan ¶0074: “the video server 240 displays a selected spherical video in a spherical video viewer and receives user inputs indicating selections of the desired sub-frames at each frame of the selected spherical video.”). Therefore, claim 11 is rejected using the same rationale as applied to claim 1 discussed above. 3RPS920170036-US-NP (018-0037US1) 

Regarding claim 12, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, wherein, responsive to execution of the instructions, the processor to access metadata stored in connection with the panoramic video, the metadata including sound directional information related to a sound origin of the POI  associated with at least a portion of frames in the panoramic video, the processor to utilize the sound directional information for the sound origin during the playing back to guide the FOV segment to the POI (MacMillan ¶0032 & ¶0067 and Zhou ¶0034 & ¶0105 discussed above; also see Zhou ¶0049: “the sound object tracking block (128) generates a plurality of sound objects (e.g., object size information, object location information, volume, pitch, timbre, etc.) for sounds tracked … The plurality of audio objects may be used to generate a sound object position list 124, which may be a list of positions per sound object per unit time interval (e.g., per audio frame, per spherical image, etc.) as functions of time”; Zhou ¶0096: “may receive/determine/select salient objects from among a plurality of candidate salient objects generated based on sound objects and visual objects”; Zhou ¶0114: “The media system may also assign a sound object or a visual object that has a first spatial position to a second different spatial position, for example”).

Regarding claim 13, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 12, wherein, responsive to execution of the instructions, the processor to track automatically the POI by adjusting the FOV segment during playback of the panoramic video based on the sound directional information (MacMillan ¶0032 & ¶0067 discussed above).  

Regarding claim 14, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, wherein, responsive to execution of the instructions, the processor to adjust the FOV segment, to be displayed, based on movement indicated by sound directional information related to a sound origin of the POI (MacMillan ¶0032 & ¶0067 and Zhou ¶0034 & ¶0105 discussed above).  

Regarding claim 16, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, wherein, responsive to execution of the instructions, the processor to co-display the POI tracker and the FOV segment, the POI tracker representing a graphic of the scene with an FOV marker and a POI marker, the FOV marker indicative of a location of the FOV segment, the POI marker indicative of a location of the POI (MacMillan Fig. 1; Zhou Fig. 2 – the POI trackers are indicated as a bounding box and move along with the FOV; also see Glatt ¶0039: “Associated with the rectangular video view 104 is degree indicator 128. This indicates the orientation relative to a given direction from the circular video view 102”).

Regarding claim 17, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, further comprising a 360° microphone unit to collect audio signals, the POI position data based on the audio signals (MacMillan ¶0005: “captures video in a 360 degree field of view”; MacMillan ¶0027: “Examples of metadata sources 210 include … camera inputs (such as an image sensor, microphones, buttons, and the like)”; MacMillan ¶0033: “a microphone array may be used to determine directionality associated with a received audio signal”; Zhou ¶0041: Zhou ¶0055: “sphere of 360 angular degree”).  

Regarding claim 18, MacMillan, in view of Zhou, and Glatt, further teaches a computer program product comprising a non-transitory computer readable storage medium comprising computer executable code according to claim 1 (MacMillan ¶0037: “client device 225 can include one or more processors and a non-transitory computer-readable storage medium storing instructions therein that when executed cause the processor to carry out the functions attributed to the respective devices described herein”). Therefore, claim 18 is rejected using the same rationale as applied to claim 1 above.

Regarding claim 19, MacMillan, in view of Zhou, and Glatt, teaches the computer program product of claim 18, wherein the computer executable code further tracks the POI automatically by adjusting the FOV segment during playback of the panoramic video in connection with automatic tracking of the POI (MacMillan ¶0032 & ¶0067 discussed above).  

Regarding claim 20, MacMillan, in view of Zhou, and Glatt, teaches the computer program product of claim 18, wherein the computer executable code further adjusts the FOV segment to be displayed, based on sound directional information related to a sound origin of the POI (MacMillan ¶0067 & Zhou ¶0105 discussed above).  

Regarding claim 21, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, the POI tracker including one or more of a bird's-eye view perspective, side perspective view or rear view perspective of the scene represented as one or more of an animation, sketch and/or cartoon version of a generic scene (Note that only one of the alternative limitations is required by the claim language. MacMillan Fig. 4 & ¶0048: “FIG. 4 illustrates a side view of an Glatt Fig. 16A & ¶0061: “a side view of panoramic camera 1602 is shown”).

Regarding claim 22, MacMillan, in view of Zhou, and Glatt, teaches the device of claim 11, wherein the graphical indicia include an actual rendered image from an elevational view based on the video content within the panoramic video (Glatt Fig. 16B & ¶0064: “FIG. 16B shows a top-down view of the camera 1602”).

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over MacMillan et al. (US 2015/0256746 A1), in view of Zhou et al. (US 2017/0364752 A1),  Glatt (US 2008/0218587 A1), and further in view of Rondinelli et al. (US 2006/0028542 A1), hereinafter referred to as MacMillan, Zhou, Glatt, and Rondinelli, respectively.
Regarding claim 3, MacMillan, in view of Zhou, and Glatt, teaches the method of claim 1, wherein the recording includes storing different sound directional information related to the sound origin of first and second POIs during different frames in the panoramic video, and utilizing the sound directional information during the playing back (Zhou ¶0049: “the sound object tracking block (128) generates a plurality of sound objects (e.g., object size information, object location information, volume, pitch, timbre, etc.) for sounds tracked … The plurality of audio objects may be used to generate a sound object position list 124, which may be a list of positions per sound object per unit time interval (e.g., per audio frame, per spherical image, etc.) as functions of time”; Zhou ¶0066: “The generated audio beam pattern may comprise different audio beams (e.g., 304-1 through 304-3, etc.) directed respectively to the different sound sources such as the real-world persons or objects (e.g., 202-4 through 202-6), etc.”; Zhou ¶0096: “may receive/determine/select salient objects from among a plurality of candidate salient objects generated based on sound objects and visual objects”; Zhou ¶0114: “The media system 
However, MacMillan, in view of Zhou, and Glatt, does not appear to explicitly teach switching the FOV segment between the first and second POIs based on the sound directional information.
Pertaining to the same field of endeavor, Rondinelli teaches switching the FOV segment between the first and second POIs based on the sound directional information (Rondinelli ¶0044: “if the direction of the source of sound and motion is changing, the user may be able to sense the direction of the moving source as the selected view of the panoramic imagery is changed”).
MacMillan, in view of Zhou and Glatt, and Rondinelli, are considered to be analogous art because they are directed to panoramic image processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method and system for automatic audio/visual analysis (as taught by MacMillan, in view of Zhou and Glatt) to switch FOV based on sound (as taught by Rondinelli) because the combination provides a realistic experience for the viewer (Rondinelli Abstract).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753.  The examiner can normally be reached on M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Soo Shin/
Primary Examiner, Art Unit 2667