DETAILED ACTION
This Office Action is in response to the Applicant's communication filed on March 24, 2022, which amends dependent claim 19 and presents arguments. The communication is hereby acknowledged. Claims 1-20 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s arguments filed on March 24, 2022, have been fully considered.
Applicant argues that the prior art of record does not teach the claimed limitation “analyzing the recording of content, the analyzing including recognition of objects in the recording of content and detection of events occurring in the recording of content”:
“Thus, Gloudemans discloses "the texture is updated... automatically based on detection of a specified event," where the event can be "a specified period has passed since a last update or at specified times of day." However, the cited portions of Gloudemans do not disclose or suggest updating the texture based on an event detected in a content recording, or that any such event detection in a content recording even occurs. Although Gloudemans discloses "determination of whether a specified event occurs... can be achieved in different ways, such as a light sensor to detect ambient light level, or a sensor which detects whether stadium lights have been turned on," these determinations are made based on physical sensors (e.g., an ambient light sensor), not based on analyzing a recording of content.”
Examiner replies that, in the first Office action on the merits (FAOM), the primary reference, Guetter et al. (US 20120087561 A1), teaches that N+1 dimensional image data are acquired, one or more key frames are selected for editing of the identified ROI, and the edits on the key frame(s) are automatically applied to one or more other image frames (See Guetter: Fig. 1, Step S11 and Step S14; see also the FAOM, Page 3). That is, the analysis of the key frame(s) is mapped to the claimed limitation “analyzing the recording of content”, and the acquired N+1 dimensional image data are mapped to the recording of content. Therefore, Applicant's argument in this respect is not persuasive.
Applicant argues secondly that the prior art of record does not teach the claimed limitation “analyzing the recording of content, the analyzing including recognition of objects in the recording of content and detection of events occurring in the recording of content”:
“Conwell discloses that "inferred metadata can be augmented ...by known image recognition/classification techniques." Conwell, ¶34. However, the cited portions of Conwell also do not disclose or suggest any such "detection of events occurring in the recording of content," and consequently also does not disclose or suggest "generating metadata information based at least in part on the analyzing, the metadata information identifying the... detected events in the recording," as recited in independent claim 1. The Office Action does not assert that the cited portions of Guetter or Gloudemans cure the deficiencies of the cited portions of Conwell and the cited portions of Guetter and Gloudemans do not cure the deficiencies.”
Examiner replies that the primary reference, Guetter, teaches that the acquired N+1 dimensional image data are segmented to identify ROIs, which may be an event. However, to better address the claimed limitation “detection of events”, the Office action (FAOM) relied on a secondary reference, Gloudemans et al. (US 20090128549 A1), which teaches updating the textured 3D model based on detection of a specified event, which may be a specified time. Therefore, Applicant's argument in this respect is not persuasive.
Applicant argues thirdly that the prior art of record does not teach the claimed limitation “generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording”:
“Conwell discloses that "inferred metadata can be augmented ...by known image recognition/classification techniques." Conwell, ¶34. However, the cited portions of Conwell also do not disclose or suggest any such "detection of events occurring in the recording of content," and consequently also does not disclose or suggest "generating metadata information based at least in part on the analyzing, the metadata information identifying the... detected events in the recording," as recited in independent claim 1. The Office Action does not assert that the cited portions of Guetter or Gloudemans cure the deficiencies of the cited portions of Conwell and the cited portions of Guetter and Gloudemans do not cure the deficiencies.”
Examiner replies that the primary reference, Guetter, teaches that the acquired N+1 dimensional image data are segmented to identify ROIs, which may be an event, that the ROI is edited, and that the ROI edits are interpolated to non-edited frames, where some form of metadata is used to propagate the edits to the unedited frames based on the ROI detection and analysis. However, to better address the claimed limitation “generating metadata information”, the Office action (FAOM) relied on a third reference, Conwell (US 20100046842 A1), which teaches that metadata, along with the captured content, are generated and submitted to a server in order to provide a response to the user. Thus, it would have been obvious to modify Guetter to generate metadata, interpolate the edits to the remaining frames, and provide the edited time-series images to users. Therefore, Applicant's argument in this respect is not persuasive.
Examiner further replies that Applicant's remaining arguments, directed to independent claims 11 and 20, are similar to the arguments for independent claim 1 and are not persuasive for the reasons given above. The arguments that the dependent claims are allowable because they depend on allowable independent claims are moot, because the arguments for the independent claims are not persuasive.
Examiner respectfully further replies that the Applicant's arguments have been fully considered and they are not persuasive. The present action is made final.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Guetter et al. (US 20120087561 A1) in view of Gloudemans et al. (US 20090128549 A1), further in view of Conwell (US 20100046842 A1).
Regarding claim 1, Guetter teaches a method (See Guetter: Figs. 1-2, and [0029], "FIG. 1 is a flow chart illustrating a method for propagating user edits of computer-derived segmentation throughout a time series of images according to an exemplary embodiment of the present invention. FIG. 2 is a sequence of illustrations showing various stages of computer-derived segmentation according to the method of FIG. 1") comprising:
receiving a recording of content, the recording having been captured by an electronic device (See Guetter: Figs. 1-2, and [0030], "First, image data may be acquired (Step S11). As discussed above, image data may be medical image data. The image data may be of N-dimensions, where N is a positive integer, and may include a time series of images. The data may therefore be referred to herein as N+1 dimensional image data as it may include N dimensions (1, 2, or 3 of which may be spatial dimensions) and the "+1" dimension representing time. FIG. 2(a) is an illustration representing acquired image data. An anatomical structure 21 may be seen within the illustration");
analyzing the recording of content, the analyzing including recognition of objects in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data. The segmentation may be performed across multiple frames of the acquired image, for example, all frames may be segmented. The segmentation may be fully automatic or may involve initial user input, for example, the selection of seed points within at least one image frame. Segmentation may be performed using a known segmentation algorithm or using one or more trained classifiers. FIG. 2(b) is an illustration representing the acquired image data, including the anatomical structure 21, and segmentation results 22. As can be seen, there may be regions in which the segmentation of the ROI deviates from the anatomical structure 21 that it seeks to delineate") and detection of events occurring in the recording of content;
generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording;
identifying, based at least in part on at least one of a user preference or a detected event, a region of interest or an object of interest in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data");
based at least in part on the identified region of interest or object of interest, generating a modified version of the recording of content, the modified version incorporating at least the generated metadata information (See Guetter: Figs. 1-2, and [0032], "The user may then perform one or more edits to adjust the computer-calculated segmentation (Step S13). Editing of the segmentation results may be performed either at predetermined frames or frames selected by the user. The user may use a cursor or touchscreen to manually adjust the segmentation results to better fit the anatomical structure. FIG. 2(c) illustrates the segmentation results 22 being edited by a user to more closely fit the anatomical structure 21 by placing one or more cursors 23 at portions of the segmentation results 22 that are most in need of editing. Implementation of editing may be assisted by algorithms so that the user does not necessarily need to edit the segmentation results at a pixel-by-pixel level. FIG. 2(d) illustrates the edited segmentation results 22"'); and
storing the modified version of the recording of content for subsequent playback on the electronic device (See Guetter: Fig. 5, and [0060], "FIG. 5 shows an example of a computer system which may implement a method and system of the present disclosure. The system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. The software application may be stored on a recording media locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet").
However, Guetter fails to explicitly disclose detection of events occurring in the recording of content; and generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording.
Gloudemans, however, teaches detection of events occurring in the recording of content (See Gloudemans: Figs. 19A-H, and [0131], "For instance, video texture can be applied from images which are obtained prior to, and/or during the event. At step 934, the texture is updated, e.g., based on a user command (on demand), or automatically based on detection of a specified event. A user interface device such as a button 943 allows the operator to update the texture of a stadium, such as from a current image. An updated textured 3d model of the event can be obtained by updating the initial textured 3d model. Moreover, updating can occur automatically when a specified event occurs. The specified event can be a specified time, e.g., after a specified period has passed since a last update or at specified times of day, e.g., relative to sunset or sunrise. The appearance of a stadium can change due to various factors, such as changing lighting in the stadium (e.g., due to presence of sun or clouds, or due to use or non-use of stadium electric lights), changes in the number of fans in the stands, changes in advertisements or signs in the stadium, movement of a roof of the stadium or other reconfiguration of the stadium or other event site, and so forth. In one approach, the texture is updated when an image used in an animation is captured. The determination of whether a specified event occurs which should trigger automatic updating of the texturing can be achieved in different ways, such as a light sensor to detect ambient light level, or a sensor which detects whether stadium lights have been turned on, for instance. Similarly, a timing device or process can be used to determine if a specified period has passed since last update or a specified time of day is reached").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guetter to include detection of events occurring in the recording of content, as taught by Gloudemans, in order to allow the viewer to better see how the players align themselves on the field, who is blocking who, and so forth (See Gloudemans: Figs. 9A-H, and [0129], "Fading of selected players is an advantage in that it allows the viewer to better see how the players align themselves on the field, who is blocking who, and so forth"). Guetter teaches a method and system that may select and edit the ROI (region of interest) of the captured images and deliver the edited images to users, and Gloudemans teaches a system and method that may detect objects and events in a sports event and generate a virtual viewpoint display of the objects and events for users. Therefore, it would have been obvious to one of ordinary skill in the art to modify Guetter in view of Gloudemans to detect the objects and events in the captured image and then determine the ROI. The motivation to modify Guetter in view of Gloudemans is "Use of known technique to improve similar devices (methods, or products) in the same way".
However, Guetter, as modified by Gloudemans, fails to explicitly disclose generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording.
Conwell, however, teaches generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording (See Conwell: Fig. 1, and [0063], "One option is to submit the metadata, along with the captured content or data derived from the captured content (e.g., the FIG. 1 image, image feature data such as eigenvalues, machine readable data decoded from the image, etc.), to a service provider that acts on the submitted data, and provides a response to the user. Shazam, Snapnow, ClusterMedia Labs, Snaptell, Mobot, Mobile Acuity and Digimarc Mobile, are a few of several commercially available services that capture media content, and provide a corresponding response; others are detailed in the earlier-cited patent publications. By accompanying the content data with the metadata, the service provider can make a more informed judgment as to how it should respond to the user's submission").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guetter to include generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording, as taught by Conwell, in order to provide the most efficient mechanism to deliver similar content to users (See Conwell: Fig. 32, and [0317], "While P2P networks such as BitTorrent have permitted sharing of audio, image and video content, arrangements like that shown in FIG. 32 allow networks to share a contextually-richer set of experiential content. A basic tenet of P2P networks is that even in the face of technologies that that mine the long-tail of content, the vast majority of users are interested in similar content (the score of tonight's NBA game, the current episode of Lost, etc.) and that given sufficient bandwidth and protocols, the most efficient mechanism to deliver similar content to users not by sending individual streams, but by piecing the content together based on what your "neighbors" have on the network. This same mechanism can be used to provide metadata related to enhancing an experience such as being at the bar drinking a Dopplebock, or watching a highlight of tonight's NBA game on a phone while at the bar. The protocol used in the ad-hoc network described above, might leverage P2P protocols with the experience server providing a peer registration service (similar to early P2P networks) or in a true P2P modality, with all devices in the ad-hoc network advertising what experiences (metadata, content, social connections, etc.) they have available (either for free or for barter of information in-kind, etc.)").
Guetter teaches a method and system that may select and edit the ROI (region of interest) of the captured images and deliver the edited images to users, and Conwell teaches a system and method that may generate metadata about the image, where the metadata may provide automatic recognition of objects depicted in the images. Therefore, it would have been obvious to one of ordinary skill in the art to modify Guetter in view of Conwell to generate metadata of the images in order to automatically detect the objects and events in the captured image and then determine the ROI. The motivation to modify Guetter in view of Conwell is "Use of known technique to improve similar devices (methods, or products) in the same way".
Regarding claim 2, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans teaches the method of claim 1, further comprising:
providing the modified version of the recording to the electronic device for playback (See Gloudemans: Figs. 10A-C, and [0135], "FIG. 10a depicts a process for enabling a user to run an animation. In one approach, an animation which provides different virtual viewpoints of a live event is created by an operator, such as a technician, who is associated with a television broadcast company, and the animation is provided as part of the broadcast, such as during a replay of a particular event of interest, or during a half time analysis show. The viewer/user at his or her home may not have any control of the creation of playback of the animation in this approach. In another approach, the user can be provided with such a capability. This can provide added entertainment to the user. Further, a service provider may charge a fee for this added capability, resulting in additional revenue").
Regarding claim 3, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans and Conwell teach the method of claim 1, wherein analyzing the recording of content further comprises:
detecting an object based at least in part on running an object classifier on the recording (See Gloudemans: Figs. 6A-B, and [0091], "In an example implementation, players in a sports event are detected. The player finder process begins at step 600. Step 602 includes capturing an image of a live event. For example, FIG. 6b depicts a video image 640 which is provided in a user interface seen by an operator. The user interface includes a set of tabs. Each tab can be selected by the operator to perform a different function. The currently selected tab 643 is entitled "Find Players." Step 604 includes setting a color matte. For example, FIG. 6d depicts a color matte image 660 of the video image of FIG. 6b. The image includes the field lines 664 and a field 666. The field lines 664, which are white in the original image, are depicted as being dark. Similarly, the field 666, which is a dark green in the original image, is depicted as being light. The currently selected tab 662 is entitled "Color Matte." A color matter image is usually black and white. What is shown in FIG. 6d is an overlay where the areas that would normally be black in the color matte are drawn in yellow over the original image. The yellow appears as a lighter color. The color of the lines in the image was not changed. The lines just appear dark next to the yellow"); and
recognizing the object using at least a database of objects (See Conwell: Fig. 19, and [0107], "A related example is a system that responds to a user-captured image of a car by identifying the car (using image features and associated database(s)), searching EBay and Craigslist for similar cars, and presenting the results on the screen. Pressing button 16b presents screens of information about cars offered for sale (e.g., including image, seller location, and price) based on similarity to the input image (same model year/same color first, and then nearest model years/colors), nationwide. Pressing button 16d yields such a sequence of screens, but limited to the user's state (or metropolitan region, or a 50 mile radius of the user's location, etc.). Pressing button 16a yields such a sequence of screens, again limited geographically, but this time presented in order of ascending price (rather than closest model year/color). Again, pressing the middle button loads the full web page (EBay or Craigslist) of the car last-displayed"), the database of objects including information to identify the object using multiple attributes of the object (See Gloudemans: Figs. 9A-H, and [0116], "Step 903 includes detecting an object in an image which is to be replaced by a virtual 3d object. For example, a goal post or other goal structure (e.g., as used in American football, soccer, hockey, basketball or lacrosse) may be detected in a sport event. The detection may be made automatically, e.g., without operator input, using image recognition techniques and knowledge of characteristics of the object and its location in the live event. For example, the known physical shape and color of the object can be used to assist detection. Further, the known predetermined location of the object in the live event and camera registration data can be used to assist detection. 
As an example, a goal post in a soccer game is typically white and has a specified size, shape and location in the live event in accordance with game regulations. The object can therefore be detected by examining pixels in a portion of the image which corresponds to the predetermined location in the live event. Once the object is detected, the pixels which make up the object can be removed. Optionally, the removed pixels can be automatically blended in with surrounding pixels, which might be green pixels of the field. Or some manual editing may be performed. However, generally such blending in or editing may not be needed as the virtual goal post which is used in the model accurately replaces the removed pixels. In one approach, the pixels are only replaced when viewing from the original (non-virtual) camera angle. Optionally, step 903 can be skipped").
Regarding claim 4, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 3 as outlined above. Further, Conwell teaches the method of claim 3, further comprising:
generating associated metadata information to the object based at least in part on recognizing the object (See Conwell: Figs. 16A-B, and [0141], "Metadata from this set of images is collected. The metadata can be of various types. One is words/phrases from a title given to an image. Another is information in metatags assigned to the image--usually by the photographer (e.g., naming the photo subject and certain attributes/keywords), but additionally by the capture device (e.g., identifying the camera model, the date/time of the photo, the location, etc.). Another is words/phrases in a narrative description of the photo authored by the photographer").
Regarding claim 5, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans and Conwell teach the method of claim 1, wherein analyzing the recording of content further comprises:
detecting a presence of a moving person or a moving object in the recording (See Gloudemans: Figs. 6A-E, and [0090], "FIG. 6a depicts a player finder process. Generally, this process can be used to detect objects in an image, where the objects have particular characteristics, e.g., size, shape, aspect ratio, density and color profile. In one approach, the players are extracted from an image using an operator assisted color mask method after the field lines have been removed. This approach can provide benefits compared to a difference method, for instance, in which the moving components of a frame are detected. However, the difference method or other techniques may alternatively be used");
determining a motion vector of the moving person or the moving object (See Gloudemans: Fig. 2, and [0058], "Further, the line of position can be represented by a vector (LOP) which has unity magnitude, in one approach. The vector can be defined by two points along the LOP. The vector can be represented in the world coordinate system 230 using an appropriate transformation from the image coordinate system"); and
determining acoustic information corresponding to the moving person or the moving object from at least one of speech, voice, or audio (See Conwell: Fig. 32, and [0300], "FIG. 32 shows an arrangement employing several computers (A-E), some of which may be wearable computers (e.g., cell phones). The computers include the usual complement of processor, memory, storage, input/output, etc. The storage or memory can contain content, such as images, audio and video. The computers can also include one or more routers and/or response engines. Standalone routers and response engines may also be coupled to the network The computers are networked, shown schematically by link 50. This connection can be by any known networking arrangement, including the internet and/or wireless links (WiFi, WiMax, Bluetooth, etc.), Software in at least certain of the computers includes a peer-to-peer (P2P) client, which makes that computer's resources available to other computers on the network, and reciprocally enables that computer to employ resources of the other computers").
Regarding claim 7, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans and Conwell teach the method of claim 1, wherein identifying, based at least in part on at least one of the user preference or the detected event, the region of interest further comprises:
detecting a presence of a particular person in the recording (See Gloudemans: Fig. 6A, and [0093], "Step 608 includes running a line finder and repair process, discussed further in connection with FIG. 5a, to remove lines from a mask image and/or to repair occluded portions of lines. For example, it may only be desired to remove the lines without repairing them before finding the blobs. In this case, step 520 of FIG. 5a can be omitted. Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used");
determining the particular person is indicated as a person of interest based at least in part on the user preference (See Gloudemans: Figs. 6A-B, and [0093], "Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used. Step 612 includes determining a bounding box for each blob, e.g., according to the height and width of each blob. For example, FIG. 6b depicts bounding boxes 644, 645 and 647-650. Step 614 includes transforming the box height in pixels to a real world player height based on the camera registration. For example, a certain pixel height in image space will correspond to a certain player height in feet. The height varies depending on where the player is in the image. An average player height can be used. Note that a player can be detected from a single frame without the need to track players across multiple video frames");
detecting an anomaly related to the particular person or the detected event in the recording (See Conwell: Figs. 24-26, and [0220], "If the subject is inferred to be a photo of a family member or friend, one screen presented to the user gives the option of posting a copy of the photo to the user's FaceBook page, annotated with the person(s)'s likely name(s). (Determining the names of persons depicted in a photo can be done by submitting the photo to the user's account at Picasa. Picasa performs facial recognition operations on submitted user images, and correlates facial eigenvectors with individual names provided by the user, thereby compiling a user-specific database of facial recognition information for friends and others depicted in the user's prior images.) Another screen starts a text message to the individual, with the addressing information having been obtained from the user's address book, indexed by the Picasa-determined identity. The user can pursue any or all of the presented options by switching between the associated screens"); and
generating metadata corresponding to the detected anomaly (See Conwell: Figs. 6A-B, and [0223], "One such open-ended approach is to submit the twice-weighted metadata noted above (e.g., "D" in FIG. 26B) to a general purpose search engine. Google, per se, is not necessarily best for this function, because current Google searches require that all search terms be found in the results. Better is a search engine that does fuzzy searching, and is responsive to differently-weighted keywords--not all of which need be found. The results can indicate different seeming relevance, depending on which keywords are found, where they are found, etc. (A result including "Prometheus" but lacking "RCA Building" would be ranked more relevant than a result including the latter but lacking the former.)"; and [0221], "If the subject appears to be a stranger (e.g., not recognized by Picasa), the system will have earlier undertaken an attempted recognition of the person using publicly available facial recognition information. (Such information can be extracted from photos of known persons").
Regarding claim 8, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans teaches that the method of claim 1, wherein identifying, based at least in part on at least one of the user preference or the detected event, the region of interest further comprises:
detecting a presence of a particular object in the recording (See Gloudemans: Fig. 6A, and [0093], "Step 608 includes running a line finder and repair process, discussed further in connection with FIG. 5a, to remove lines from a mask image and/or to repair occluded portions of lines. For example, it may only be desired to remove the lines without repairing them before finding the blobs. In this case, step 520 of FIG. 5a can be omitted. Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used"); and
determining the particular object is indicated as an object of interest based at least in part on the user preference (See Gloudemans: Figs. 6A-B, and [0093], "Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used. Step 612 includes determining a bounding box for each blob, e.g., according to the height and width of each blob. For example, FIG. 6b depicts bounding boxes 644, 645 and 647-650. Step 614 includes transforming the box height in pixels to a real world player height based on the camera registration. For example, a certain pixel height in image space will correspond to a certain player height in feet. The height varies depending on where the player is in the image. An average player height can be used. Note that a player can be detected from a single frame without the need to track players across multiple video frames").
Regarding claim 9, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 2 as outlined above. Further, Gloudemans teaches that the method of claim 2, wherein playback of the modified version of the recording is based at least in part on focusing on the region of interest (See Gloudemans: Figs. 10A-C, and [0135], "FIG. 10a depicts a process for enabling a user to run an animation. In one approach, an animation which provides different virtual viewpoints of a live event is created by an operator, such as a technician, who is associated with a television broadcast company, and the animation is provided as part of the broadcast, such as during a replay of a particular event of interest, or during a half time analysis show. The viewer/user at his or her home may not have any control of the creation of playback of the animation in this approach. In another approach, the user can be provided with such a capability. This can provide added entertainment to the user. Further, a service provider may charge a fee for this added capability, resulting in additional revenue").
Regarding claim 10, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans teaches that the method of claim 1, wherein analyzing the recording of content is provided in a secure environment, the secure environment isolated from other executing processes (See Gloudemans: Figs. 3A-B, and [0064], "Further, the functionality described herein may be implemented using one or more processor readable storage devices having processor readable code embodied thereon for programming one or more processors to perform the processes described herein. The processor readable storage devices can include computer readable media such as volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above are also included within the scope of computer readable media").
Regarding claim 11, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Guetter, Gloudemans, and Conwell teach that a system (See Guetter: Figs. 1-2, and [0029], "FIG. 1 is a flow chart illustrating a method for propagating user edits of computer-derived segmentation throughout a time series of images according to an exemplary embodiment of the present invention. FIG. 2 is a sequence of illustrations showing various stages of computer-derived segmentation according to the method of FIG. 1") comprising:
a processor (See Guetter: Fig. 5, and [0061], "The computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1007"); 
a memory device containing instructions, which when executed by the processor cause the processor to perform operations (See Guetter: Fig. 5, and [0061], "The computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1007") comprising:
receiving a recording of content, the recording having been captured by an electronic device (See Guetter: Figs. 1-2, and [0030], "First, image data may be acquired (Step S11). As discussed above, image data may be medical image data. The image data may be of N-dimensions, where N is a positive integer, and may include a time series of images. The data may therefore be referred to herein as N+1 dimensional image data as it may include N dimensions (1, 2, or 3 of which may be spatial dimensions) and the "+1" dimension representing time. FIG. 2(a) is an illustration representing acquired image data. An anatomical structure 21 may be seen within the illustration");
analyzing the recording of content, the analyzing including recognition of objects in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data. The segmentation may be performed across multiple frames of the acquired image, for example, all frames may be segmented. The segmentation may be fully automatic or may involve initial user input, for example, the selection of seed points within at least one image frame. Segmentation may be performed using a known segmentation algorithm or using one or more trained classifiers. FIG. 2(b) is an illustration representing the acquired image data, including the anatomical structure 21, and segmentation results 22. As can be seen, there may be regions in which the segmentation of the ROI deviates from the anatomical structure 21 that it seeks to delineate") and detection of events occurring in the recording of content (See Gloudemans: Figs. 19A-H, and [0131], "For instance, video texture can be applied from images which are obtained prior to, and/or during the event. At step 934, the texture is updated, e.g., based on a user command (on demand), or automatically based on detection of a specified event. A user interface device such as a button 943 allows the operator to update the texture of a stadium, such as from a current image. An updated textured 3d model of the event can be obtained by updating the initial textured 3d model. Moreover, updating can occur automatically when a specified event occurs. The specified event can be a specified time, e.g., after a specified period has passed since a last update or at specified times of day, e.g., relative to sunset or sunrise.
The appearance of a stadium can change due to various factors, such as changing lighting in the stadium (e.g., due to presence of sun or clouds, or due to use or non-use of stadium electric lights), changes in the number of fans in the stands, changes in advertisements or signs in the stadium, movement of a roof of the stadium or other reconfiguration of the stadium or other event site, and so forth. In one approach, the texture is updated when an image used in an animation is captured. The determination of whether a specified event occurs which should trigger automatic updating of the texturing can be achieved in different ways, such as a light sensor to detect ambient light level, or a sensor which detects whether stadium lights have been turned on, for instance. Similarly, a timing device or process can be used to determine if a specified period has passed since last update or a specified time of day is reached"); 
generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording (See Conwell: Fig. 1, and [0063], "One option is to submit the metadata, along with the captured content or data derived from the captured content (e.g., the FIG. 1 image, image feature data such as eigenvalues, machine readable data decoded from the image, etc.), to a service provider that acts on the submitted data, and provides a response to the user. Shazam, Snapnow, ClusterMedia Labs, Snaptell, Mobot, Mobile Acuity and Digimarc Mobile, are a few of several commercially available services that capture media content, and provide a corresponding response; others are detailed in the earlier-cited patent publications. By accompanying the content data with the metadata, the service provider can make a more informed judgment as to how it should respond to the user's submission");
identifying, based at least in part on at least one of a user preference or a detected event, a region of interest or an object of interest in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data");
based at least in part on the identified region of interest or object of interest, generating a modified version of the recording of content, the modified version incorporating at least the generated metadata information (See Guetter: Figs. 1-2, and [0032], "The user may then perform one or more edits to adjust the computer-calculated segmentation (Step S13). Editing of the segmentation results may be performed either at predetermined frames or frames selected by the user. The user may use a cursor or touchscreen to manually adjust the segmentation results to better fit the anatomical structure. FIG. 2(c) illustrates the segmentation results 22 being edited by a user to more closely fit the anatomical structure 21 by placing one or more cursors 23 at portions of the segmentation results 22 that are most in need of editing. Implementation of editing may be assisted by algorithms so that the user does not necessarily need to edit the segmentation results at a pixel-by-pixel level. FIG. 2(d) illustrates the edited segmentation results 22'"); and
storing the modified version of the recording of content for subsequent playback on the electronic device (See Guetter: Fig. 5, and [0060], "FIG. 5 shows an example of a computer system which may implement a method and system of the present disclosure. The system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. The software application may be stored on a recording media locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet").
Regarding claim 12, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans teaches that the system of claim 11, wherein the memory device contains further instructions, which when executed by the processor further cause the processor to perform further operations comprising:
providing the modified version of the recording to the electronic device for playback (See Gloudemans: Figs. 10A-C, and [0135], "FIG. 10a depicts a process for enabling a user to run an animation. In one approach, an animation which provides different virtual viewpoints of a live event is created by an operator, such as a technician, who is associated with a television broadcast company, and the animation is provided as part of the broadcast, such as during a replay of a particular event of interest, or during a half time analysis show. The viewer/user at his or her home may not have any control of the creation of playback of the animation in this approach. In another approach, the user can be provided with such a capability. This can provide added entertainment to the user. Further, a service provider may charge a fee for this added capability, resulting in additional revenue").
Regarding claim 13, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans and Conwell teach that the system of claim 11, wherein analyzing the recording of contents causes the processor to perform further operations comprising:
detecting an object based at least in part on running an object classifier in the recording (See Gloudemans: Figs. 6A-B, and [0091], "In an example implementation, players in a sports event are detected. The player finder process begins at step 600. Step 602 includes capturing an image of a live event. For example, FIG. 6b depicts a video image 640 which is provided in a user interface seen by an operator. The user interface includes a set of tabs. Each tab can be selected by the operator to perform a different function. The currently selected tab 643 is entitled "Find Players." Step 604 includes setting a color matte. For example, FIG. 6d depicts a color matte image 660 of the video image of FIG. 6b. The image includes the field lines 664 and a field 666. The field lines 664, which are white in the original image, are depicted as being dark. Similarly, the field 666, which is a dark green in the original image, is depicted as being light. The currently selected tab 662 is entitled "Color Matte." A color matte image is usually black and white. What is shown in FIG. 6d is an overlay where the areas that would normally be black in the color matte are drawn in yellow over the original image. The yellow appears as a lighter color. The color of the lines in the image was not changed. The lines just appear dark next to the yellow");
recognizing the object using at least a database of objects, the database of object including information to identify the object using multiple attributes of the object (See Conwell: Fig. 19, and [0107], "A related example is a system that responds to a user-captured image of a car by identifying the car (using image features and associated database(s)), searching EBay and Craigslist for similar cars, and presenting the results on the screen. Pressing button 16b presents screens of information about cars offered for sale (e.g., including image, seller location, and price) based on similarity to the input image (same model year/same color first, and then nearest model years/colors), nationwide. Pressing button 16d yields such a sequence of screens, but limited to the user's state (or metropolitan region, or a 50 mile radius of the user's location, etc.). Pressing button 16a yields such a sequence of screens, again limited geographically, but this time presented in order of ascending price (rather than closest model year/color). Again, pressing the middle button loads the full web page (EBay or Craigslist) of the car last-displayed"), the database of objects including information to identify the object using multiple attributes of the object (See Gloudemans: Figs. 9A-H, and [0116], "Step 903 includes detecting an object in an image which is to be replaced by a virtual 3d object. For example, a goal post or other goal structure (e.g., as used in American football, soccer, hockey, basketball or lacrosse) may be detected in a sport event. The detection may be made automatically, e.g., without operator input, using image recognition techniques and knowledge of characteristics of the object and its location in the live event. For example, the known physical shape and color of the object can be used to assist detection. Further, the known predetermined location of the object in the live event and camera registration data can be used to assist detection. 
As an example, a goal post in a soccer game is typically white and has a specified size, shape and location in the live event in accordance with game regulations. The object can therefore be detected by examining pixels in a portion of the image which corresponds to the predetermined location in the live event. Once the object is detected, the pixels which make up the object can be removed. Optionally, the removed pixels can be automatically blended in with surrounding pixels, which might be green pixels of the field. Or some manual editing may be performed. However, generally such blending in or editing may not be needed as the virtual goal post which is used in the model accurately replaces the removed pixels. In one approach, the pixels are only replaced when viewing from the original (non-virtual) camera angle. Optionally, step 903 can be skipped").

Regarding claim 14, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 13 as outlined above. Further, Conwell teaches that the system of claim 13, wherein the memory device contains further instructions, which when executed by the processor further cause the processor to perform further operations comprising:
generating associated metadata information to the object based at least in part on recognizing the object (See Conwell: Figs. 16A-B, and [0141], "Metadata from this set of images is collected. The metadata can be of various types. One is words/phrases from a title given to an image. Another is information in metatags assigned to the image--usually by the photographer (e.g., naming the photo subject and certain attributes/keywords), but additionally by the capture device (e.g., identifying the camera model, the date/time of the photo, the location, etc.). Another is words/phrases in a narrative description of the photo authored by the photographer").
Regarding claim 15, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans teaches that the system of claim 11, wherein analyzing the recording of content causes the processor to perform further operations comprising:
detecting a presence of a moving person or a moving object in the recording (See Gloudemans: Figs. 6A-E, and [0090], "FIG. 6a depicts a player finder process. Generally, this process can be used to detect objects in an image, where the objects have particular characteristics, e.g., size, shape, aspect ratio, density and color profile. In one approach, the players are extracted from an image using an operator assisted color mask method after the field lines have been removed. This approach can provide benefits compared to a difference method, for instance, in which the moving components of a frame are detected. However, the difference method or other techniques may alternatively be used"); and
determining a motion vector of the moving person or the moving object (See Gloudemans: Fig. 2, and [0058], "Further, the line of position can be represented by a vector (LOP) which has unity magnitude, in one approach. The vector can be defined by two points along the LOP. The vector can be represented in the world coordinate system 230 using an appropriate transformation from the image coordinate system").
Regarding claim 17, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans and Conwell teach that the system of claim 11, wherein identifying, based at least in part on at least one of the user preference or the detected event, the region of interest causes the processor to perform further operations further comprising:
detecting a presence of a particular person in the recording (See Gloudemans: Fig. 6A, and [0093], "Step 608 includes running a line finder and repair process, discussed further in connection with FIG. 5a, to remove lines from a mask image and/or to repair occluded portions of lines. For example, it may only be desired to remove the lines without repairing them before finding the blobs. In this case, step 520 of FIG. 5a can be omitted. Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used");
determining the particular person is indicated as a person of interest based at least in part on the user preference (See Gloudemans: Figs. 6A-B, and [0093], "Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used. Step 612 includes determining a bounding box for each blob, e.g., according to the height and width of each blob. For example, FIG. 6b depicts bounding boxes 644, 645 and 647-650. Step 614 includes transforming the box height in pixels to a real world player height based on the camera registration. For example, a certain pixel height in image space will correspond to a certain player height in feet. The height varies depending on where the player is in the image. An average player height can be used. Note that a player can be detected from a single frame without the need to track players across multiple video frames");
detecting an anomaly related to the particular person or detected event in the recording (See Conwell: Figs. 24-26, and [0220], "If the subject is inferred to be a photo of a family member or friend, one screen presented to the user gives the option of posting a copy of the photo to the user's FaceBook page, annotated with the person(s)'s likely name(s). (Determining the names of persons depicted in a photo can be done by submitting the photo to the user's account at Picasa. Picasa performs facial recognition operations on submitted user images, and correlates facial eigenvectors with individual names provided by the user, thereby compiling a user-specific database of facial recognition information for friends and others depicted in the user's prior images.) Another screen starts a text message to the individual, with the addressing information having been obtained from the user's address book, indexed by the Picasa-determined identity. The user can pursue any or all of the presented options by switching between the associated screens"); and
generating metadata corresponding to the detected anomaly (See Conwell: Figs. 6A-B, and [0223], "One such open-ended approach is to submit the twice-weighted metadata noted above (e.g., "D" in FIG. 26B) to a general purpose search engine. Google, per se, is not necessarily best for this function, because current Google searches require that all search terms be found in the results. Better is a search engine that does fuzzy searching, and is responsive to differently-weighted keywords--not all of which need be found. The results can indicate different seeming relevance, depending on which keywords are found, where they are found, etc. (A result including "Prometheus" but lacking "RCA Building" would be ranked more relevant than a result including the latter but lacking the former.)"; and [0221], "If the subject appears to be a stranger (e.g., not recognized by Picasa), the system will have earlier undertaken an attempted recognition of the person using publicly available facial recognition information. (Such information can be extracted from photos of known persons").
Regarding claim 18, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans teaches that the system of claim 11, wherein identifying, based at least in part on at least one of the user preference or the detected event, the region of interest causes the processor to perform further operations further comprising:
detecting a presence of a particular object in the recording (See Gloudemans: Fig. 6A, and [0093], "Step 608 includes running a line finder and repair process, discussed further in connection with FIG. 5a, to remove lines from a mask image and/or to repair occluded portions of lines. For example, it may only be desired to remove the lines without repairing them before finding the blobs. In this case, step 520 of FIG. 5a can be omitted. Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used"); and
determining the particular object is indicated as an object of interest based at least in part on the user preference (See Gloudemans: Figs. 6A-B, and [0093], "Step 610 includes running a blob finding algorithm to detect players in the image. A blob finding algorithm refers to any algorithm which can detect an object in an image, where the object has specified characteristics. In one example implementation, the cvBlobsLib source code, available from Intel's Open Source Computer Vision Library, can be used. Step 612 includes determining a bounding box for each blob, e.g., according to the height and width of each blob. For example, FIG. 6b depicts bounding boxes 644, 645 and 647-650. Step 614 includes transforming the box height in pixels to a real world player height based on the camera registration. For example, a certain pixel height in image space will correspond to a certain player height in feet. The height varies depending on where the player is in the image. An average player height can be used. Note that a player can be detected from a single frame without the need to track players across multiple video frames").
Regarding claim 19, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 12 as outlined above. Further, Gloudemans teaches that the system of claim 12, wherein at least one of the detected events corresponds to motion of at least one of the recognized objects in the recording of content (See Gloudemans: Figs. 6A-E, and [0098], "Note that in this and other examples, the objects need not be players in a sporting event. The objects can be other participants in a sporting event, such as referees. Further, non-human objects may participate in a sporting event, either with or without humans, such as a horse in a polo contest or horse race. Also, a sporting event can be indoors or outdoors. Further, the event need not be a sporting event but can be any type of event in which physical movement of objects is of interest. As another example, an event can be analyzed for security purposes, accident reconstruction purposes and so forth").
Regarding claim 20, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Guetter, Gloudemans, and Conwell teach that a non-transitory computer-readable medium comprising instructions, which when executed by a computing device, cause the computing device to perform operations (See Guetter: Figs. 1-2, and [0029], "FIG. 1 is a flow chart illustrating a method for propagating user edits of computer-derived segmentation throughout a time series of images according to an exemplary embodiment of the present invention. FIG. 2 is a sequence of illustrations showing various stages of computer-derived segmentation according to the method of FIG. 1") comprising:
receiving a recording of content, the recording having been captured by an electronic device (See Guetter: Figs. 1-2, and [0030], "First, image data may be acquired (Step S11). As discussed above, image data may be medical image data. The image data may be of N-dimensions, where N is a positive integer, and may include a time series of images. The data may therefore be referred to herein as N+1 dimensional image data as it may include N dimensions (1, 2, or 3 of which may be spatial dimensions) and the "+1" dimension representing time. FIG. 2(a) is an illustration representing acquired image data. An anatomical structure 21 may be seen within the illustration");
analyzing the recording of content, the analyzing including recognition of objects in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data. The segmentation may be performed across multiple frames of the acquired image, for example, all frames may be segmented. The segmentation may be fully automatic or may involve initial user input, for example, the selection of seed points within at least one image frame. Segmentation may be performed using a known segmentation algorithm or using one or more trained classifiers. FIG. 2(b) is an illustration representing the acquired image data, including the anatomical structure 21, and segmentation results 22. As can be seen, there may be regions in which the segmentation of the ROI deviates from the anatomical structure 21 that it seeks to delineate") and detection of events occurring in the recording of content (See Gloudemans: Figs. 19A-H, and [0131], "For instance, video texture can be applied from images which are obtained prior to, and/or during the event. At step 934, the texture is updated, e.g., based on a user command (on demand), or automatically based on detection of a specified event. A user interface device such as a button 943 allows the operator to update the texture of a stadium, such as from a current image. An updated textured 3d model of the event can be obtained by updating the initial textured 3d model. Moreover, updating can occur automatically when a specified event occurs. The specified event can be a specified time, e.g., after a specified period has passed since a last update or at specified times of day, e.g., relative to sunset or sunrise.
The appearance of a stadium can change due to various factors, such as changing lighting in the stadium (e.g., due to presence of sun or clouds, or due to use or non-use of stadium electric lights), changes in the number of fans in the stands, changes in advertisements or signs in the stadium, movement of a roof of the stadium or other reconfiguration of the stadium or other event site, and so forth. In one approach, the texture is updated when an image used in an animation is captured. The determination of whether a specified event occurs which should trigger automatic updating of the texturing can be achieved in different ways, such as a light sensor to detect ambient light level, or a sensor which detects whether stadium lights have been turned on, for instance. Similarly, a timing device or process can be used to determine if a specified period has passed since last update or a specified time of day is reached"); 
generating metadata information based at least in part on the analyzing, the metadata information identifying the recognized objects and detected events in the recording (See Conwell: Fig. 1, and [0063], "One option is to submit the metadata, along with the captured content or data derived from the captured content (e.g., the FIG. 1 image, image feature data such as eigenvalues, machine readable data decoded from the image, etc.), to a service provider that acts on the submitted data, and provides a response to the user. Shazam, Snapnow, ClusterMedia Labs, Snaptell, Mobot, Mobile Acuity and Digimarc Mobile, are a few of several commercially available services that capture media content, and provide a corresponding response; others are detailed in the earlier-cited patent publications. By accompanying the content data with the metadata, the service provider can make a more informed judgment as to how it should respond to the user's submission");
identifying, based at least in part on at least one of a user preference or a detected event, a region of interest or an object of interest in the recording of content (See Guetter: Figs. 1-2, and [0031], "Segmentation may then be performed on the acquired image data (Step S12). The performance of segmentation may include identification of one or more regions-of-interest ("ROIs") within the image data");
based at least in part on the identified region of interest or object of interest, generating a modified version of the recording of content, the modified version incorporating at least the generated metadata information (See Guetter: Figs. 1-2, and [0032], "The user may then perform one or more edits to adjust the computer-calculated segmentation (Step S13). Editing of the segmentation results may be performed either at predetermined frames or frames selected by the user. The user may use a cursor or touchscreen to manually adjust the segmentation results to better fit the anatomical structure. FIG. 2(c) illustrates the segmentation results 22 being edited by a user to more closely fit the anatomical structure 21 by placing one or more cursors 23 at portions of the segmentation results 22 that are most in need of editing. Implementation of editing may be assisted by algorithms so that the user does not necessarily need to edit the segmentation results at a pixel-by-pixel level. FIG. 2(d) illustrates the edited segmentation results 22"'); and
storing the modified version of the recording of content for subsequent playback on the electronic device (See Guetter: Fig. 5, and [0060], "FIG. 5 shows an example of a computer system which may implement a method and system of the present disclosure. The system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. The software application may be stored on a recording media locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet").


Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Guetter, etc. (US 20120087561 A1) in view of Gloudemans, etc. (US 20090128549 A1), further in view of Conwell (US 20100046842 A1), and Schissler, etc. (US 20180232471 A1).
Regarding claim 6, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 1 as outlined above. Further, Gloudemans teaches the method of claim 1, wherein generating the modified version of the recording of content further comprises:
generating a three-dimensional (3D) mesh of at least one of the recognized objects in the recording;
generating a 3D representation of the recording of content, the 3D representation including a rendering of the 3D mesh of the at least one of the recognized objects (See Gloudemans: Figs. 9A-H, and [0130], "FIG. 9c depicts a process for texturing a 3d model. As discussed in connection with FIG. 9d, a user interface 940 displays a video texture in a display region 942 which is applied to a stadium or other event facility or location. Step 930 includes building a stadium model. For example, a diagram 944 allows an operator to specify a geometry of the stadium for use in the model. In the current example, the length of the field is 120.19 yards, the width is 75 yards, the distance from the field to the advertising boards of the grandstands on each side of the field is 1 yard (top), and 4 yards (right, bottom and left sides). The depth and angle of the grandstands is 30 yards and 33 degrees (top), 25 yards and 33 degrees (right side), 20 yards and 10 degrees (bottom), and 25 yards and 33 degrees (left side). While the example provided shows a stadium with a rectangular configuration, other configurations may be provided as well. Once the geometry is specified, video texture from one or more cameras can be applied (step 932), e.g., to obtain an initial textured 3d model of the stadium or other event facility. Note that applying the texture to a 3d model results in greater realism than applying texture to a plan"); and
generating an acoustic mesh of a scene based on acoustic information.
However, Guetter, modified by Gloudemans and Conwell, fails to explicitly disclose generating a three-dimensional (3D) mesh of at least one of the recognized objects in the recording; and generating an acoustic mesh of a scene based on acoustic information.
However, Schissler teaches generating a three-dimensional (3D) mesh of at least one of the recognized objects in the recording (See Schissler: Fig. 3, and [0050], "In this section, sound propagation aspects for 3D reconstructions of real-world scenes are discussed. FIG. 3 is a diagram illustrating aspects associated with an example acoustic classification and optimization approach 300. Referring to FIG. 3, approach 300 includes generating a 3D reconstruction of a real-world scene from multiple camera viewpoints, performing a visual material segmentation on camera images of the real-world scene, thereby producing a material classification for each triangle in the scene, and performing material optimization to determine appropriate acoustic material properties for the materials in the real-world scene. In some embodiments, results from approach 300 may include a 3D mesh with acoustic materials that can be used to perform plausible acoustic simulation for augmented reality"); and
generating an acoustic mesh of a scene based on acoustic information (See Schissler: Fig. 3, and [0062], "In some embodiments, a final step in preparing a reconstructed mesh for acoustic simulation may involve simplifying a dense triangle mesh. For example, dense 3D reconstructions frequently have triangles that are smaller than the smallest audible wavelength of 1.7 cm, given by the speed of sound in the air and human hearing range. However, geometric sound propagation algorithms are generally more accurate when surface primitives are larger than audible sound wavelengths. Therefore, an acoustic mesh simplification technique [19] may be applied to the dense 3D mesh and its material properties to increase the size of surface primitives and to reduce the number of edges for diffraction computation. The simplification algorithm may involve a combination of voxel remeshing, vertex welding, and the edge collapse algorithm to reduce the model complexity. Boundaries between the patches may be respected by the simplification so that no additional error is introduced. In some embodiments, a simplification algorithm may result is a mesh that is appropriate for geometric sound propagation").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guetter to include generating a three-dimensional (3D) mesh of at least one of the recognized objects in the recording; and generating an acoustic mesh of a scene based on acoustic information as taught by Schissler in order to provide a computer-implemented method that efficiently performs acoustic classification and optimization for multi-modal rendering of real-world scenes (See Schissler: [0036], "In accordance with some aspects of the present subject matter described herein, mechanisms, processes, algorithms, or systems may be provided for generating virtual acoustic effects in captured 3D models of real-world scenes for multimodal augmented reality. Further, mechanisms, processes, algorithms, or systems may be provided for automatically computing acoustic material properties associated with materials in the real-world scene. For example, one aspect described herein involves applying CNNs to estimate acoustic material properties, including frequency-dependent absorption coefficients, that may be used for interactive sound propagation. Another aspect described herein involves an iterative optimization algorithm for adjusting the materials or acoustic properties thereof until a virtual acoustic simulation converges to measured acoustic impulse responses (IRs), e.g., recorded IRs at the real-world scene. Using various aspects described herein, automatic computation of acoustic material properties from 3D reconstructed models for augmented reality applications may be performed efficiently and effectively").
Guetter teaches a method and system that may select and edit the ROI (region of interest) of the captured images and deliver the edited images to users, and Schissler teaches a system and method that may generate a 3D virtual model of a real-world scene based on captured images of that scene, together with a 3D acoustic mesh model of the acoustic response of the scene. Therefore, it would have been obvious to one of ordinary skill in the art to modify Guetter by Schissler to generate the 3D mesh model and the acoustic mesh of the real-world scene, thereby generating a virtual 3D model of the real-world scene, and then determine the ROI. The motivation to modify Guetter by Schissler is "Use of known technique to improve similar devices (methods, or products) in the same way".
Regarding claim 16, Guetter, Gloudemans, and Conwell teach all the features with respect to claim 11 as outlined above. Further, Gloudemans and Schissler teach the system of claim 11, wherein generating the modified version of the recording of content causes the processor to perform further operations comprising:
generating a three-dimensional (3D) mesh of at least one of the recognized objects in the recording (See Schissler: Fig. 3, and [0050], "In this section, sound propagation aspects for 3D reconstructions of real-world scenes are discussed. FIG. 3 is a diagram illustrating aspects associated with an example acoustic classification and optimization approach 300. Referring to FIG. 3, approach 300 includes generating a 3D reconstruction of a real-world scene from multiple camera viewpoints, performing a visual material segmentation on camera images of the real-world scene, thereby producing a material classification for each triangle in the scene, and performing material optimization to determine appropriate acoustic material properties for the materials in the real-world scene. In some embodiments, results from approach 300 may include a 3D mesh with acoustic materials that can be used to perform plausible acoustic simulation for augmented reality");
generating a 3D representation of the recording of content, the 3D representation including a rendering of the 3D mesh of the at least one of the recognized objects (See Gloudemans: Figs. 9A-H, and [0130], "FIG. 9c depicts a process for texturing a 3d model. As discussed in connection with FIG. 9d, a user interface 940 displays a video texture in a display region 942 which is applied to a stadium or other event facility or location. Step 930 includes building a stadium model. For example, a diagram 944 allows an operator to specify a geometry of the stadium for use in the model. In the current example, the length of the field is 120.19 yards, the width is 75 yards, the distance from the field to the advertising boards of the grandstands on each side of the field is 1 yard (top), and 4 yards (right, bottom and left sides). The depth and angle of the grandstands is 30 yards and 33 degrees (top), 25 yards and 33 degrees (right side), 20 yards and 10 degrees (bottom), and 25 yards and 33 degrees (left side). While the example provided shows a stadium with a rectangular configuration, other configurations may be provided as well. Once the geometry is specified, video texture from one or more cameras can be applied (step 932), e.g., to obtain an initial textured 3d model of the stadium or other event facility. Note that applying the texture to a 3d model results in greater realism than applying texture to a plan");
determining acoustic information from speech, voices, or audio of the recording (See Schissler: Fig. 1, and [0039], "Node 102 may include a communications interface 104, a shared memory 106, and one or more processor cores 108. Communications interface 104 may be any suitable entity (e.g., a communications interface and/or a data acquisition and generation (DAG) card) for receiving and/or sending messages. For example, communications interface 104 may be interface between various nodes 102 in a computing cluster. In another example, communications interface 104 may be associated with a user interface or other entity and may receive configuration setting and/or source data, such as audio information, for processing during a sound propagation model application"); and
generating an acoustic mesh of a scene of the recording based at least in part on the acoustic information (See Schissler: Fig. 3, and [0062], "In some embodiments, a final step in preparing a reconstructed mesh for acoustic simulation may involve simplifying a dense triangle mesh. For example, dense 3D reconstructions frequently have triangles that are smaller than the smallest audible wavelength of 1.7 cm, given by the speed of sound in the air and human hearing range. However, geometric sound propagation algorithms are generally more accurate when surface primitives are larger than audible sound wavelengths. Therefore, an acoustic mesh simplification technique [19] may be applied to the dense 3D mesh and its material properties to increase the size of surface primitives and to reduce the number of edges for diffraction computation. The simplification algorithm may involve a combination of voxel remeshing, vertex welding, and the edge collapse algorithm to reduce the model complexity. Boundaries between the patches may be respected by the simplification so that no additional error is introduced. In some embodiments, a simplification algorithm may result is a mesh that is appropriate for geometric sound propagation").




Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/Primary Examiner, Art Unit 2612