Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2016/0203386 to Porecki et al. (hereinafter, “Porecki”) in view of “Multimedia Semantics: Interactions Between Content and Community” to Sundaram et al. (hereinafter, “Sundaram”).
Claim 1. A method comprising: receiving a user input via a user interface identifying an initial person and/or initial object; Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.

Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.
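
For illustration only, the tag-based filtering described in Porecki [0074] can be sketched as follows. This is a minimal sketch and not code from Porecki; the Photo class, the filter_by_tag function, and the sample file names and tags are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Photo:
    """Hypothetical photo record: a file name plus the tags mapped to it."""
    name: str
    tags: set[str] = field(default_factory=set)


def filter_by_tag(story: list[Photo], selected_tag: str) -> list[Photo]:
    """Return only the photos in the story that are mapped to the selected tag."""
    return [photo for photo in story if selected_tag in photo.tags]


# Hypothetical photo-story with illustrative tags (not from the reference).
story = [
    Photo("img_001.jpg", {"soccer", "friends"}),
    Photo("img_002.jpg", {"beach"}),
    Photo("img_003.jpg", {"friends", "dinner"}),
]
selected = filter_by_tag(story, "friends")
print([photo.name for photo in selected])
```

Selecting the tag "friends" leaves only the first and third photos, which mirrors the behavior in which only photo images mapped to the selected tag are displayed.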

using metadata generated from a plurality of video data streams to identify one or more correlations and/or interactions between the initial person and/or initial object and one or more other persons and/or other objects in at least one of the plurality of video data streams; Porecki [0060] teaches the properties of the photo images may include not only information acquirable from general metadata of the photo images, but also properties related to context of the photo images. The information acquirable from the general metadata may include, for example, a camera model and manufacturer, captured date and time of an image, image resolution, exposure time, whether flash is used, and a geographical location of an image.
Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.
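
For illustration only, the prediction of a scene from tags described in Porecki [0062] can be sketched with a simple lookup. This is an assumption-laden sketch: a rule-based tag lookup is used here in place of whatever prediction technique Porecki actually employs, and the SCENE_TAGS table and predict_scenes function are hypothetical names. Only the 'SPORTS' entry follows the example quoted above.

```python
# Hypothetical mapping from scene names to the tags that suggest them.
# Only the 'SPORTS' entry follows the example quoted from Porecki [0062];
# the 'FOOD' entry is an invented placeholder.
SCENE_TAGS = {
    "SPORTS": {"soccer", "swimming", "running"},
    "FOOD": {"dinner", "restaurant"},
}


def predict_scenes(photo_tags):
    """Return the sorted scene names whose tag sets overlap the photo's tags."""
    tags = set(photo_tags)
    return sorted(
        scene for scene, scene_tags in SCENE_TAGS.items() if tags & scene_tags
    )


print(predict_scenes(["soccer", "friends"]))  # ['SPORTS']
```

A photo tagged "soccer" and "friends" is assigned to the scene 'SPORTS', matching the example of playing soccer with friends.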

Porecki [0081] teaches the tag generation block 320 may include a metadata extractor 321, a semantic information extractor 323, and a tag database (dB) 324. The metadata extractor 321 may load the photo images from the image DB 312, extract metadata from image files, and generate a tag. A format of the metadata is not limited to a specific format. For example, the format may include Exchangeable image file format (Exif) or Extensible Metadata Platform (XMP). The metadata may include information about a camera model and manufacturer, captured date and time of an image, image resolution, exposure time, a lens focal length, whether flash is used, a color space, and a location where an image is captured. 
Porecki [0082] teaches a tag DB 324 may store tags that indicate various pieces of information about the photo images. That is, the tags may indicate the metadata extracted by the metadata extractor 321 and/or the semantic information extracted by the semantic information extractor 323. 
Porecki [0087] teaches the photo-story generation block 330 may include a photo-story creator 331, a photo-story database (dB) 332, and a photo-story manager 333. The photo-story creator 331 creates a photo-story with reference to the tags stored in the tag DB 324. In particular, the photo-story creator 331 analyzes the tags and determines a photo-story topic, and generates a photo-story file according to a photo-story template related to the topic. The photo-story template will be described below with reference to FIG. 6. 
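
For illustration only, the flow described in Porecki [0087] (analyze the tags, determine a topic, and generate a photo-story according to a template related to the topic) can be sketched as follows. Porecki does not state how the topic is determined; choosing the most frequent tag is an assumption made here, and TEMPLATES, create_photo_story, and the sample tags are hypothetical.

```python
from collections import Counter

# Hypothetical templates keyed by topic; each lists scene slots for the story.
TEMPLATES = {
    "travel": ["departure", "sightseeing", "return"],
    "sports": ["warm-up", "match", "celebration"],
}


def create_photo_story(tag_db):
    """Pick the most frequent tag that has a template and build a story outline.

    tag_db maps a photo name to its list of tags (a stand-in for the tag DB).
    Returns None when no tag matches any template.
    """
    counts = Counter(tag for tags in tag_db.values() for tag in tags)
    candidates = [
        (count, topic) for topic, count in counts.items() if topic in TEMPLATES
    ]
    if not candidates:
        return None
    # Highest count wins; ties are broken alphabetically for determinism.
    _, topic = min(candidates, key=lambda item: (-item[0], item[1]))
    photos = sorted(name for name, tags in tag_db.items() if topic in tags)
    return {"topic": topic, "scenes": TEMPLATES[topic], "photos": photos}


tag_db = {
    "a.jpg": ["travel", "beach"],
    "b.jpg": ["travel"],
    "c.jpg": ["sports"],
}
print(create_photo_story(tag_db))
```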
While Porecki implies interactions between the initial person and/or initial object and one or more other persons and/or other objects, Sundaram, in the similar field of media semantics, teaches a relational structure in social media streams that reveals the strong relationship among multiple facets: (a) photos (visual content), (b) time, (c) tags, and (d) users (Sundaram, Fig. 1).

Sundaram [D. Example Applications, page 2740] teaches Social media metadata can be used to improve media retrieval. Existing research on tagging includes improving tag recommendation [17], [18] and analyzing usage patterns of tagging systems 

Sundaram [page 2738] teaches a media item such as a photograph on Flickr therefore exists as part of a meaningful interrelationship among several attributes including time, visual content, users, and actions. The semantics of media objects as well as human activity on social media platforms needs to be understood as a relationship between people, actions, artifacts, and supportive contextual metadata. Fig. 1 illustrates the relationships among visual content, time, tags and users.

Sundaram [page 2739] teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects (denoted in this paper as interaction semantics) are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship (among photo visual features, photo capture time, tagging, and commenting on the photo) arises due to human activity, both online and in the physical world. In this case, the interaction semantics (the meaning of the relationship), while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr, for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.
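
For illustration only, the multiple relations that Sundaram describes (user-to-photo, photo-to-location, and photo-to-time) can be sketched as typed pairs that are joined to find which users are linked to a given place and period. This is a minimal sketch under stated assumptions, not Sundaram's method; the record layout, the function names build_index and users_for, and all sample users, photos, and dates are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each tuple is (relation, source, target).
RELATIONS = [
    ("user-photo", "alice", "p1"),
    ("user-photo", "bob", "p2"),
    ("user-photo", "carol", "p3"),
    ("photo-location", "p1", "Sedona"),
    ("photo-location", "p2", "Sedona"),
    ("photo-location", "p3", "Phoenix"),
    ("photo-time", "p1", "2009-07"),
    ("photo-time", "p2", "2009-12"),
    ("photo-time", "p3", "2009-07"),
]


def build_index(relations):
    """Group (source, target) pairs by relation name."""
    index = defaultdict(set)
    for relation, source, target in relations:
        index[relation].add((source, target))
    return index


def users_for(index, location, month):
    """Users linked to a photo that is linked to both the location and the month."""
    at_location = {photo for photo, loc in index["photo-location"] if loc == location}
    in_month = {photo for photo, when in index["photo-time"] if when == month}
    photos = at_location & in_month
    return sorted(user for user, photo in index["user-photo"] if photo in photos)


index = build_index(RELATIONS)
print(users_for(index, "Sedona", "2009-07"))  # ['alice']
```

The join across three relations shows why, as the quoted passage notes, the relationship is not recoverable from a single attribute such as photo tags alone.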

and generating a storyboard that summarizes one or more of the correlations and/or the interactions between the initial person and/or initial object and one or more other persons and/or other objects. Sundaram [page 2743] teaches we can determine through other coefficient matrices the most likely users who post photos belonging to the theme and the most likely tags associated with the theme photos. The middle part of Fig. 3 shows aggregated cluster strength over time for group A and group B. We can see that the theme strengths vary over time; some themes, such as A2 and B2, only appear at certain time periods and then diminish. Some others (A3 and A5 are examples) appear, then fall and then reappear. We have observed that these themes emerge due to dedicated users (e.g., the “bird” images in A4 are taken by the same user), tag co-occurrences (e.g., “sunset” in A2, “water” in B6, etc.), as well as similar visual content (e.g., A2, A4, A5, B2, B5, B6, etc.). These empirical results suggest that our analysis captures the dynamics of group patterns and gives meaningful summary of group photo streams.

Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.
Porecki [0087] teaches the photo-story generation block 330 may include a photo-story creator 331, a photo-story database (dB) 332, and a photo-story manager 333. The photo-story creator 331 creates a photo-story with reference to the tags stored in the tag DB 324. In particular, the photo-story creator 331 analyzes the tags and determines a photo-story topic, and generates a photo-story file according to a photo-story template related to the topic. The photo-story template will be described below with reference to FIG. 6. 
Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the teachings of Porecki with the teachings of Sundaram [Abstract] in order to show how the analysis of visual content, in particular the tracing of content remixes, can help in understanding the relationship among YouTube participants.
Claim 2. Porecki and Sundaram further teach the method further comprising displaying the storyboard on the user interface. Porecki [0028] teaches the reproducing may include providing, while a photo image included in the selected photo-story and another photo-story is displayed, a user interface for jumping to the other photo-story.
Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story.
Sundaram [page 2743] teaches we developed an interactive interface to present the results of thematic cluster extraction.

Claim 3. Porecki and Sundaram further teach the method wherein the user input identifies an initial person.
Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.

Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.

Sundaram, Fig. 1.

Claim 4. Porecki and Sundaram further teach the method comprising using the metadata to identify an interaction between the initial person and one or more other persons in at least one of the plurality of video data streams. Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.

Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.

Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.

Sundaram, Fig. 1.

Claim 5. Porecki further teaches the method comprising using the metadata to identify an interaction between the initial person and one or more objects in at least one of the plurality of video data streams. Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.

Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.

Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.

Claim 6. Porecki further teaches the method wherein the user input identifies an initial object.
Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.
Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.
Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.
Claim 7. Porecki further teaches the method comprising using the metadata to identify an interaction between the initial object and one or more persons in at least one of the plurality of video data streams. Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.

Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.
Porecki [0074] teaches the user may select a tag of personal interest from the selected photo-story. When the user selects the tag, only photo images that are mapped to the selected tag, i.e., photo images with properties corresponding to the selected tag, are displayed from among the photo-images in the photo-story.
Claim 8. Porecki and Sundaram further teach the method further comprising generating the metadata from the plurality of video data streams. Porecki [0081] teaches the tag generation block 320 may include a metadata extractor 321, a semantic information extractor 323, and a tag database (dB) 324. The metadata extractor 321 may load the photo images from the image DB 312, extract metadata from image files, and generate a tag. A format of the metadata is not limited to a specific format. For example, the format may include Exchangeable image file format (Exif) or Extensible Metadata Platform (XMP). The metadata may include information about a camera model and manufacturer, captured date and time of an image, image resolution, exposure time, a lens focal length, whether flash is used, a color space, and a location where an image is captured.
Sundaram, Fig. 1.
Claim 9. Porecki and Sundaram further teach the method wherein generating the metadata comprises identifying, locating and tracking the one or more persons and/or one or more objects in the plurality of video data streams, and generating metadata for each of the one or more persons and/or one or more objects. Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Sundaram, Fig. 1, teaches a relational structure in social media streams that reveals the strong relationship among multiple facets: (a) photos (visual content), (b) time, (c) tags, and (d) users.

Sundaram [D. Example Applications, page 2740] teaches Social media metadata can be used to improve media retrieval. Existing research on tagging includes improving tag recommendation [17], [18] and analyzing usage patterns of tagging systems 

storing metadata of each of the plurality of video data streams in the data repository; Sundaram [D. Example Applications, page 2740] teaches Social media metadata can be used to improve media retrieval. Existing research on tagging includes improving tag recommendation [17], [18] and analyzing usage patterns of tagging systems 

Sundaram [page 2742] teaches it is possible, for example, to add the EXIF metadata from photos and include location, camera model, and settings. The EXIF metadata can be represented in a manner similar to the basic contextual information discussed in this section. Fig. 2 shows the four relation matrices mentioned above.

Sundaram [page 2745] teaches For each video, apart from downloading the video, we collected contextual metadata: timestamp, tags, associated set of comments and their timestamps, and authors.

Sundaram [page 2750] teaches For each unique video, we segment shots, extract keyframes, and extract visual features from each keyframe. We also retrieve the associated metadata, including author, publish date, view-counts, and free-text title and descriptions.

Sundaram, Fig. 1.

Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Claim 10. Porecki and Sundaram further teach the method wherein generating the metadata comprises classifying one or more persons and/or one or more objects in the plurality of video data streams, and generating metadata indicative of the classification. Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
storing metadata of each of the plurality of video data streams in the data repository; Sundaram [D. Example Applications, page 2740] teaches Social media metadata can be used to improve media retrieval. Existing research on tagging includes improving tag recommendation [17], [18] and analyzing usage patterns of tagging systems 
Sundaram [page 2742] teaches it is possible, for example, to add the EXIF metadata from photos and include location, camera model, and settings. The EXIF metadata can be represented in a manner similar to the basic contextual information discussed in this section. Fig. 2 shows the four relation matrices mentioned above.

Sundaram [page 2745] teaches For each video, apart from downloading the video, we collected contextual metadata: timestamp, tags, associated set of comments and their timestamps, and authors.

Sundaram [page 2750] teaches For each unique video, we segment shots, extract keyframes, and extract visual features from each keyframe. We also retrieve the associated metadata, including author, publish date, view-counts, and free-text title and descriptions.  

Sundaram, Fig. 1.
Porecki [0010] teaches the generating of the tags may include detecting objects in the photo images by using visual pattern recognition models that are learned from training images; and determining, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0017] teaches the tag generator may include an object detector configured to detect objects in the photo images by using visual pattern recognition models that are learned from training images; and a properties determiner configured to determine, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0090] teaches the photo-story presentation block 340 may include a photo-story parser 341, a photo-story reproducer 342 and a feedback manager 343. The photo-story presentation block 340 may be implemented on a user terminal. The photo-story parser 341 parses the photo-story file and loads a photo image and relevant data used for reproducing the photo-story from the image DB 312. A photo-story reproducer 342 may receive the photo image and the relevant data from the photo-story parser 341 and render the photo-story. A feedback manager 343 notifies the photo-story manager 333 when the user edits or deletes the photo-story, and the photo-story manager 333 updates the photo-story file stored in the photo-story DB 332 with respect to a user command.
Claim 11. The method of claim 9, wherein generating the metadata comprises classifying one or more interactions between the initial person and/or initial object and one or more other persons and/or other objects. Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene 'SPORTS' may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene 'SPORTS'.
Claim 12. Porecki and Sundaram further teach the method wherein the plurality of video data streams are captured and stored on a video management server before receiving the user input. Sundaram [page 2737] teaches social networks have made application programming interfaces (APIs) available to researchers to access user-generated content and user interactions.

Sundaram [page 2750] teaches For each unique video, we segment shots, extract keyframes, and extract visual features from each keyframe. We also retrieve the associated metadata, including author, publish date, view-counts, and free-text title and descriptions.

Sundaram, Fig. 1.

Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.
Claim 13. A method comprising: identifying, locating and tracking one or more persons and/or one or more objects captured in one or more video data streams; Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Sundaram [page 2739] also teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects (denoted in this paper as interaction semantics) are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship (among photo visual features, photo capture time, tagging, and commenting on the photo) arises due to human activity, both online and in the physical world. In this case, the interaction semantics (the meaning of the relationship), while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr, for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.

identifying one or more interactions of at least some of the identified, located and tracked persons and/or objects captured in one or more video data streams; Porecki [0062] teaches in operation 102, the apparatus for generating the photo-story predicts scenes referred to by the photo images, based on the tags generated in operation 101. A scene is a unit that configures the photo-story, and each scene may be defined according to a length (e.g., a number of images), a topic (e.g., a common topic among the images), etc. For example, a scene `SPORTS` may include images related to soccer, swimming, running, etc. Therefore, the apparatus may predict that a photo image of a context including playing soccer with friends is included in the scene `SPORTS`.
identifying one or more relationships between at least some of the identified, located and tracked persons and/or objects captured in one or more video data streams; Sundaram [page 2738] teaches a media item such as a photograph on Flickr therefore exists as part of a meaningful interrelationship among several attributes including time, visual content, users, and actions. The semantics of media objects as well as human activity on social media platforms needs to be understood as a relationship between people, actions, artifacts, and supportive contextual metadata. Fig. 1 illustrates the relationships among visual content, time, tags and users.

Sundaram [page 2739] teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects - denoted in this paper as interaction semantics - are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship - among photo visual features, photo capture time, tagging, and commenting on the photo - arises due to human activity, both online and in the physical world. In this case, the interaction semantics - the meaning of the relationship - while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.

While Porecki implies interactions between the initial person and/or initial object and one or more other persons and/or other objects, Sundaram, in the similar field of media semantics, teaches generating metadata that is representative of the identified, located and tracked persons and/or objects, the identified interactions and the identified one or more relationships; Sundaram [page 2738] teaches a media item such as a photograph on Flickr therefore exists as part of a meaningful interrelationship among several attributes including time, visual content, users, and actions. The semantics of media objects as well as human activity on social media platforms needs to be understood as a relationship between people, actions, artifacts, and supportive contextual metadata. Fig. 1 illustrates the relationships among visual content, time, tags and users.

Sundaram [page 2739] teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects - denoted in this paper as interaction semantics - are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship - among photo visual features, photo capture time, tagging, and commenting on the photo - arises due to human activity, both online and in the physical world. In this case, the interaction semantics - the meaning of the relationship - while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.

and presenting a story of a selected one of the identified, located and tracked persons and/or objects based at least in part on the metadata, the story represents at least one of the one or more identified interactions involving the selected one of the identified, located and tracked persons and/or objects. Porecki [0029] teaches there is provided a user terminal including at least one memory device configured to store at least one program for displaying a photo-story; and at least one processor configured to execute the at least one program stored in the at least one memory device, wherein the at least one program includes instructions for performing displaying a list of photo-stories generated based on tags that indicate properties of a context of a plurality of photo images; and reproducing a photo-story that is selected from the list according to a user input.
Porecki [0087] teaches the photo-story generation block 330 may include a photo-story creator 331, a photo-story database (DB) 332, and a photo-story manager 333. The photo-story creator 331 creates a photo-story with reference to the tags stored in the tag DB 324. In particular, the photo-story creator 331 analyzes the tags and determines a photo-story topic, and generates a photo-story file according to a photo-story template related to the topic. The photo-story template will be described below with reference to FIG. 6.
Sundaram [page 2743] also teaches we can determine through other coefficient matrices the most likely users who post photos belonging to the theme and the most likely tags associated with the theme photos. The middle part of Fig. 3 shows aggregated cluster strength over time for group A and group B. We can see that the theme strengths vary over time; some themes, such as A2 and B2, only appear at certain time periods and then diminish. Some others - A3 and A5 are examples - appear, then fall and then reappear. We have observed that these themes emerge due to dedicated users (e.g., the “bird” images in A4 are taken by the same user), tag co-occurrences (e.g., “sunset” in A2, “water” in B6, etc.), as well as similar visual content (e.g., A2, A4, A5, B2, B5, B6, etc.). These empirical results suggest that our analysis captures the dynamics of group patterns and gives meaningful summary of group photo streams.
Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the teachings of Porecki with the teachings of Sundaram [Abstract] to show how the analysis of visual content, in particular tracing of content remixes, can help us understand the relationship among YouTube participants.
Claim 14. Porecki and Sundaram further teach wherein the story represents at least one of the one or more identified relationships involving the selected one of the identified, located and tracked persons and/or objects. Sundaram [page 2739] teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects - denoted in this paper as interaction semantics - are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship - among photo visual features, photo capture time, tagging, and commenting on the photo - arises due to human activity, both online and in the physical world. In this case, the interaction semantics - the meaning of the relationship - while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.

Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Claim 15. Porecki and Sundaram further teach the method further comprising classifying at least some of the identified, located and tracked persons and/or objects captured in one or more video data streams, wherein generating metadata includes generating metadata indicative of the classification of the at least some of the identified, located and tracked persons and/or objects. Sundaram [page 2745] teaches for each video, apart from downloading the video, we collected contextual metadata: timestamp, tags, associated set of comments and their timestamps, and authors.

Sundaram [page 2750] teaches for each unique video, we segment shots, extract keyframes, and extract visual features from each keyframe. We also retrieve the associated metadata, including author, publish date, view-counts, and free-text title and descriptions.
Porecki [0010] teaches the generating of the tags may include detecting objects in the photo images by using visual pattern recognition models that are learned from training images; and determining, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0017] teaches the tag generator may include an object detector configured to detect objects in the photo images by using visual pattern recognition models that are learned from training images; and a properties determiner configured to determine, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0090] teaches the photo-story presentation block 340 may include a photo-story parser 341, a photo-story reproducer 342 and a feedback manager 343. The photo-story presentation block 340 may be implemented on a user terminal. The photo-story parser 341 parses the photo-story file and loads a photo image and relevant data used for reproducing the photo-story from the image DB 312. A photo-story reproducer 342 may receive the photo image and the relevant data from the photo-story parser 341 and render the photo-story. A feedback manager 343 notifies the photo-story manager 333 when the user edits or deletes the photo-story, and the photo-story manager 333 updates the photo-story file stored in the photo-story DB 332 with respect to a user command.
Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Claim 16. Porecki and Sundaram further teach the method further comprising classifying at least one of the one or more identified interaction and/or at least one or more of the identified relationships, wherein generating metadata includes generating metadata indicative of the classification of the at least one of the one or more identified interaction and/or at least one or more of the identified relationships. Sundaram [page 2738] teaches a media item such as a photograph on Flickr therefore exists as part of a meaningful interrelationship among several attributes including time, visual content, users, and actions. The semantics of media objects as well as human activity on social media platforms needs to be understood as a relationship between people, actions, artifacts, and supportive contextual metadata. Fig. 1 illustrates the relationships among visual content, time, tags and users.

Sundaram [page 2739] teaches the semantics that arise from social interaction, including commenting, sharing, and tagging, around media objects - denoted in this paper as interaction semantics - are distinct from the semantics of the media object. Rather than asking “what is the meaning of this photo or video?” we seek the semantics of the relationship between people, actions, and media. An example is useful to illustrate the difference: A Flickr group on “Arizona Travel” may have a lot of posts on Sedona, a popular destination, in July from people who live in Phoenix but travel there to escape the heat. There are fewer posts in December, when it is cold in Sedona. Now, even if the meaning of each individual photo is known, the meaning of the relationship between location (Sedona), time (summer), specific users, and photo colors is not explicit in the data. This relationship may exist because the active members of the group are friends who live in Phoenix and plan an annual summer retreat together in Sedona. In other words, the relationship - among photo visual features, photo capture time, tagging, and commenting on the photo - arises due to human activity, both online and in the physical world. In this case, the interaction semantics - the meaning of the relationship - while not explicit, are known only to the group members. These semantics cannot be easily discovered by accessing the photo stream via a single object or attribute (e.g., photo tags) or through a simple aggregation of attributes. The discovery of latent structure in such social media platforms can point to emergent cultural behaviors. Interestingly, these behaviors may not even be explicitly identifiable by members of the network. Characteristics of social network data preclude simple representations of the social context. First, social media data typically involves multiple social relations. In Flickr for example, there are several relations including user-to-user relation (friendship or commenting), user-to-photo relation (tagging or “like”), photo-to-location relationship, and photo-to-time relationship.
Porecki [0010] teaches the generating of the tags may include detecting objects in the photo images by using visual pattern recognition models that are learned from training images; and determining, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0017] teaches the tag generator may include an object detector configured to detect objects in the photo images by using visual pattern recognition models that are learned from training images; and a properties determiner configured to determine, based on the detected objects, the properties of the context of each of the plurality of photo images. 
Porecki [0090] teaches the photo-story presentation block 340 may include a photo-story parser 341, a photo-story reproducer 342 and a feedback manager 343. The photo-story presentation block 340 may be implemented on a user terminal. The photo-story parser 341 parses the photo-story file and loads a photo image and relevant data used for reproducing the photo-story from the image DB 312. A photo-story reproducer 342 may receive the photo image and the relevant data from the photo-story parser 341 and render the photo-story. A feedback manager 343 notifies the photo-story manager 333 when the user edits or deletes the photo-story, and the photo-story manager 333 updates the photo-story file stored in the photo-story DB 332 with respect to a user command.
Porecki [0098] teaches the context of the photo image 404 may be predicted by applying the models 403 generated via machine learning to the photo image 404, and at least one tag 405 that indicates the predicted context may be generated. The at least one tag 405 may indicate, for example, a presence of a specific object in an image, an identity of a particular object, a location of a detected object, an identifier indicating another image having the same or similar context as a specific image, and semantic information found via unsupervised machine learning.
Claim 17. It differs from claim 1 in that it is a non-transitory computer readable medium storing instructions thereon that, when executed by one or more processors, cause the one or more processors to perform the method of claim 1. Therefore, claim 17 has been analyzed and reviewed in the same way as claim 1. See the above analysis.

Claim 18. It differs from claim 2 in that it is a non-transitory computer readable medium storing instructions thereon that, when executed by one or more processors, cause the one or more processors to perform the method of claim 2. Therefore, claim 18 has been analyzed and reviewed in the same way as claim 2. See the above analysis.

Claim 19. It differs from claim 9 in that it is a non-transitory computer readable medium storing instructions thereon that, when executed by one or more processors, cause the one or more processors to perform the method of claim 9. Therefore, claim 19 has been analyzed and reviewed in the same way as claim 9. See the above analysis.

Claim 20. It differs from claims 10 and 11 in that it is a non-transitory computer readable medium storing instructions thereon that, when executed by one or more processors, cause the one or more processors to perform the method of claims 10 and 11. Therefore, claim 20 has been analyzed and reviewed in the same way as claims 10 and 11. See the above analysis.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claim 1 (and similarly recited claims 13 and 17) is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,087,139 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims are similar in scope.
Comparison of 17/384406 with US 11,087,139 B2:

17/384406: A method comprising: receiving a user input via a user interface identifying an initial person and/or initial object;
US 11,087,139 B2: 1. A method comprising: storing in a data repository a plurality of video data streams each capturing video of a corresponding monitored region; storing streams of metadata of each of the plurality of video data streams in the data repository; receiving a user input via a user interface identifying an initial person and/or initial object;

17/384406: using metadata generated from a plurality of video data streams to identify one or more correlations and/or interactions between the initial person and/or initial object and one or more other persons and/or other objects in at least one of the plurality of video data streams;
US 11,087,139 B2: using the streams of metadata to identify one or more correlations and/or interactions between the initial person and/or initial object and one or more other persons and/or other objects in at least one of the plurality of video data streams;

17/384406: and generating a storyboard that summarizes one or more of the correlations and/or the interactions between the initial person and/or initial object and one or more other persons and/or other objects.
US 11,087,139 B2: wherein replicating the story of the incident includes generating a storyboard using the respective streams of metadata of each of the plurality of video data streams, and wherein the storyboard summarizes the correlations and/or the interactions between the initial person and/or initial object and one or more other persons and/or other objects


Likewise, claims 2-12, 14-16 and 18-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,087,139 B2.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached on 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661