DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 10/19/2022 have been fully considered but they are not persuasive.
In re pages 7-8, the applicant argues that “Because Gormish operates at a document level and the claims are limited to processing "with respect to plural frames of the digital video work", modifying Folta in view of Gormish would not result in the invention as claimed. Rather, if one were to modify Folta in view of Gormish (without using the claims as a hindsight guide), Gormish would dictate that Folta's video as a whole would be hashed and stored to a log. In the claimed invention, however, processing is done on individual frames of the video. Gormish does not suggest even suggest that processing should be done on individual frames of a video or pages of a document. Thus, even if Folta in view of Kumar were modified in view of Gormish, the result would not be the claimed invention here. 
	Accordingly, the claims are not obvious in view of these references and the rejection should be withdrawn.”
	In response, the examiner respectfully disagrees. Folta et al. discloses in fig. 7, paragraph 0039 that “The metadata that identifies people in a show may then be saved, as generally represented by step 714. An efficient way is to maintain metadata that identifies which person is present in which frame (timestamp) of which show at which location, e.g., {ShowID, frame number, location in frame, ActorID}. Other ways of formatting the metadata, e.g., via one or more GUIDs may be used, however in general, the metadata allows an unknown face in a show, frame, and frame location to be efficiently matched to an identity (if previously recognized)”, and paragraph 0042 teaches “If the show has been processed as described above, the metadata 880 may be accessed by a face matching mechanism 888 with the {ShowID, frame number, location in frame} to find the ActorID. The ActorID in turn may be used to lookup a database 890 to provide results that, for example, identify who that person is in that frame, provide biographical information about that person, provide links to more data, and so forth.” Herein, Folta et al. discloses detecting an object (a face) in the digital still image (shots) of the frame of the digital video work (step 702 shows separating the video into shots) and assigning object metadata to the recognized object, for example, ShowID, frame number, location in frame, etc.
	Kumar et al. discloses in paragraphs 0059-0061 that “The identity of the active speaker may be provided by voice recognizer 120 to video decomposer 312. In many instances, the user identity associated with one of the face streams generated by video decomposer 312 will match the identity of the active speaker, since it is typical that one of the recognized faces will correspond to the active speaker.”, and paragraph 0072 teaches “Data processor 612 may receive the identity of the active speaker from voice recognizer 120. Data processor 612 may additionally receive location streams paired with the corresponding identity of the user tracked by the location stream from face recognizer 210. In the example of FIG. 6B, the identity of the active speaker matches the identity paired with the location stream of Face 1. Based on this match, the location stream of Face 1 may be tagged, by data processor 612, as corresponding to the active speaker (e.g., Active Speaker=T).” Herein, Kumar et al. discloses object metadata linking audio in the digital video work to the recognized object which produced the audio.
	Gormish et al. discloses in fig. 7, paragraph 0121 that “Documents and metadata are saved in the image and metadata store 702, they are also logged (e.g., a hash is generated) and saved in the log 703, which is in memory 704.”, and Table 2 shows a hash of message content. Furthermore, paragraph 0032 teaches “Many of the inventions described here-in require the ability to refer to a document, video, song, piece of paper, or electronic file by an identifier. For purposes herein, the document, video, song, piece of paper, or electronic file is referred herein to as the media. An identifier used to identify the media is called a media identifier and, in one embodiment, is a string of bytes.”, and paragraph 0118 teaches “The information available to authenticate a document includes any metadata entered when the document was logged. This could include a digital signature done by a smartcard from the scanner or a PIN from the printer driver. The timing data available can be more complex. In one embodiment, the timing data is a timestamp from the SOX server. Such a timestamp might have been changed by someone with access to the machine. Thus, it is possible to follow the chain to other servers and retrieve their timestamp for the chain. By using a hash chain, it is possible to authenticate any log entry as occurring before a timestamp on the second server. For example, the local server might assert that a document was entered at 9:57 AM on Thursday Sep. 28, 2006 (PST). A server that had an entangled log somewhat later could only confirm that the document existed before 10:03 AM, and a server that entangled with that server only once per day might only be able to confirm that the document existed before 5:00 PM. Assuming servers entangle at least once a day, the confidence in the date of any particular document will be absolute.” Herein, Gormish et al. discloses a cryptographic hash of different documents and metadata, and thus meets the claimed limitation.
Therefore, the combination of Folta et al., Kumar et al. and Gormish et al. teaches the limitations as claimed.
	Therefore, in view of the above, the examiner believes that the features of the claims are taught by the applied art. See also the rejections set forth below.
	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-8, 11-13, 17-18, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over US 2012/0106806 by Folta et al. in view of US 2019/0215464 by Kumar et al. and US 2008/0243898 by Gormish et al.

Regarding claim 1, Folta et al. discloses an apparatus comprising a non-volatile machine-readable medium storing a program having instructions which when executed by a processor will cause the processor to manufacture an inventory of image products from frames of a digital video work, each frame comprising a still image in sequence in the digital video work (fig. 7), the instructions of the program for, with respect to plural frames of the digital video work:
detecting an object in the digital still image of the frame of the digital video work; 
recognizing the detected object in the still image; 
assigning object metadata to the recognized object (Abstract, fig. 7, paragraph 0039, 0042);
Folta et al. fails to disclose
the digital video work including audio which respectively corresponds to objects in the still images in the digital video work;
the object metadata linking audio in the digital video work to the recognized object in the digital still image which produced the audio, wherein the metadata includes one of: a location in the digital image of the recognized object; an identification of what the recognized object is; an image of the recognized object; an image of an actor corresponding to the recognized object; a name of an actor corresponding to the recognized object; or spoken lines; or the audio; 
generating at least one cryptographic hash of the object metadata; and 
writing the hash to a node of a transaction processing network
Kumar et al. discloses 
the digital video work including audio which respectively corresponds to objects in the still images in the digital video work (paragraph 0047-0048);
the object metadata linking audio in the digital video work to the recognized object in the digital still image which produced the audio (paragraph 0047-0048, 0054, 0059-0061 teaches “The identity of the active speaker may be provided by voice recognizer 120 to video decomposer 312. In many instances, the user identity associated with one of the face streams generated by video decomposer 312 will match the identity of the active speaker, since it is typical that one of the recognized faces will correspond to the active speaker.”, paragraph 0072 teaches “Data processor 612 may receive the identity of the active speaker from voice recognizer 120. Data processor 612 may additionally receive location streams paired with the corresponding identity of the user tracked by the location stream from face recognizer 210. In the example of FIG. 6B, the identity of the active speaker matches the identity paired with the location stream of Face 1. Based on this match, the location stream of Face 1 may be tagged, by data processor 612, as corresponding to the active speaker (e.g., Active Speaker=T).”), wherein the metadata includes one of: a location in the digital image of the recognized object (paragraph 0073); an identification of what the recognized object is (paragraph 0048); an image of the recognized object (paragraph 0048); an image of an actor corresponding to the recognized object (paragraph 0061); a name of an actor corresponding to the recognized object (paragraph 0074); or spoken lines (paragraph 0072); or the audio (paragraph 0072); 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the object metadata linking sound to the object in the digital image which produced the sound, as taught by Kumar et al., into the system of Folta et al., because such incorporation would give a user more options by providing audio with video, thus increasing the flexibility of the system.
Folta et al. and Kumar et al. fail to disclose
generating at least one cryptographic hash of the object metadata; and 
writing the hash to a node of a transaction processing network
Gormish et al. discloses 
generating at least one cryptographic hash of the object metadata (fig. 7); and 
writing the hash to a node of a transaction processing network (fig. 7, paragraph 0121, 0127, Table 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating at least one cryptographic hash of the object metadata and writing the hash to a node of a transaction processing network, as taught by Gormish et al., into the system of Folta et al. and Kumar et al., because such incorporation would allow a user to secure stored information, thus increasing user accessibility of the system.
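
For illustration only, the combination discussed above (per-frame object metadata, a cryptographic hash of that metadata, and a write of the hash to a log) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the field names, the use of SHA-256 over canonical JSON, and the `HashChainLog` class are hypothetical, and the in-memory list merely stands in for a node of a transaction processing network.

```python
import hashlib
import json


def hash_metadata(record: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of record."""
    # Sorting keys makes the digest independent of dictionary insertion order.
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class HashChainLog:
    """Hypothetical append-only log; each entry commits to the previous entry."""

    GENESIS = "0" * 64  # assumed placeholder for the "previous hash" of the first entry

    def __init__(self):
        self.entries = []

    def append(self, metadata_hash: str) -> str:
        """Append a metadata hash and return the resulting chain hash."""
        prev = self.entries[-1]["chain_hash"] if self.entries else self.GENESIS
        chain_hash = hashlib.sha256((prev + metadata_hash).encode("ascii")).hexdigest()
        self.entries.append(
            {"metadata_hash": metadata_hash, "prev": prev, "chain_hash": chain_hash}
        )
        return chain_hash

    def verify(self) -> bool:
        """Recompute the chain and report whether every entry is consistent."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = hashlib.sha256(
                (prev + entry["metadata_hash"]).encode("ascii")
            ).hexdigest()
            if entry["prev"] != prev or entry["chain_hash"] != expected:
                return False
            prev = entry["chain_hash"]
        return True


# Hypothetical per-frame object metadata, loosely modeled on the
# {ShowID, frame number, location in frame, ActorID} tuple plus a speaker tag.
frames = [
    {"show_id": "S1", "frame_number": 10, "location": [120, 40, 64, 64],
     "actor_id": "A7", "active_speaker": True},
    {"show_id": "S1", "frame_number": 11, "location": [122, 41, 64, 64],
     "actor_id": "A7", "active_speaker": False},
]

log = HashChainLog()
for record in frames:
    log.append(hash_metadata(record))
```

In this sketch each frame's metadata is hashed separately, so altering any one stored record, or any earlier log entry, causes `verify()` to fail for the chain.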

Regarding claim 2, the apparatus further comprising assigning image metadata to the digital image, the image metadata including an identification of the digital image and provenance of the digital image (Folta et al., Abstract, fig. 7, paragraph 0039, 0042; Kumar et al., paragraph 0044, 0055-0056).
The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 3, the apparatus further comprising generating at least one other cryptographic hash of the image metadata (in addition to discussion above, Gormish et al., fig. 7, paragraph 0121, 0127, Table 2).
The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 7, the apparatus wherein the object is a person, an animal or a good (in addition to discussion above, Folta et al., Abstract, fig. 7, paragraph 0039, 0042; Kumar et al., paragraph 0048, 0053).
The motivation for combining the references has been discussed with respect to the independent claim above.

Regarding claim 8, the apparatus wherein, when the object is a person, the object metadata comprises the person's name (in addition to discussion above, Folta et al., Abstract, fig. 7, paragraph 0019, 0024, 0039-0042; Kumar et al., paragraph 0048, 0053).
The motivation for combining the references has been discussed with respect to the independent claim above.

Claim 11 is rejected for the same reason as discussed in the corresponding claim 1 above.
Claim 12 is rejected for the same reason as discussed in the corresponding claim 2 above.
Claim 13 is rejected for the same reason as discussed in the corresponding claim 3 above.
Claim 17 is rejected for the same reason as discussed in the corresponding claim 7 above.
Claim 18 is rejected for the same reason as discussed in the corresponding claim 8 above.

Regarding claim 21, the apparatus wherein the sound is spoken lines (in addition to discussion above, Kumar et al., paragraph 0060-0062).

Claim 22 is rejected for the same reason as discussed in the corresponding claim 21 above.

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US 2012/0106806 by Folta et al., US 2019/0215464 by Kumar et al. and US 2008/0243898 by Gormish et al. in view of US 2010/0029380 by Rhoads et al.
Regarding claim 6, Folta et al. discloses detecting objects in each frame's image; recognizing the object; assigning metadata to the objects, Kumar et al. discloses the object metadata linking audio from the digital video work to the corresponding object in the frame's image which produces the audio, Gormish et al. discloses for each frame, generating at least one cryptographic hash of the object metadata; writing the hash to a node of a transaction processing network, but fail to disclose the apparatus wherein the digital video work is a scan of an analog video work.
Rhoads et al. discloses the apparatus wherein the digital video work is a scan of an analog video work (paragraph 0007, 0049, 0052, 0068 teaches a digital image from an analog image).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature that the digital image is a scan of an analog image, as taught by Rhoads et al., into the system of Folta et al., Kumar et al., and Gormish et al., because such incorporation would give a user more options to work with different images, thus increasing user accessibility of the system.

Claim 16 is rejected for the same reason as discussed in the corresponding claim 6 above.

Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2012/0106806 by Folta et al., US 2019/0215464 by Kumar et al. and US 2008/0243898 by Gormish et al. in view of US 2018/0121635 by Tormasov et al.

Regarding claim 9, Folta et al. discloses detecting objects in each frame's image; recognizing the object; assigning metadata to the objects, Kumar et al. discloses the object metadata linking audio from the digital video work to the corresponding object in the frame's image which produces the audio, Gormish et al. discloses for each frame, generating at least one cryptographic hash of the object metadata; writing the hash to a node of a transaction processing network, but fail to disclose the apparatus wherein the transaction processing network is a blockchain ledger.
Tormasov et al. discloses the transaction processing network is a blockchain ledger (paragraph 0032).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature that the transaction processing network is a blockchain ledger, as taught by Tormasov et al., into the system of Folta et al., Kumar et al., and Gormish et al., because such incorporation would allow a user to ensure security by providing transaction verification, thus increasing user accessibility of the system.

Regarding claim 10, Folta et al. discloses detecting objects in each frame's image; recognizing the object; assigning metadata to the objects, Kumar et al. discloses the object metadata linking audio from the digital video work to the corresponding object in the frame's image which produces the audio, Gormish et al. discloses for each frame, generating at least one cryptographic hash of the object metadata; writing the hash to a node of a transaction processing network, but fail to disclose the apparatus further comprising adding a watermark to the hash value before it is written to the node.
Tormasov et al. discloses adding a watermark to the hash value before it is written to the node (paragraph 0032).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate adding a watermark to the hash value before it is written to the node, as taught by Tormasov et al., into the system of Folta et al., Kumar et al., and Gormish et al., because such incorporation would allow a user to ensure security by providing transaction verification, thus increasing user accessibility of the system.

Claim 19 is rejected for the same reason as discussed in the corresponding claim 9 above.
Claim 20 is rejected for the same reason as discussed in the corresponding claim 10 above.

Claims 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over US 2012/0106806 by Folta et al., US 2019/0215464 by Kumar et al. and US 2008/0243898 by Gormish et al. in view of US 2016/0104511 by An et al.
Regarding claim 23, Folta et al. discloses detecting objects in each frame's image; recognizing the object; assigning metadata to the objects, Kumar et al. discloses the object metadata linking audio from the digital video work to the corresponding object in the frame's image which produces the audio, Gormish et al. discloses for each frame, generating at least one cryptographic hash of the object metadata; writing the hash to a node of a transaction processing network, but fail to disclose the apparatus wherein the object is an audio object having speech, recognizing the object includes conversion of the speech to text of the speech, and the metadata links the speech to the text of the speech.
	An et al. discloses the apparatus wherein the object is an audio object having speech, recognizing the object includes conversion of the speech to text of the speech, and the metadata links the speech to the text of the speech (paragraph 0108 teaches “According to various embodiments of the present disclosure, the electronic device 101 may link the first metadata information with the first image or video in the form of a tag, and the electronic device 101 may be configured to link at least one of (1) the voice data, (2) the first metadata information, or (3) the second metadata information 320b with the second image or video in the form of a tag. Here, for example, the first metadata information may include speech-to-text information extracted from the voice data. Furthermore, the electronic device 101 may determine the relation using at least one of an image analysis, location information, time information, text information, or face recognition information associated with the first image or video and the second image or video”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features that the object is an audio object having speech, recognizing the object includes conversion of the speech to text of the speech, and the metadata links the speech to the text of the speech, as taught by An et al., into the system of Folta et al., Kumar et al., and Gormish et al., because such incorporation would give a user more options to identify the object of the digital image, thus increasing user accessibility of the system.

Claim 24 is rejected for the same reason as discussed in the corresponding claim 23 above.

Conclusion
	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIGAR CHOWDHURY whose telephone number is (571)272-8890.  The examiner can normally be reached on Monday-Friday 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran, can be reached at 571-272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NIGAR CHOWDHURY/Primary Examiner, Art Unit 2484