DETAILED ACTION
	Receipt of Applicant’s Amendment, filed February 22, 2021 is acknowledged.  
Claims 1, 24, and 25 were amended.
Claims 1-25 are pending in this office action.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-25 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-29 of U.S. Patent No. 9436692.
Instant Claim Set filed 10/28/2020

Claims 1, 24, and 25…
	transcoding a video file 

including by splitting the video file into a plurality of chunks respectively comprising unstructured video data; 
wherein a key of the key-value pair includes a visual key and a value of the key-value pair includes an attribute value associated with the visual key, including by:








	calculating the visual key including by:
	 detecting an object in a frame in the video data, 

	extracting a set of features from the frame, 
	calculating a representation of the extracted feature set based at least in part on image content analysis, and 
	setting a value of the visual key to include a characteristic feature of the object based at least in part on the calculated representation of the extracted feature set, and 










	calculating the associated attribute value for the key-value pair, wherein the attribute value comprises one or more coordinates corresponding to a location of the object in an image frame with which the key-value pair is associated and a timestamp associated with the image frame with which the key-value pair is associated; 

	providing the key-value pair as an output, wherein the key-value pair locates video data using reduced computational resources; and 
	storing, in a database, the key-value pair as structured information;

	identifying each video frame of the unstructured video data that includes the object using an index that includes the structured information; and



	performing an operation on one or more of the identified video frames.  






2. The method of claim 1, wherein the splitting the video file into the plurality of chunks comprises in the event that a boundary corresponding to at least one of the plurality of chunks does not conform to a Group of Pictures boundary associated with a first Group of Pictures, sending a first subset of one or more frames corresponding to the first Group of Pictures to a processor that is to process a second subset of one or more frames corresponding to the first Group of Pictures.
Application Serial No. 15/227,532 Attorney Docket No. EMCCP345C1



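3. The method of claim 1, wherein the processing of at least the subset of chunks and computing for each detected moving object the visual key and the associated attribute value comprises: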
	associating each detected moving foreground object with an image frame, a timestamp associated with the image frame, and a fingerprint representation of the corresponding moving foreground object.  

4. The method of claim 1, further comprising: 
	receiving a query from a client; 
	querying the database based at least in part on the query; 
	in the event that the database includes data that is responsive to the query, providing the client with a query result, the query result including an image displaying an object corresponding to the query result; and 
	in response to receiving an input from the client, the input corresponding to a selection of the query result, retrieving corresponding underlying raw video data of the object corresponding to the query result.  

5. The method of claim 3, wherein the fingerprint representation of the corresponding moving foreground object is computed based on an image content analysis.  

6. The method of claim 1, wherein the one or more coordinates indicate a geo-location corresponding to the location of the object in the image frame.  


7. The method of claim 1, wherein the database is searchable using natural-language queries.



9. The method of claim 8, further comprising detecting the standard video file format.  

10. The method of claim 9, wherein transcoding the video file includes transcoding the video file from the standard video file format to a distributed file system-friendly format.

11. The method of claim 1, wherein each chunk is processed at least in part by a map stage to which the chunk is provided as input.  

12. The method of claim 11, wherein the visual key and the associated attribute value are computed by the map stage and provided as output in the form of a composite key-value pair.  

13. The method of claim 1, wherein the key-value pair is sent to a reduce stage determined based at least in part on the visual key.  

14. The method of claim 13, wherein the reduce stage is configured to collect key-value pairs associated with a same moving object and to form and store a trajectory data based on the collected key-value pairs.  

15. The method of claim 1, further comprising using the visual key and associated attribute value to compute and store structured information associated with at least a portion of the video file.  

16. The method of claim 15, wherein the structured information, optional with a thumbnail image for future reference, is stored in a database.

17. The method of claim 16, further comprising using an index associated with the structured information as stored in the database to retrieve at least a portion of said structured information from the database in response to a database query.  


18.  The method of claim 17, wherein the query comprises a SQL query. 
 
19. The method of claim 16, wherein the database includes an advanced data analytics capability and further comprising using the advanced data analytics capability to perform advanced data analytics processing with respect to at least a portion of the structured information.  

20. The method of claim 15, further comprising providing a visual representation of at least a defined subset of said structured information.  

21.  The method of claim 20, wherein the structured information comprises a trajectory of the object and the visual representation comprises a visual representation of the trajectory.

22.  The method of claim 21, wherein the trajectory is represented as a line or one or more other graphical elements arranged in the visual representation to indicate the trajectory of the object with respect to a static or other background or reference.  

23. The method of claim 1, further comprising decoding, in parallel, one or more image frames from a plurality of 


4. The method of claim 3, further comprising transcoding the video …

1. …splitting a video file into a plurality of chunks respectively comprising unstructured video data…
and an associated attribute value, wherein the attribute value comprises one or more of a set of one or more coordinates indicating a geo-location corresponding to a location of the moving object in an image frame with which a key-value pair is associated and a timestamp associated with the image frame with which the key-value pair is associated



…by detecting one or more moving objects…

14. The method of claim 9, further comprising providing a visual representation of at least a defined subset of said structured information.

15. The method of claim 14, wherein the structured information comprises a trajectory of the moving object and the visual representation comprises a visual representation of the trajectory.

16. The method of claim 15, wherein the trajectory is represented as a line or one or more other graphical elements arranged in the visual representation to indicate the trajectory of the moving object with respect to a static or other background or reference.


18. … associating each detected moving foreground object with an image frame, a timestamp associated with the image frame, and a fingerprint representation of the corresponding moving foreground object…




1. …providing the visual key and the associated attribute value as output…


storing, in a database, the visual key and the associated attribute value as structured information in a searchable format.

11. … using an index associated with the structured information as stored in the database to retrieve at least a portion of said structured information from the database in response to a database query.

21. … in response to receiving an input from the client, the input corresponding to a selection of the query result, retrieving corresponding underlying raw video data of the object corresponding to the query result.


1. …splitting a video file into a plurality of chunks respectively comprising unstructured video data, wherein the splitting the video file into the plurality of chunks comprises in the event that a boundary corresponding to at least one of the plurality of chunks does not conform to a Group of Pictures boundary associated with a first Group of Pictures, sending a first subset of one or more frames corresponding to the first Group of Pictures to a processor that is to process a second subset of one or more frames corresponding to the first Group of Pictures…

18. The method of claim 1, wherein the processing of at least the subset of 

21. The method of claim 1, further comprising: 
	receiving a query from a client; 
	querying the database based at least in part on the query;
	 in the event that the database includes data that is responsive to the query, providing the client with a query result, the query result including an image displaying an object corresponding to the query result; and 
	in response to receiving an input from the client, the input corresponding to a selection of the query result, retrieving corresponding underlying raw video data of the object corresponding to the query result.

18. … a fingerprint representation of the corresponding moving foreground object.



26. a geo-location corresponding to a location of the moving object in an image frame with which a key-value pair is associated and a timestamp associated with the image frame with which the key-value pair is associated…

20. The method of claim 1, wherein the database is searchable using natural-language queries.



3. The method of claim 2, further comprising detecting the standard video file format.

4. The method of claim 3, further comprising transcoding the video file from the standard video file format to distributed file system-friendly format.


5. The method of claim 1, wherein each chunk is processed at least in part by a map stage to which the chunk is provided as input.

6. The method of claim 5, wherein the visual key and the associated attribute value are computed by the map stage and provided as output in the form of a composite key-value pair.

7. The method of claim 1, wherein the key-value pair is sent to a reduce stage determined based at least in part on the visual key.

8. The method of claim 7, wherein the reduce stage is configured to collect key-value pairs associated with a same moving object and to form and store a trajectory data based on the collected key-value pairs.

9. The method of claim 1, further comprising using the visual key and associated attribute value to compute and store structured information associated with at least a portion of the video file.

10. The method of claim 9, wherein the structured information, optional with a 

11. The method of claim 10, further comprising using an index associated with the structured information as stored in the database to retrieve at least a portion of said structured information from the database in response to a database query.


12. The method of claim 11, wherein the query comprises a SQL query.

13. The method of claim 10, wherein the database includes an advanced data analytics capability and further comprising using the advanced data analytics capability to perform advanced data analytics processing with respect to at least a portion of the structured information.

14. The method of claim 9, further comprising providing a visual representation of at least a defined subset of said structured information.

15. The method of claim 14, wherein the structured information comprises a trajectory of the moving object and the visual representation comprises a visual representation of the trajectory.

16. The method of claim 15, wherein the trajectory is represented as a line or one or more other graphical elements arranged in the visual representation to indicate the trajectory of the moving object with respect to a static or other background or reference.

17. The method of claim 1, further comprising decoding, in parallel, one or more image frames from a plurality of 


Although the claims at issue are not identical, they are not patentably distinct from each other because the claims recite all the limitations of the patented claims. The difference in scope of the claimed subject matter is de minimis and unrelated to the overall aesthetic appearance of the claims being compared. It would have been obvious to one of ordinary skill in the art that the instant claim limitations recite generic limitations which are recited in a more specific form in the patented claims. Note that the italicized text has been identified as distinct language, which is addressed by the mapped claim language.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over Binyu [CN102495725] (all citations are mapped to the provided English translation) in view of Kim, Myoungjin [A hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE], Venetianer [2005/0162515], and Leow [2005/0128318].

	With regard to claim 1, Binyu teaches a method, comprising: 
transcoding a video file (Binyu, ¶29 “better achieve real-time processing of feature extraction for image/video retrieval”) including by splitting the video file into a plurality of chunks (Binyu, ¶11 “dividing the processed image or video frame data into several sub-regions”) …; 
processing at least a subset of the chunks in parallel (Binyu, ¶11 “simultaneously processing the data of these sub-regions in parallel”), including detecting one or more … objects (Binyu, ¶12 “process of feature extraction algorithm for image/video retrieval”) and computing for the one or more … moving objects a key-value pair (Binyu, ¶12 “feature detection and feature description”), wherein a key of the key-value pair includes a visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) and a value of the key-value pair includes an attribute value (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) associated with the visual key as the detected image feature (Binyu, ¶12), including by: 
calculating the visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) including by: 
…, 
extracting a set of features from the frame as extracting the features (Binyu, ¶5 “features of the image are extracted according to the …”), 
calculating a representation of the extracted feature set based at least in part on image content analysis as performing the SURF algorithm to perform feature description (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”), and 
setting a value of the visual key to include a characteristic feature of the object based at least in part on the calculated representation of the extracted feature set (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), and 
calculating the associated attribute value for the key-value pair (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”), wherein the attribute value comprises one or more coordinates corresponding to a location of the object in an image frame (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) … associated with the image frame with which the key-value pair is associated (Binyu, ¶10 “extracting feature information from image or video key frames”); 
providing the key-value pair as an output (Binyu, ¶29 “processing feature extraction for image/video retrieval”), wherein the key-value pair locates video data (Binyu, ¶21 “the output is the location information of the image features”) using reduced computational resources (Binyu, ¶9 “pipeline-level parallel technology to improve performance and make the processing speed of the feature extraction algorithm for image/video retrieval meet the requirements of real-time processing”; ¶14 “which not only alleviates the problem of unbalanced load amount threads in the task-level parallel technology, but also release the scalability crisis in the pipeline-level technology, while reducing hardware resources as much as possible Load”); and 
storing, in a database (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), the key-value pair as structured information, as the feature information must inherently be maintained in order for it to be used for image/video retrieval (Binyu, ¶29 “processing feature extraction for image/video retrieval”); …
 
Binyu does not explicitly teach transcoding a video file… respectively comprising unstructured video data…
Kim teaches transcoding a video file (Kim, Abstract “a video transcoding module”)… respectively comprising unstructured video data (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a 
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the parallel video feature extraction algorithm taught by Binyu using the video transcoding process taught by Kim, as it delivers excellent speed and quality (Kim, Abstract, “the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality”) in providing parallel processing for video transcoding (Kim, Page 2828 “provide a distributed and parallel processing system for media transcoding functions and delivering social media”).
Binyu does not explicitly teach one or more moving objects and computing for the one or more detected moving objects a key-value pair… detecting an object in a frame in the video data, … with which the key-value pair is associated and a timestamp.
Venetianer teaches one or more moving objects (Venetianer, ¶113 “movement of an object”) and computing for the one or more detected moving objects a key-value pair (Venetianer, ¶154 “The activity records includes, for example: details of a trajectory of an object; a time of detection of an object; a position of detection of an object, and a description or definition of the event discriminator that was employed”) … detecting an object in a frame in the video data (Venetianer, ¶140 “objects are detected via movement”), … with which the key-value pair is associated and a timestamp (Venetianer, ¶136 “temporal attributes”).  
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the proposed 
Binyu does not explicitly teach identifying each video frame of the unstructured video data that includes the object using an index that includes the structured information; and performing an operation on one or more of the identified video frames.  
Leow teaches identifying each video frame of the unstructured video data as the video streams (Leow, ¶15 “The video server stores the video streams”) that includes the object using an index that includes the structured information (Leow, Claim 34 “wherein the data attribute serves as an index to retrieve video and data segments of the same characteristic inferred by the data attribute”); and
performing an operation on one or more of the identified video frames (Leow, ¶21 “A block 58 permits the user to select the video sequences for playback that match all or some of the tagged string by the use of the graphical interface”).  
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the proposed combination using an index to enable search and retrieval as taught by Leow, as it yields the predictable results of facilitating the searching and retrieval of video segments containing those attributes.  The data extracted by the proposed combination provide reasonable attributes which one of ordinary skill in the art would readily recognize as 

With regard to claim 2, the proposed combination further teaches wherein the splitting the video file into the plurality of chunks (Binyu, ¶11 “dividing the processed image or video frame data into several sub-regions”) comprises in the event that a boundary corresponding to at least one of the plurality of chunks as the sub-regions (Binyu, ¶11 “dividing the processed image or video frame data into several sub-regions”) does not conform to a Group of Pictures boundary associated with a first Group of Pictures (Binyu, ¶13 “The treads in each tread group are divided in to two types: detection thread and description thread”), sending a first subset of one or more frames as the detection thread (Id.) corresponding to the first Group of Pictures to a processor that is to process (Binyu, ¶13 “each [thread] independently processes the detection and description of a data block”) a second subset of one or more frames corresponding to the first Group of Pictures as the description thread (Binyu, ¶13 “in each thread group, the detection thread completes the features detection of the previous image block After the calculation, the image block is handed over to the description thread for the next description calculation”).  
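For illustration only, the claim 2 limitation can be pictured with the following minimal Python sketch. It is not drawn from Binyu, Kim, Venetianer, or Leow, and it is not the applicant's implementation. The function name `plan_gop_handoff` and the parameters `chunk_size` and `gop_size` are hypothetical, and the sketch assumes every Group of Pictures has the same fixed number of frames.

```python
def plan_gop_handoff(num_frames, chunk_size, gop_size):
    """Return, per chunk, the frames it owns and the frames handed to it.

    Assumption: every GOP has exactly gop_size frames, so a chunk boundary
    is GOP-aligned only when its start index is a multiple of gop_size.
    """
    plan = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        # First frame of the GOP that contains this chunk's first frame.
        gop_start = (start // gop_size) * gop_size
        plan.append({
            "own": list(range(start, end)),
            # Frames of the straddling GOP that sit in earlier chunks and
            # are sent to the processor that handles this chunk.
            "received": list(range(gop_start, start)),
        })
    return plan
```

Under these assumptions, a chunk whose first frame is GOP-aligned receives nothing, while a chunk that starts mid-GOP receives the leading frames of that GOP so that one processor holds the whole Group of Pictures.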

With regard to claim 3, the proposed combination further teaches wherein the processing of at least the subset of chunks (Binyu, ¶11 “simultaneously processing the data of these sub-regions in parallel”) and computing for each detected (Binyu, ¶12 “process of feature extraction algorithm for image/video retrieval”) moving object the visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) and the associated attribute value (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) comprises: 
associating each detected moving foreground object with an image frame (Venetianer, ¶113 “movement of an object”), a timestamp associated with the image frame (Venetianer, ¶136 “temporal attributes”), and a fingerprint representation (Venetianer, ¶163 “Each object is assigned a label”) of the corresponding moving foreground object (Venetianer, ¶113 “movement of an object”).  

With regard to claim 4, the proposed combination further teaches the method further comprising (Binyu, ¶29 “for image/video retrieval”): 
receiving a query from a client (Venetianer, ¶123 “the query, “Show me any red vehicle,” 171 is posed”); 
querying the database based at least in part on the query (Venetianer, ¶121 “The primitive data can be thought of as data stored in a database.  To detect events occurrences in it, an efficient query language is required”); 
in the event that the database includes data that is responsive to the query (Venetianer, ¶121 “The primitive data can be thought of as data stored in a database.  To detect events occurrences in it, an efficient query language is required”), providing the client with a query result (Venetianer, ¶75 “the system may be used to produce, for example, security or market research reports that can be tailored according to the needs of an operator, and as an option, can be presented through an interactive web-based interface, or other reporting mechanism”), the query result including an image displaying an object corresponding to the query result (Venetianer, ¶157 “The output can include one or more reports… Example of a report include… representative imagery of each event occurrence; representative video of each event occurrence”; Figure 15); and 
in response to receiving an input from the client, the input corresponding to a selection of the query result (Venetianer, ¶161 “a point-and-click interface allows the operator to navigate through representative still and video imagery of regions and/or activities that the system has detected and archived”), retrieving corresponding underlying raw video data of the object corresponding to the query result as navigating through representative still and video imagery (Id.).  

With regard to claim 5, the proposed combination further teaches wherein the fingerprint representation (Venetianer, ¶163 “Each object is assigned a label”) of the corresponding moving foreground object (Venetianer, ¶113 “movement of an object”) is computed based on an image content analysis (Venetianer, ¶163 “two objects are each classified as one person, and one object is classified as not a person”; ¶144 “any technique for generating blobs can be used for this block”; ¶145 “any technique for tracking blobs can be used for this block”).  

With regard to claim 6, the proposed combination further teaches wherein the one or more coordinates indicate a geo-location corresponding to the location of the object in the image frame (Venetianer, ¶108 “Position refers to a spatial attribute of an object.  The position may be, for example, an image position in pixel coordinates, an absolute real-world position in some world coordinate system”).  

With regard to claim 7, the proposed combination further teaches wherein the database is searchable using natural-language queries as the example query is a natural language query (Venetianer, ¶123 “Show me any red vehicle”).  

With regard to claim 8, the proposed combination further teaches wherein the video file comprises a standard video file format (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a standard video file format such as MPEG, AVI, H.264, ect”; Kim Page 2839 see Table 2 Original video file being AVI format).  

With regard to claim 9, the proposed combination further teaches detecting the standard video file format (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a standard video file format such as MPEG, AVI, H.264, ect”; Kim Page 2839 see Table 2 Original video file being AVI format).  

With regard to claim 10, the proposed combination further teaches wherein transcoding the video file includes transcoding the video file from the standard video file format (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a standard video file format such as MPEG, AVI, H.264, ect”; Kim Page 2839 see Table 2 Original video file being AVI format) to a distributed file system-friendly format (Kim, Page 2829 “Improvements in quality and speed are achieved by adopting Hadoop Distributed File system (HDFS) for storing large amounts of video data”; Page 2839 Table 2, see MPEG-4).  

With regard to claim 11, the proposed combination further teaches wherein each chunk is processed at least in part by a map stage (Binyu, ¶12 “the detection stage”) to which the chunk (Binyu, ¶11 “dividing the processed image or video frame into several sub-regions… This method divides the input image into multiple data blocks, and performs feature detection and feature description on these data blocks in parallel”) is provided as input (Binyu, ¶12 “The input of the detection stage is the image, and the output is the position information of the detected features”).  

With regard to claim 12, the proposed combination further teaches wherein the visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) and the associated attribute value (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) are computed by the map stage (Binyu, ¶12 “The input of the detection stage is the image, and the output is the position information of the detected features”) and provided as output in the form of a composite key-value pair (Binyu, ¶12 “The input of the detection stage is the image, and the output is the position information of the detected features”).  

With regard to claim 13, the proposed combination further teaches wherein the key-value pair is sent to a reduce stage (Binyu, ¶12 “the description stage”) determined based at least in part on the visual key (Binyu, ¶13 “the detection thread completes the feature detection of the previous image block After the calculation, the image block is handed over to the description thread for the next description calculation”).  

With regard to claim 14, the proposed combination further teaches wherein the reduce stage (Binyu, ¶12 “the description stage”) is configured to collect key-value pairs associated with a same moving object (Venetianer, ¶113 “movement of an object”) and to form and store a trajectory data (Venetianer, ¶115 “a trajectory”) based on the collected key-value pairs (Binyu, ¶12 “The detection component (thread) stores the detected feature point location information in the specified cache area, and the description component (thread) reads the characteristics from the cache area and describes them.”).  
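For illustration only, the map stage and reduce stage language of claims 11 through 14 can be pictured with the following minimal Python sketch. It is a generic key-grouping example, not the method of any cited reference and not the applicant's implementation. The function names `map_stage` and `reduce_stage` and the detection fields `key`, `x`, `y`, and `t` are hypothetical, and the sketch assumes object detection has already produced one record per detected object per frame.

```python
from collections import defaultdict


def map_stage(chunk):
    """Emit one (visual_key, (x, y, timestamp)) pair per detection.

    Assumption: each detection is a dict with hypothetical fields
    "key" (visual key), "x", "y" (coordinates) and "t" (timestamp).
    """
    for det in chunk:
        yield det["key"], (det["x"], det["y"], det["t"])


def reduce_stage(pairs):
    """Collect pairs sharing a visual key and order them by timestamp.

    The time-ordered list of (x, y, timestamp) values for one key is
    treated here as that object's trajectory.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values, key=lambda v: v[2])
            for key, values in grouped.items()}
```

In this sketch the visual key alone decides which group a pair joins, which mirrors the claim 13 language that the reduce stage is determined based at least in part on the visual key.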

With regard to claim 15 the proposed combination further teaches using the visual key and associated attribute value to compute and store structured information associated with at least a portion of the video file (Binyu, ¶12 “The detection…”).

With regard to claim 16 the proposed combination further teaches wherein the structured information, optionally with a thumbnail image for future reference (Venetianer, Figure 15 see the image icons associated with the object), is stored in a database (Binyu, ¶12 “The detection component (thread) stores the detected feature point location information in the specified cache area, and the description component (thread) reads the characteristics from the cache area and describes them.”).

With regard to claim 17 the proposed combination further teaches using an index (Leow, Claim 34 “wherein the data attributes serves as an index to retrieve video and data segments of the same characteristics inferred by the data attribute”) associated with the structured information as stored in the database (Venetianer, ¶121 “The primitive data can be thought of as data stored in a database.  To detect events occurrences in it, an efficient query language is required”) to retrieve at least a portion of said structured information from the database (Venetianer, ¶75 “the system may be used to produce, for example, security or market research reports that can be tailored according to the needs of an operator, and as an option, can be presented through an interactive web-based interface, or other reporting mechanism”) in response to a database query (Venetianer, ¶123 “the query, “Show me any red vehicle,” 171 is posed”).  

With regard to claim 18 the proposed combination further teaches wherein the query comprises a SQL query (Leow, ¶21 “A block 54 automatically composes an SQL query based on the data tag search string.  The search using this SQL query returns all rows of the SQL database containing the search string”).  
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination using SQL queries to search data tags stored in SQL readable format, as this is a known format that one of ordinary skill in the art would expect to facilitate the search and retrieval of data elements within a computing environment.
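
For illustration only, composing an SQL query from a tag search string and returning the matching rows might look like the sketch below. It is not Leow's implementation; the `tags` table, its columns, and the function `find_segments` are hypothetical names assumed for the example, and Python's standard `sqlite3` module stands in for whatever database the references contemplate.

```python
import sqlite3


def find_segments(conn: sqlite3.Connection, search_string: str):
    """Return rows whose data tag contains the search string."""
    # Parameterized query: the search string is bound, not concatenated.
    cursor = conn.execute(
        "SELECT segment_id, data_tag FROM tags "
        "WHERE data_tag LIKE ? ORDER BY segment_id",
        (f"%{search_string}%",),
    )
    return cursor.fetchall()


# Hypothetical in-memory table of tagged video segments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (segment_id INTEGER, data_tag TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, "red vehicle"), (2, "person walking"), (3, "red truck")],
)
matches = find_segments(conn, "red")
```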


With regard to claim 19 the proposed combination further teaches wherein the database includes an advanced data analytics capability and further comprising using the advanced data analytics capability to perform advanced data analytics processing with respect to at least a portion of the structured information (Binyu, ¶18 “All parallel technologies are based on the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval, and are implemented using pthreads”).

With regard to claim 20 the proposed combination further teaches providing a visual representation of at least a defined subset of said structured information (Venetianer, Figure 15).

With regard to claim 21 the proposed combination further teaches wherein the structured information comprises a trajectory of the object (Venetianer, ¶113-115) and the visual representation comprises a visual representation of the trajectory (Venetianer, Figure 15 see the lines on the image).

With regard to claim 22 the proposed combination further teaches wherein the trajectory is represented as a line (Venetianer, Figure 15 see the lines on the image) or one or more other graphical elements (Venetianer, Figure 15, see the trajectory of the person 1032 and 1033) arranged in the visual representation to indicate the trajectory of the object with respect to a static or other background or reference as the movement of the two distinct people moving through the aisle in the image (Venetianer, Figure 15).

With regard to claim 23 the proposed combination further teaches decoding, in parallel, one or more image frames (Binyu, ¶11 “dividing the processing image or video frame data into several sub-regions, and simultaneously processing the data of these sub-regions in parallel”) from a plurality of localized distributed chunks in a Hadoop Distributed File System (HDFS) (Kim, Page 2829 “Improvements in quality and speed are achieved by adopting Hadoop Distributed File system (HDFS)”).

With regard to claim 24 Binyu teaches A system, comprising: 
a processor (Binyu, ¶2 “parallel processors”); and 
a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions which when executed cause the processor (Binyu, ¶23 “The test of the present invention on a 16-core general-purpose processor indicates that the cache performance is higher than the cache size is 128 or greater”) to: 
transcode a video file (Binyu, ¶29 “better achieve real-time processing of feature extraction for image/video retrieval”) including by splitting the video file into a plurality of chunks (Binyu, ¶11 “dividing the processed image or video frame data into several sub-regions”) …; 
process at least a subset of the chunks in parallel (Binyu, ¶11 “simultaneously processing the data of these sub-regions in parallel”), including detecting one or more … objects (Binyu, ¶12 “process of feature extraction algorithm for image/video retrieval”) and computing for the one or more … moving objects a key-value pair (Binyu, ¶12 “feature detection and feature description”), wherein a key of the key-value pair includes a visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) and a value of the key-value pair includes an attribute value (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) associated with the visual key as the detected image feature (Binyu, ¶12), including by: 
calculating the visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) including by: 
…, 
extracting a set of features from the frame as extracting the features (Binyu, ¶5 “features of the image are extracted according to the feature extraction algorithm… The current mainstream local feature algorithms include SIFT and SURF algorithms”), 
calculating a representation of the extracted feature set based at least in part on image content analysis as performing the SURF algorithm to perform feature description (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”), and 
setting a value of the visual key to include a characteristic feature of the object based at least in part on the calculated representation of the extracted feature set (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), and 
calculating the associated attribute value for the key-value pair (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”), wherein the attribute value comprises one or more coordinates corresponding to a location of the object in an image frame (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) … associated with the image frame with which the key-value pair is associated (Binyu, ¶10 “extracting feature information from image or video key frames”); 
provide the key-value pair as an output (Binyu, ¶29 “processing feature extraction for image/video retrieval”), wherein the key-value pair locates video data (Binyu, ¶21 “the output is the location information of the image features”) using reduced computational resources (Binyu, ¶9 “pipeline-level parallel technology to improve performance and make the processing speed of the feature extraction algorithm for image/video retrieval meet the requirements of real-time processing”; ¶14 “which not only alleviates the problem of unbalanced load amount threads in the task-level parallel technology, but also release the scalability crisis in the pipeline-level technology, while reducing hardware resources as much as possible Load”); and 
store, in a database (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), the key-value pair as structured information as the feature information must inherently be maintained in order for it to be used for image/video retrieval (Binyu, ¶29 “processing feature extraction for image/video retrieval”)…
Binyu does not explicitly teach transcoding a video file… respectively comprising unstructured video data…
Kim teaches transcoding a video file (Kim, Abstract “a video transcoding module”)… respectively comprising unstructured video data (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a standard video file format such as MPEG, AVI, H.264, etc.”; Kim, Page 2839 see Table 2 Original video file being AVI format).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the parallel video feature extraction algorithm taught by Binyu using the video transcoding process taught by Kim as it delivers excellent speed and quality (Kim, Abstract, “the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality”) in providing parallel processing for video transcoding (Kim, Page 2828 “provide a distributed and parallel processing system for media transcoding functions and delivering social media”).
Binyu does not explicitly teach one or more moving objects and computing for the one or more detected moving objects a key-value pair… detecting an object in a frame in the video data, … with which the key-value pair is associated and a timestamp.
Venetianer teaches one or more moving objects (Venetianer, ¶113 “movement of an object”) and computing for the one or more detected moving objects a key-value pair (Venetianer, ¶154 “The activity records includes, for example: details of a trajectory of an object; a time of detection of an object; a position of detection of an object, and a description or definition of the event discriminator that was employed”) … detecting an object in a frame in the video data (Venetianer, ¶140 “objects are detected via movement”), … with which the key-value pair is associated and a timestamp (Venetianer, ¶136 “temporal attributes”).  
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to extract features including object trajectories and temporal information, and to provide the ability to query videos based on these features as taught by Venetianer, as it yields the predictable result of providing a means of reviewing video surveillance data (Venetianer, ¶30, ¶36) and of identifying desired portions of video surveillance data (Venetianer, ¶34).
Binyu does not explicitly teach identifying each video frame of the unstructured video data that includes the object using an index that includes the structured information; and performing an operation on one or more of the identified video frames.  
Leow teaches identifying each video frame of the unstructured video data as the video streams (Leow, ¶15 “The video server stores the video streams”) that includes the object using an index that includes the structured information (Leow, Claim 34 “wherein the data attribute serves as an index to retrieve video and data segments of the same characteristic inferred by the data attribute”); and
performing an operation on one or more of the identified video frames (Leow, ¶21 “A block 58 permits the user to select the video sequences for playback that match all or some of the tagged string by the use of the graphical interface”).


With regard to claim 25 Binyu teaches A computer program product, the computer program product being embodied in a non-transitory tangible computer readable storage medium and comprising computer instructions (Binyu, ¶23 “The test of the present invention on a 16-core general-purpose processor indicates that the cache performance is higher than the cache size is 128 or greater”) for:
transcoding a video file (Binyu, ¶29 “better achieve real-time processing of feature extraction for image/video retrieval”) including by splitting the video file into a plurality of chunks (Binyu, ¶11 “dividing the processed image or video frame data into several sub-regions”) …; 
processing at least a subset of the chunks in parallel (Binyu, ¶11 “simultaneously processing the data of these sub-regions in parallel”), including detecting one or more … objects (Binyu, ¶12 “process of feature extraction algorithm for image/video retrieval”) and computing for the one or more … moving objects a key-value pair (Binyu, ¶12 “feature detection and feature description”), wherein a key of the key-value pair includes a visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) and a value of the key-value pair includes an attribute value (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”) associated with the visual key as the detected image feature (Binyu, ¶12), including by: 
calculating the visual key (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”) including by: 
…
extracting a set of features from the frame as extracting the features (Binyu, ¶5 “features of the image are extracted according to the feature extraction algorithm… The current mainstream local feature algorithms include SIFT and SURF algorithms”), 
calculating a representation of the extracted feature set based at least in part on image content analysis as performing the SURF algorithm to perform feature description (Binyu, ¶12 “feature detection and feature description”; ¶18 “the current mainstream feature extraction algorithm SURF (Speeded-Up Robust Features) for image/video retrieval”), and 
setting a value of the visual key to include a characteristic feature of the object based at least in part on the calculated representation of the extracted feature set (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), and 
calculating the associated attribute value for the key-value pair (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”), wherein the attribute value comprises one or more coordinates corresponding to a location of the object in an image frame (Binyu, ¶12 “the output is the position information of the detected image features”; ¶21 “the output is the location information of the image features”)… associated with the image frame with which the key-value pair is associated (Binyu, ¶10 “extracting feature information from image or video key frames”); 
providing the key-value pair as an output (Binyu, ¶29 “processing feature extraction for image/video retrieval”), wherein the key-value pair locates video data (Binyu, ¶21 “the output is the location information of the image features”) using reduced computational resources (Binyu, ¶9 “pipeline-level parallel technology to improve performance and make the processing speed of the feature extraction algorithm for image/video retrieval meet the requirements of real-time processing”; ¶14 “which not only alleviates the problem of unbalanced load amount threads in the task-level parallel technology, but also release the scalability crisis in the pipeline-level technology, while reducing hardware resources as much as possible Load”); and 
storing, in a database (Binyu, ¶21 “After this state is completed, the feature location information is stored in the designated cache area.  The description thread reads the characteristics from the cache and describes them”), the key-value pair as structured information as the feature information must inherently be maintained in order for it to be used for image/video retrieval (Binyu, ¶29 “processing feature extraction for image/video retrieval”)…
Binyu does not explicitly teach transcoding a video file… respectively comprising unstructured video data…
Kim teaches transcoding a video file (Kim, Abstract “a video transcoding module”)… respectively comprising unstructured video data (Interpreted in view of Paragraph [0018] of the original specification, “a received video file 104, e.g. in a standard video file format such as MPEG, AVI, H.264, etc.”; Kim, Page 2839 see Table 2 Original video file being AVI format).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the parallel video feature extraction algorithm taught by Binyu using the video transcoding process taught by Kim as it delivers excellent speed and quality (Kim, Abstract, “the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality”) in providing parallel processing for video transcoding (Kim, Page 2828 “provide a distributed and parallel processing system for media transcoding functions and delivering social media”).

Binyu does not explicitly teach one or more moving objects and computing for the one or more detected moving objects a key-value pair… detecting an object in a frame in the video data, … with which the key-value pair is associated and a timestamp.
Venetianer teaches one or more moving objects (Venetianer, ¶113 “movement of an object”) and computing for the one or more detected moving objects a key-value pair (Venetianer, ¶154 “The activity records includes, for example: details of a trajectory of an object; a time of detection of an object; a position of detection of an object, and a description or definition of the event discriminator that was employed”) … detecting an object in a frame in the video data (Venetianer, ¶140 “objects are detected via movement”), … with which the key-value pair is associated and a timestamp (Venetianer, ¶136 “temporal attributes”).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to extract features including object trajectories and temporal information, and to provide the ability to query videos based on these features as taught by Venetianer, as it yields the predictable result of providing a means of reviewing video surveillance data (Venetianer, ¶30, ¶36) and of identifying desired portions of video surveillance data (Venetianer, ¶34).
Binyu does not explicitly teach identifying each video frame of the unstructured video data that includes the object using an index that includes the structured information; and performing an operation on one or more of the identified video frames.  
Leow teaches identifying each video frame of the unstructured video data as the video streams (Leow, ¶15 “The video server stores the video streams”) that includes the object using an index that includes the structured information (Leow, Claim 34 “wherein the data attribute serves as an index to retrieve video and data segments of the same characteristic inferred by the data attribute”); and
performing an operation on one or more of the identified video frames (Leow, ¶21 “A block 58 permits the user to select the video sequences for playback that match all or some of the tagged string by the use of the graphical interface”).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination using an index to enable search and retrieval as taught by Leow, as it yields the predictable result of facilitating the searching and retrieval of video segments containing those attributes.  The data extracted by the proposed combination provide reasonable attributes which one of ordinary skill in the art would readily recognize as being usable within the index and search system taught by Leow to enable the system to provide search functionality on video files.

Response to Arguments
Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691.  The examiner can normally be reached on Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney, can be reached at 571-270-5626.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/AMANDA L WILLIS/
Primary Examiner, Art Unit 2158