DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on July 16th, 2021 has been entered.

Response to Amendment
The amendment filed on July 16th, 2021 has been entered.
The amendments to claims 1-3, 5, 8-10, 12, 14-16, 18, and 21 have been acknowledged.
In view of the amendment, the 35 U.S.C. 112(b) rejections have been withdrawn.

Response to Arguments
Applicant’s arguments filed on July 16th, 2021, with respect to the pending claims, have been fully considered but are moot because the arguments rely on newly added and/or amended claim limitations (e.g., “a hierarchy of abstraction”). The examiner has revised the rejections to address the new claim limitations.

Claim Rejections - 35 USC § 103
Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Grace Li (U.S. Patent No. 10,353,951) in view of Cao et al. (U.S. Patent No. 9,251,433), and further in view of Wang et al. (U.S. Patent No. 8,451,292), hereinafter referred to as Li, Cao, and Wang, respectively.
Regarding claim 1, Li teaches a method, implemented on a machine having at least one processor, storage, and a communication platform for responding to an image related query (Li Abstract: “A method is provided for receiving from a user a first search query for media files from a collection of media files”; Li col. 1 lines 8-11: “The present disclosure generally relates to a computer-based search engine, and more particularly to methods and systems to refine a search query in the search engine based on user image selection”; Li Fig. 8: see processor 802, memory 804, data storage 806, communications module 812, device 814 & 816; Li col. 1 lines 49-67: “a system includes one or more processors and a computer-readable storage medium coupled to the one or more processors”; Li col. 4 lines 44-60: “Server 130 … having an appropriate processor, memory, and communications capability for hosting the neural network … any other devices having appropriate processor, memory, and communications capabilities for accessing the search engine on one of the servers 130”), comprising: 
receiving, via the communication platform, information related to each image of a plurality of images, wherein the information represents concepts co-existing in each image of the plurality of images (Li Fig. 8: 812; Li col. 5 lines 51-67: “The visual media vector …”; Li Fig. 5: the image shows information representing visual concepts co-existing in the image, e.g., cat, color, breed); 
creating visual semantics for each image of the plurality of images based on the information related thereto (Li col. 5 lines 51-67 discussed above; Li col. 7 lines 39-53: “processor 236 of server 130 executes instructions to submit a plurality of training images containing content identifying different semantic concepts (e.g., woman, coffee, beach, soccer) to NN 240 that is configured to analyze pixel data collected from different frames in a time sequence from a scene for each of the plurality of training visual media files to identify selected features. The selected features may correspond to a particular semantic concept”); and 
obtaining via machine learning, for each image of the plurality of images, a representation of a corresponding scene based on the visual semantics of each image of the plurality of images, wherein the representation of the corresponding scene captures relationships among the concepts co-existing in each image of the plurality of images associated with the corresponding scenes (Li col. 5 lines 51-67 & col. 7 lines 39-53 discussed above; Li col. 4 lines 1-10: “The system provides for machine learning capability where the system can learn from a content item such as prior search queries from one or multiple users”; Li col. 5 lines 7-32: “NN 240 learns and adjusts its weights to better fit a desired outcome, including provided image data (e.g., the selection of a specific frame in a sequence of frames, or an object, or a specific scene in a video clip)”; Li col. 9 lines 10-36: “Multi-dimensional space 330 is dense, including clusters 340-1 and 340-2 (hereinafter, collectively referred to as "clusters 340"), of closely related vectors 331 and 332”). 

Pertaining to the same field of endeavor, Cao teaches obtaining scene representations that capture spatial relationships among the co-existing concepts (Cao col. 5 lines 23-64: “in step 106 the spatial relationship of the semantic regions is taken into account and the list of potential matches is further pruned … further pruned by taking into account the spatial relationship of the semantic labeled regions in the query/reference images … step 106 is performed by modeling the spatial relationships of the semantic regions in the images using spatial graph representing the relative positioning of each semantic region in the image”).
Li and Cao are considered to be analogous art because they are directed to image processing using visual semantics. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the search query refinement method and system (as taught by Li) to use spatial relationships of semantic concepts (as taught by Cao) because the combination produces robust matching of images of different types and across different views (Cao col. 1 lines 25-47).
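Li, in view of Cao, does not appear to explicitly teach that the visual semantics for each image comprise a hierarchy of abstraction having multiple abstraction levels, or conceptually summarizing the scene wherein the summarizing is inferred based on the concepts co-existing in each image of the plurality of images.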
Pertaining to the same field of endeavor, Wang teaches that the visual semantics for each image of the plurality of images comprise a hierarchy of abstraction having multiple abstraction levels, and teaches conceptually summarizing the scene, wherein the summarizing is inferred based on the concepts co-existing in each image of the plurality of images (Wang Abstract: “A video summarized method based on mining the story structure and semantic relations among concept entities”; Wang col. 1 lines 41-49: “An objective of the present invention is to provide a …”; Wang col. 5 lines 59-65: “The root is defined as a first level. Derive several levels of child nodes from the root r, and each of the child nodes directly or indirectly represents the expanded meanings of the root”; Wang col. 6 lines 53-59: “The concept of hypernym can be used to express the hierarchical relation between two words. Given the example of the word ‘teacher’, this word belongs to a subset of the word ‘person’ while ‘person’ is defined in a concept classification of WordNet. In other words, ‘person’ is a hypernym of ‘teacher’”; ¶0043 of the specification describes, as an example of a hierarchy of abstraction, ‘person’ as an abstracted concept encompassing concepts such as ‘conductor,’ ‘bandleader,’ and ‘violinist musician.’ Similarly, ‘person’ and ‘teacher’ correspond to a hierarchy of abstraction; Wang Fig. 8).
Li, in view of Cao, and Wang are considered to be analogous art because they are directed to image processing using visual semantics. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the search query refinement method and system using spatial relationships of semantic concepts (as taught by Li, in view of Cao) to conceptually summarize the scene and use hierarchical trees of concepts (as taught by Wang) because the combination allows the user to dynamically skim the conceptually organized videos (Wang col. 1 lines 27-33).

Regarding claim 2, Li, in view of Cao and Wang, teaches the method of claim 1, wherein the visual semantics created for each image of the plurality of images includes an identifier of the image and one or more annotations of the concepts co-existing in each image of the plurality of images (Li col. 4 lines 32-43: “identify features of images corresponding to one or more image identifiers”; Li col. 6 lines 51-67: “Memory 232 also includes an annotated training database 240 … may include visual media files from visual media database 252 that are human annotated with information indicating a caption, a keyword, or text descriptor associated with the …”; Li col. 7 lines 9-22: “the metadata file may include one or more rows of data including an image identifier, a video URL, and a style identifier (e.g., identifying the corresponding style class)”; Li col. 7 lines 39-53: “processor 236 of server 130 executes instructions to submit a plurality of training images containing content identifying different semantic concepts (e.g., woman, coffee, beach, soccer) to NN 240 that is configured to analyze pixel data collected from different frames in a time sequence from a scene for each of the plurality of training visual media files to identify selected features. The selected features may correspond to a particular semantic concept”; Li Fig. 5). 

Regarding claim 3, Li, in view of Cao and Wang, teaches the method of claim 2, wherein the identifier of each image of the plurality of images provides a context of the visual semantics; and the one or more annotations specify the concepts co-existing in each image of the plurality of images (see Li col. 4 lines 32-43 & col. 7 lines 9-22 discussed above regarding an image identifier; see Li col. 7 lines 39-53 discussed above regarding visual semantics; see Li col. 6 lines 51-67 discussed above regarding annotated training images). 

Regarding claim 4, Li, in view of Cao and Wang, teaches the method of claim 1, wherein the representation of the corresponding scene corresponds to a scene embedding (Li col. 5 lines 33-50: “Training database 248 can be, for example, a dataset of content items (e.g., visual media files) corresponding to any one of abstract images, sport images, outdoor images, pet images, scenes containing logo images, scenes containing icon images, scenes containing texture images, scenes containing Instagram images, scenes containing illustration images, scenes containing background images, scenes containing stock people images, scenes containing high dynamic range (HDR) images, scenes containing collection images, scenes containing macro images, scenes containing candid people images, scenes containing vector images, scenes containing pattern images, and the like”). 

Regarding claim 5, Li, in view of Cao and Wang, teaches the method of claim 2, wherein the representation of the corresponding scene includes: 
a plurality of vectors for the one or more annotations related to the concepts co-existing in each image of the plurality of images, the identifier of each image of the plurality of images, and at least one combination thereof (Li col. 5 lines 7-32: “NN 240 has access to a multi-dimensional image vector space to provide images from a visual media database 252 … the plurality of values may form the coordinates of an image vector in a multi-dimensional vector space”; Li col. 6 lines 5-50: “Training vectors for each of the visual media files may be clustered into a number of clusters … methods of vector quantization, or other clustering approaches … visual media database 252 stores the training vectors (e.g., a 256 dimensional vector) for each visual media file … NN 240 can be used to train a model to generate training vectors for visual media files”; Li col. 7 lines 39-53: “a plurality of training images containing content identifying different semantic concepts”; Li Fig. 3 & col. 8 lines 40-67: “vectors 335”); and 
an artificial neural network (ANN) with a plurality of layers of nodes and connections therein connecting the nodes (Li col. 5 lines 7-32: “NN 240 may include a feed-forward artificial neural network where individual neurons are tiled in such a way that individual neurons (or ‘nodes’) respond to overlapping regions in a visual field”). 

Regarding claim 6, Li, in view of Cao and Wang, teaches the method of claim 1, further comprising: 
receiving the image related query (Li Abstract: “receiving from a user a first search query”; Li Fig. 6: 602); 
obtaining a response to the image related query based on representations obtained via the machine learning (Li Abstract: “providing, in response to the first search query, a first …”; Li Fig. 6: 604, 612, and 614; Li col. 4 lines 1-10: “The system provides for machine learning capabilities where the system can learn from a content item such as prior search queries from one or multiple users to better focus a search scope”; Li col. 5 lines 7-32: “NN 240 learns and adjusts its weights to better fit a desired outcome, including provided image data”). 

Regarding claim 7, Li, in view of Cao and Wang, teaches the method of claim 6, wherein the image related query is directed to at least one (Note that only one of the alternative limitations is required by the claim language) of: 
a request to receive a summary of at least one concept, from the concepts co-existing in each of the plurality of images, included in the image related query (Li col. 5 lines 51-67 & col. 7 lines 39-53 discussed above: different semantic concepts (e.g., woman, coffee, beach, soccer); also see Li Fig. 6); and 
a request to receive one or more images that meet a conceptual similarity criterion with respect to the image related query (Li Abstract: “detecting a user selection of a responsive media file based on an interaction between the user and the responsive media file and selecting multiple similar media files having a visual similarity with the responsive media file. The method also includes generating a refined query based on a caption associated with a refined cluster of media files, the refined cluster of media files being proximal to the similar media file, and displaying, to the user and based on the refined query, a refined search result comprising refined media files from the refined query”; Li Fig. 6: 608-610; Li col. 4 lines 11-19: “advantageously add limited computational overhead by including a plurality of similar images to selected images from the search results for a first user provided query”; Li col. 10 lines 49-67: “Based on similar visual media file set 445, the system may obtain proposed queries 422-1 through 422-3”). 

Regarding claim 8, Li, in view of Cao and Wang, further teaches a system for responding to an image related query, the system comprising a visual semantics generator and an image scene embedding training unit implemented by a processor and configured to perform the method of claim 1 (Li Abstract & Fig. 8; Li col. 8 lines 15-39: “Processor 236, upon receiving the search query for search engine 242, submits a search request … receives an identification of a plurality of images … Processor 236, using a logistic regression model, identifies the level of relevance for each of the visual media files in visual media database”). Therefore, claim 8 is rejected using the same rationale as applied to claim 1 set forth above. 

Regarding claim 14, Li, in view of Cao and Wang, further teaches a machine readable and non-transitory medium having information including machine executable instructions stored thereon for responding to an image related query, wherein the information, when read by the machine, causes the machine to perform the method of claim 1 (Li col. 2 lines 3-21: “a non-transitory, machine-readable storage medium is described that includes machine-readable instructions for causing a processor to execute a method”; Li col. 7 lines 39-53: “Processor 236 is configured to execute instructions, such as instructions physically coded into processor 236, instructions received from software in memory 232, or a combination of both”). Therefore, claim 14 is rejected using the same rationale as applied to claim 1 set forth above.

Claims 9 and 15 are rejected using the same rationale as applied to claim 2 set forth above.

Claims 10 and 16 are rejected using the same rationale as applied to claim 3 set forth above.

Claims 11 and 17 are rejected using the same rationale as applied to claim 4 set forth above.

Claims 12 and 18 are rejected using the same rationale as applied to claim 5 set forth above.

Claims 13 and 19 are rejected using the same rationale as applied to claim 6 set forth above.

Claim 20 is rejected using the same rationale as applied to claim 7 set forth above.

Regarding claim 21, Li, in view of Cao and Wang, teaches the method of claim 4, wherein the scene embedding corresponds to hierarchical relationships between the concepts co-existing in each image of the plurality of images associated with the scene (Li col. 4 lines 32-60: “The neural network (NN), which can be a convolutional neural network (CNN), is trained to identify features of images corresponding to one or more image identifiers … network 150 can include, but is not limited to, any one or more of the following network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, tree or hierarchical network, and the like”; Li Fig. 4; Li Fig. 5C).  

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753.  The examiner can normally be reached on M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Soo Shin/Primary Examiner, Art Unit 2667