DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to communications filed on 05/04/2020. Claims 1-15 are pending in the instant application. Claims 1, 8 and 15 are independent. An Office Action on the merits follows.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/07/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 6, 7, 8, 9, 13, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Liao (US 20130268504 A1) in combination with Hu et al. (US 20190318405 A1).

Regarding Claim 1: Liao discloses a method for obtaining product training images (Refer to para [018]; “computing systems and computing processes used in methods of determining search engine rank for digital content generated by a category independent search. A category independent search is any search that is not confined to search a specific category of results but is able to provide all results that best match the query. A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content.”)  comprising: obtaining product images on each of product webpages in an e-commerce website (Refer to para [018-019]; “A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content. For example, in response to a query for soccer cleats, a search engine may provide digital content relevant to soccer cleats as well as images of soccer cleats, videos regarding soccer cleats, recent news on soccer cleats, specific soccer cleats that are available for sale (shopping), blogs discussing soccer cleats, books on soccer cleats, places that sell soccer cleats or where soccer cleats may be used, patents on soccer cleats, stock of companies that sell soccer cleats, among other digital content.”).

Liao does not expressly disclose determining a product feature vector of each product image on a product webpage.

Hu teaches “methods, systems, and programs for identifying products embedded within an image and, more particularly, methods, systems, and computer programs for identifying the brand and product identifier of the products within the image.”

More specifically, Hu teaches determining a product feature vector of each product image on a product webpage (Refer to para [081]; “an image processing service 711 is run to perform object detection and extraction of various image features. Extraction of various image features can include extraction, from the query image, of DNN features, recognition features, and additional features used for duplicate detection. Herein, DNN features refer to a vector produced by DNN, from a given image input to the DNN, to describe the content of the given image.”) for each product webpage, according to the product feature vector of each product image on the product webpage, dividing the product images on the product webpage into at least one image set of the product webpage (Refer to para [082-084]; “In image understanding 710, a next process, which may be subsequent to image processing service 711, can include text query inference 712. Here, a best text query may be generated to represent the input image, such as a “best representative query” (BRQ). A BRQ may identify a minimal and human-readable set of terms that can identify the key concept in the image. BRQs are used in a Bing® image search, where Bing® is a web search engine owned and operated by Microsoft Corporation®. Various APIs are available via a Bing® image search product. Text query inference 712 can operate on a caption associated with a web page. In various embodiments, web page text metadata associated with the query image is used to generate a text query to describe the image query. To accomplish this search, a technique known in the vision area as visual words is employed. This technique allows a system to quantize a dense feature vector into a set of discrete visual words, which are essentially a clustering of similar feature vectors into clusters, using a joint k-means algorithm. 
The visual words are then used to narrow down a set of candidates from billions to several millions.”) and determining a target image set of the product webpage with the largest number of product images from the at least one image set (Refer to para [085]; “After the matching process 720, a stage of multilevel ranking 725 is entered. In various embodiments, a Lambda-mart algorithm is used as a ranker of candidate index images. A Lambda-mart algorithm is a known algorithm that is a multivariate regression tree model with a ranking loss function. Various features may be used to train the ranker. These features can include multiple product quantization (PQ) features based on different training data, network structure, and loss functions. The features used in a PQ procedure can be derived from multiple DNN feature trainings using one or more of different DNN network structures, different loss functions, and different training data. The set of features can include category matching, color matching, matching face-related features. The set of features can also include text matching features, such as but not limited to, a BRQ query and matching a document stream.”) determining an average product feature vector of the target image set according to product feature vectors of the product images in the target image set of the product webpage (Refer to para [087]; “A high-dimensional vector may be decomposed into many low-dimensional sub-vectors to form a PQ vector. A calculation of a sub-vector with a cluster codebook is used to generate a nearest centroid of a number of elements, where a codebook is a set of codewords. After quantization is complete, distances between the query-image and result-image vectors are calculated. A Euclidean distance calculation can be conducted. 
However, in various embodiments, instead of using a conventional Euclidean distance calculation, a table lookup against a set of pre-calculated values is performed to accelerate the search process.”) classifying target image sets of the product webpages according to the average product feature vector to obtain at least one type of image set (Refer to para [088-090 and 094]; “For example, a target is defined to assign 25 bytes for each 100-dimensional DNN encoder from the index images. In a first step of a training algorithm, each 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. In another step of the training algorithm, for each four-dimensional vector, a k-means clustering algorithm is run, and 256 codebooks are generated. For new 100-dimensional DNN encoders, each new 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. For each four-dimensional vector, the nearest codebook identification (ID) is determined. Each DNN encoder can be represented by 25 codebook IDs of 25 bytes forming a PQ vector. In some example embodiments, conducting an image search includes receiving a query image followed by the generation of features from the query image. The features include information from text associated with the query image and a visual appearance of the image. Generating features from the query image can include applying the query image to a deep neural network to extract a set of deep neural network features from the query image. The deep neural network may be realized by a number of different types of deep neural networks. Further, a set of visual words representing the query image is generated from the generated features and the visual words of the query image are compared with visual words of index images. The visual words of the query image can be compared with visual words of index images of an image index database by comparing DNN vectors of index images with a DNN vector of the query image. 
Further, a set of candidate images is generated from the index images resulting from matching one or more visual words in the comparison. An object detection model 802 is a trained data model (or models) implementing a state-of-the-art framework for object detection that is configured to execute processing operations related to detection and classification of objects within an image. State-of-the-art object detection networks depend on regional proposed algorithms to hypothesize object locations, object bounds and the nature of objects at positions within image content. An example object detection model is an underlying detection model for visual search processing that enhances processing efficiency of visual search processing by utilizing categorical object classifications to identify contextually relevant content for a detected object. Objects may relate to any visible content including: physical objects, and nouns/pronouns such as people, animals, places, things, languages, etc. As an example, the object detection data model 802 may be a trained neural network model (e.g., artificial neural network (ANN), convolutional neural network (CNN), DNN) or other types of adaptive or deep machine-learning processing. Methods and processing for building, training and adapting deep learning models including building of feature maps.”) and generating the product training images according to the at least one type of image set (Refer to para [111-114]; “an example object detection model 802 may be configured to propagate detected information including layers of output feature maps for multi-modal ranking training of a ranker utilized for visual search processing. In one example, the object detection model 802 may be applied to both a query image as well as indexed images to extract both object categories and feature vectors that represents the object in the detected bounding box. 
Feature vectors from query-side image content as well as indexed image content may be fed into ranker learning to tailor a visual search ranker for object classification evaluation. This may enable visual search processing to identify and output visual search results 806 that are more contextually relevant to a detected object as well as provide richer representations of image content (as compared with general image classification processing), among other technical advantages. In present examples, since metadata is stored at different (object classification) levels of hierarchy, object detection category matching can be applied to different levels of classification during ranking processing. For example, categorical object classification may be applied as BRQ to match page text and metadata. Alternatively, categorical object classification may be used as a filter set and L1/L2 ranking may be applied to further filter out semantically irrelevant documents and enhancing ranking results relevance. Further, candidate for visual search results 806 may be ranked not only based on relevance to a detected object but also relevance to the image content as a whole. Preliminary empirical research indicates example ranking processing shows greater gains in accuracy and relevance (e.g., as measured by Discounted Cumulative Gain (DCG) or the like). The visual search model 804 is configured to output a ranked listing of visual search (image) results 806. Example visual search results 806 comprise one or more visually similar images for a detected object, where visually similar images may be surfaced as visual search results based on ranking processing executed by the visual search model 804. Any number of results may be selected for output from the visual search results 806, for example, based on application/service processing, available display space, etc. 
Image content in visual search results 806 is contextually relevant for one or more detected objects within example image content. Visual search results 806 may vary depending on detected objects within image content as well as determined intent associated with the image content (e.g., from a query, user-signal data, device signal data). For instance, if the user is looking for outfit inspiration in a search engine, processing described may be utilized to predict the search/shopping intent of users, automatically detect several objects of user interests and marks them so users don't have to manipulate a bounding box associated with the object as in existing techniques, execute further queries, etc. Furthermore, in some instances, the visual search model 804 may be further configured to generate a representation of a detected object (or objects). In other examples, the visual search model 804 is configured to propagate visual search results 806 and other associated data to an example application/service (e.g., productivity service) for generation of a representation of one or more detected objects through a user interface of the application/service. A representation of a detected object may comprise one or more of: visual identification/tagging of a detected object (e.g., categorical classification(s) for a detected object), presentation of contextually relevant visual search results or suggestions for a detected object and/or surfacing of an example bounding box for a detected object, among other examples.”).
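
For illustration only, and not as part of the record of either reference, the sequence of limitations of Claim 1 mapped above can be sketched as follows. The sketch assumes each product image already has a non-zero feature vector, and it assumes a greedy cosine-similarity grouping with a fixed threshold for both the per-webpage dividing step and the cross-webpage classifying step; neither the claim language nor the cited paragraphs specify that technique. The names group_by_similarity, target_set_and_mean and build_training_sets are hypothetical.

```python
import numpy as np

def group_by_similarity(vectors, threshold=0.9):
    """Greedy grouping (an assumption, not taken from the claims or references).

    Each vector joins the first group whose seed vector it matches with
    cosine similarity >= threshold; otherwise it starts a new group.
    Returns a list of index lists. Vectors are assumed to be non-zero.
    """
    groups = []  # list of (unit seed vector, member indices)
    for i, v in enumerate(vectors):
        u = np.asarray(v, dtype=float)
        u = u / np.linalg.norm(u)
        for seed, members in groups:
            if float(seed @ u) >= threshold:
                members.append(i)
                break
        else:
            groups.append((u, [i]))
    return [members for _, members in groups]

def target_set_and_mean(page_vectors, threshold=0.9):
    """Divide one webpage's images into sets, keep the largest set,
    and return (image indices of that set, average feature vector)."""
    vectors = np.asarray(page_vectors, dtype=float)
    image_sets = group_by_similarity(vectors, threshold)
    target = max(image_sets, key=len)  # ties resolve to the earliest set
    return target, vectors[target].mean(axis=0)

def build_training_sets(pages, threshold=0.9):
    """pages: dict mapping a page id to a list of per-image feature vectors.

    Returns one list per image-set type; each entry is (page id, image index).
    """
    page_ids = list(pages)
    targets, means = [], []
    for pid in page_ids:
        target, mean = target_set_and_mean(pages[pid], threshold)
        targets.append(target)
        means.append(mean)
    # Classify the target sets of all webpages by their average vectors.
    types = group_by_similarity(means, threshold)
    return [[(page_ids[p], i) for p in members for i in targets[p]]
            for members in types]
```

In this sketch, each returned list corresponds to one "type of image set", from which training images for a single product category could be drawn.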

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Liao by adding the “methods, systems, and computer programs directed to identifying the brand and model of products embedded within an image” as taught by Hu.

The suggestion/motivation for combining the teachings of Liao and Hu would have been to enhance the processor for “analyzing the image to determine the location within the image of one or more products further includes training a machine-learning program to generate feature maps for feature extraction, object detection, and object classification.” (at para [130], Hu).

Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Liao and Hu in order to obtain the specified claimed elements of Claim 1. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.

Regarding Claim 2: Hu teaches for each product webpage, performing a product main body detection on each product image on the product webpage to obtain a product main body area of each product image (Refer to para [117-120]; “The product recognition server 912 may be accessed for recognizing products within images, as discussed above with reference to FIGS. 1-8. Further, the shopping assistant 918 interacts with the product-recognition server 912 to provide shopping options for the identified products. FIG. 10 is a flowchart of a method for identifying the brand and model of products embedded within an image, according to some example embodiments. While the various operations in this flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the operations may be executed in a different order, be combined or omitted, or be executed in parallel. Operation 1002 is for receiving, via a GUI, a selection of an image (e.g., image 104 in FIG. 1). From operation 1002, the method flows to operation 1004 for analyzing, by one or more processors, the image to determine a location within the image of one or more products. See for example, bounding box 106 in FIG. 1 identifying the location of a pair of shoes. Operation 1006 is performed for each product in the image, where the one or more processors determine a unique identification of the product that includes a manufacturer of the product and a model identifier.”) and according to a preset product model, performing a feature extraction on the product main body area of each product image to obtain the product feature vector of each product image (Refer to para [081]; “As a first procedure in the query-image understanding process 710, an image processing service 711 is run to perform object detection and extraction of various image features. 
Extraction of various image features can include extraction, from the query image, of DNN features, recognition features, and additional features used for duplicate detection. Herein, DNN features refer to a vector produced by DNN, from a given image input to the DNN, to describe the content of the given image.”).
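
For illustration only, the two steps of Claim 2 (locating a product main body area, then extracting a feature vector from that area) can be sketched as follows. The sketch assumes the bounding box has already been produced by some detector, and it substitutes a simple per-channel color histogram for the DNN feature extraction quoted from Hu; the histogram is a stand-in chosen to keep the example self-contained, not the technique of either reference. The names crop_main_body and histogram_feature are hypothetical.

```python
import numpy as np

def crop_main_body(image, box):
    """image: (H, W, 3) uint8 array; box: (top, left, bottom, right) in pixels.

    Returns the pixels inside the bounding box (the assumed main body area).
    """
    top, left, bottom, right = box
    return image[top:bottom, left:right]

def histogram_feature(region, bins=4):
    """Stand-in feature extractor: per-channel intensity histogram,
    concatenated and L2-normalized. A real system would apply a trained model."""
    feats = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    v = np.concatenate(feats).astype(float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v
```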

Regarding Claim 6: Hu teaches the product images comprise a product main image (Refer to para [026]; “After the user selects the image, the fashion-finder user interface 102 is presented, according to some example embodiments. Initially, image 104 is presented and the product-recognition program then analyzes the image to identify embedded products. In the example of FIG. 1, four products have been identified in image 104: a jacket, pants, a purse, and a pair of shoes.”).

Regarding Claim 7: Hu teaches the product images comprise an image of a purchased product (Refer to para [027 and 028]; “In some example embodiments, a bounding box 106 is placed around each identified item as well as a product description 108 (e.g., Brand A leather jacket). More details are provided below with reference to FIG. 8 regarding the calculation of the bounding boxes. Further, an information message 110 indicates that four products have been found and prompts the user to select one of the identified products for obtaining additional information, such as product details and buying information. After the user selects one of the products (e.g., the purse), a detailed shopping window 112 shows information about the selected item and buying options 114. The detailed shopping window 112 includes an image 116 of the item, a product identifier (e.g., Brand D, Model A), a description of the item (e.g., leather purse from Brand D, white leather and gold accents with over-the-shoulder strap), and buying options 112. The product identifier uniquely defines the product among all identifiable products. In some example embodiments, the product identifier includes, at least, a manufacturer identifier, and a model identifier. The manufacturer identifier uniquely identifies the maker of the product, and the model identifier uniquely identifies the product from all other products manufactured by the same manufacturer. In other example embodiments, other product identifiers may be utilized, such as a barcode.”).

Regarding Claim 8: Liao discloses an apparatus for obtaining product training images (Refer to para [018]; “computing systems and computing processes used in methods of determining search engine rank for digital content generated by a category independent search. A category independent search is any search that is not confined to search a specific category of results but is able to provide all results that best match the query. A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content.”) comprising: one or more processors (Refer to para [089]; “a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).”) a memory storing instructions executable by the one or more processors (Refer to para [081-082]; “The computer program product may be located on a computer memory device, which may be removable or integrated with the computing system. Some embodiments described herein include a computing system capable of performing the methods described herein. 
As such, the computing system may include a memory device that has the computer-executable instructions for performing the method.”) wherein the one or more processors are configured to: obtain product images on each of product webpages in an e-commerce website (Refer to para [018-019]; “A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content. For example, in response to a query for soccer cleats, a search engine may provide digital content relevant to soccer cleats as well as images of soccer cleats, videos regarding soccer cleats, recent news on soccer cleats, specific soccer cleats that are available for sale (shopping), blogs discussing soccer cleats, books on soccer cleats, places that sell soccer cleats or where soccer cleats may be used, patents on soccer cleats, stock of companies that sell soccer cleats, among other digital content.”).

Liao does not expressly disclose determining a product feature vector of each product image on a product webpage.

Hu teaches “methods, systems, and programs for identifying products embedded within an image and, more particularly, methods, systems, and computer programs for identifying the brand and product identifier of the products within the image.”

More specifically, Hu teaches determining a product feature vector of each product image on a product webpage (Refer to para [081]; “an image processing service 711 is run to perform object detection and extraction of various image features. Extraction of various image features can include extraction, from the query image, of DNN features, recognition features, and additional features used for duplicate detection. Herein, DNN features refer to a vector produced by DNN, from a given image input to the DNN, to describe the content of the given image.”) for each product webpage, according to the product feature vector of each product image on the product webpage, divide the product images on the product webpage into at least one image set of the product webpage (Refer to para [082-084]; “In image understanding 710, a next process, which may be subsequent to image processing service 711, can include text query inference 712. Here, a best text query may be generated to represent the input image, such as a “best representative query” (BRQ). A BRQ may identify a minimal and human-readable set of terms that can identify the key concept in the image. BRQs are used in a Bing® image search, where Bing® is a web search engine owned and operated by Microsoft Corporation®. Various APIs are available via a Bing® image search product. Text query inference 712 can operate on a caption associated with a web page. In various embodiments, web page text metadata associated with the query image is used to generate a text query to describe the image query. To accomplish this search, a technique known in the vision area as visual words is employed. This technique allows a system to quantize a dense feature vector into a set of discrete visual words, which are essentially a clustering of similar feature vectors into clusters, using a joint k-means algorithm. 
The visual words are then used to narrow down a set of candidates from billions to several millions.”) and determine a target image set of the product webpage with the largest number of product images from the at least one image set (Refer to para [085]; “After the matching process 720, a stage of multilevel ranking 725 is entered. In various embodiments, a Lambda-mart algorithm is used as a ranker of candidate index images. A Lambda-mart algorithm is a known algorithm that is a multivariate regression tree model with a ranking loss function. Various features may be used to train the ranker. These features can include multiple product quantization (PQ) features based on different training data, network structure, and loss functions. The features used in a PQ procedure can be derived from multiple DNN feature trainings using one or more of different DNN network structures, different loss functions, and different training data. The set of features can include category matching, color matching, matching face-related features. The set of features can also include text matching features, such as but not limited to, a BRQ query and matching a document stream.”) determine an average product feature vector of the target image set according to product feature vectors of the product images in the target image set of the product webpage (Refer to para [087]; “A high-dimensional vector may be decomposed into many low-dimensional sub-vectors to form a PQ vector. A calculation of a sub-vector with a cluster codebook is used to generate a nearest centroid of a number of elements, where a codebook is a set of codewords. After quantization is complete, distances between the query-image and result-image vectors are calculated. A Euclidean distance calculation can be conducted. 
However, in various embodiments, instead of using a conventional Euclidean distance calculation, a table lookup against a set of pre-calculated values is performed to accelerate the search process.”) classify target image sets of the product webpages according to the average product feature vector to obtain at least one type of image set (Refer to para [088-090 and 094]; “For example, a target is defined to assign 25 bytes for each 100-dimensional DNN encoder from the index images. In a first step of a training algorithm, each 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. In another step of the training algorithm, for each four-dimensional vector, a k-means clustering algorithm is run, and 256 codebooks are generated. For new 100-dimensional DNN encoders, each new 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. For each four-dimensional vector, the nearest codebook identification (ID) is determined. Each DNN encoder can be represented by 25 codebook IDs of 25 bytes forming a PQ vector. In some example embodiments, conducting an image search includes receiving a query image followed by the generation of features from the query image. The features include information from text associated with the query image and a visual appearance of the image. Generating features from the query image can include applying the query image to a deep neural network to extract a set of deep neural network features from the query image. The deep neural network may be realized by a number of different types of deep neural networks. Further, a set of visual words representing the query image is generated from the generated features and the visual words of the query image are compared with visual words of index images. The visual words of the query image can be compared with visual words of index images of an image index database by comparing DNN vectors of index images with a DNN vector of the query image. 
Further, a set of candidate images is generated from the index images resulting from matching one or more visual words in the comparison. An object detection model 802 is a trained data model (or models) implementing a state-of-the-art framework for object detection that is configured to execute processing operations related to detection and classification of objects within an image. State-of-the-art object detection networks depend on regional proposed algorithms to hypothesize object locations, object bounds and the nature of objects at positions within image content. An example object detection model is an underlying detection model for visual search processing that enhances processing efficiency of visual search processing by utilizing categorical object classifications to identify contextually relevant content for a detected object. Objects may relate to any visible content including: physical objects, and nouns/pronouns such as people, animals, places, things, languages, etc. As an example, the object detection data model 802 may be a trained neural network model (e.g., artificial neural network (ANN), convolutional neural network (CNN), DNN) or other types of adaptive or deep machine-learning processing. Methods and processing for building, training and adapting deep learning models including building of feature maps.”) and generate the product training images according to the at least one type of image set (Refer to para [111-114]; “an example object detection model 802 may be configured to propagate detected information including layers of output feature maps for multi-modal ranking training of a ranker utilized for visual search processing. In one example, the object detection model 802 may be applied to both a query image as well as indexed images to extract both object categories and feature vectors that represents the object in the detected bounding box. 
Feature vectors from query-side image content as well as indexed image content may be fed into ranker learning to tailor a visual search ranker for object classification evaluation. This may enable visual search processing to identify and output visual search results 806 that are more contextually relevant to a detected object as well as provide richer representations of image content (as compared with general image classification processing), among other technical advantages. In present examples, since metadata is stored at different (object classification) levels of hierarchy, object detection category matching can be applied to different levels of classification during ranking processing. For example, categorical object classification may be applied as BRQ to match page text and metadata. Alternatively, categorical object classification may be used as a filter set and L1/L2 ranking may be applied to further filter out semantically irrelevant documents and enhancing ranking results relevance. Further, candidate for visual search results 806 may be ranked not only based on relevance to a detected object but also relevance to the image content as a whole. Preliminary empirical research indicates example ranking processing shows greater gains in accuracy and relevance (e.g., as measured by Discounted Cumulative Gain (DCG) or the like). The visual search model 804 is configured to output a ranked listing of visual search (image) results 806. Example visual search results 806 comprise one or more visually similar images for a detected object, where visually similar images may be surfaced as visual search results based on ranking processing executed by the visual search model 804. Any number of results may be selected for output from the visual search results 806, for example, based on application/service processing, available display space, etc. 
Image content in visual search results 806 is contextually relevant for one or more detected objects within example image content. Visual search results 806 may vary depending on detected objects within image content as well as determined intent associated with the image content (e.g., from a query, user-signal data, device signal data). For instance, if the user is looking for outfit inspiration in a search engine, processing described may be utilized to predict the search/shopping intent of users, automatically detect several objects of user interests and marks them so users don't have to manipulate a bounding box associated with the object as in existing techniques, execute further queries, etc. Furthermore, in some instances, the visual search model 804 may be further configured to generate a representation of a detected object (or objects). In other examples, the visual search model 804 is configured to propagate visual search results 806 and other associated data to an example application/service (e.g., productivity service) for generation of a representation of one or more detected objects through a user interface of the application/service. A representation of a detected object may comprise one or more of: visual identification/tagging of a detected object (e.g., categorical classification(s) for a detected object), presentation of contextually relevant visual search results or suggestions for a detected object and/or surfacing of an example bounding box for a detected object, among other examples.”).

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Liao by adding the “methods, systems, and computer programs directed to identifying the brand and model of products embedded within an image” as taught by Hu and discussed above.

The suggestion/motivation for combining the teachings of Liao and Hu would have been to enhance the processor for “analyzing the image to determine the location within the image of one or more products further includes training a machine-learning program to generate feature maps for feature extraction, object detection, and object classification.” (at para [130], Hu).

Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Liao and Hu in order to obtain the specified claimed elements of Claim 8. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.

Regarding Claim 13: Hu teaches the product images comprise a product main image (Refer to para [026]; “After the user selects the image, the fashion-finder user interface 102 is presented, according to some example embodiments. Initially, image 104 is presented and the product-recognition program then analyzes the image to identify embedded products. In the example of FIG. 1, four products have been identified in image 104: a jacket, pants, a purse, and a pair of shoes.”).

Regarding Claim 14: Hu teaches the product images comprise an image of a purchased product (Refer to para [027 and 028]; “In some example embodiments, a bounding box 106 is placed around each identified item as well as a product description 108 (e.g., Brand A leather jacket). More details are provided below with reference to FIG. 8 regarding the calculation of the bounding boxes. Further, an information message 110 indicates that four products have been found and prompts the user to select one of the identified products for obtaining additional information, such as product details and buying information. After the user selects one of the products (e.g., the purse), a detailed shopping window 112 shows information about the selected item and buying options 114. The detailed shopping window 112 includes an image 116 of the item, a product identifier (e.g., Brand D, Model A), a description of the item (e.g., leather purse from Brand D, white leather and gold accents with over-the-shoulder strap), and buying options 112. The product identifier uniquely defines the product among all identifiable products. In some example embodiments, the product identifier includes, at least, a manufacturer identifier, and a model identifier. The manufacturer identifier uniquely identifies the maker of the product, and the model identifier uniquely identifies the product from all other products manufactured by the same manufacturer. In other example embodiments, other product identifiers may be utilized, such as a barcode.”).

Regarding Claim 15: Liao discloses a non-transitory computer-readable storage medium having a computer program stored thereon (Refer to para [006]; “A computer readable medium encoded with a computer program having computer-executable instructions for causing a computing system to determine search engine rank for digital content is disclosed. The operations may include performing a search for digital content on a network using a search term to obtain search results”), wherein when the program is executed by a processor (Refer to para [081-082]; “The computer program product may be located on a computer memory device, which may be removable or integrated with the computing system. Some embodiments described herein include a computing system capable of performing the methods described herein. As such, the computing system may include a memory device that has the computer-executable instructions for performing the method.”), a method for obtaining product training images is implemented (Refer to para [018]; “computing systems and computing processes used in methods of determining search engine rank for digital content generated by a category independent search. A category independent search is any search that is not confined to search a specific category of results but is able to provide all results that best match the query. 
A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content.”) and the method comprises: obtaining product images on each of product webpages in an e-commerce website (Refer to para [018-019]; “A category independent search by a search engine may return search results that include non-category specific digital content as well as category specific digital content, such as images, videos, news, shopping, real-time, blogs, books, places, discussions, recipes, patents, calculator, stock, timelines, and other digital content that is closely related and directed toward a certain type of digital content so as to be in a category of digital content. For example, in response to a query for soccer cleats, a search engine may provide digital content relevant to soccer cleats as well as images of soccer cleats, videos regarding soccer cleats, recent news on soccer cleats, specific soccer cleats that are available for sale (shopping), blogs discussing soccer cleats, books on soccer cleats, places that sell soccer cleats or where soccer cleats may be used, patents on soccer cleats, stock of companies that sell soccer cleats, among other digital content.”).

Liao does not expressly disclose determining a product feature vector of each product image on a product webpage.

Hu teaches “methods, systems, and programs for identifying products embedded within an image and, more particularly, methods, systems, and computer programs for identifying the brand and product identifier of the products within the image.”

More specifically, Hu teaches determining a product feature vector of each product image on a product webpage (Refer to para [081]; “an image processing service 711 is run to perform object detection and extraction of various image features. Extraction of various image features can include extraction, from the query image, of DNN features, recognition features, and additional features used for duplicate detection. Herein, DNN features refer to a vector produced by DNN, from a given image input to the DNN, to describe the content of the given image.”) for each product webpage, according to the product feature vector of each product image on the product webpage, dividing the product images on the product webpage into at least one image set of the product webpage, (Refer to para [082-084]; “In image understanding 710, a next process, which may be subsequent to image processing service 711, can include text query inference 712. Here, a best text query may be generated to represent the input image, such as a “best representative query” (BRQ). A BRQ may identify a minimal and human-readable set of terms that can identify the key concept in the image. BRQs are used in a Bing® image search, where Bing® is a web search engine owned and operated by Microsoft Corporation®. Various APIs are available via a Bing® image search product. Text query inference 712 can operate on a caption associated with a web page. In various embodiments, web page text metadata associated with the query image is used to generate a text query to describe the image query. To accomplish this search, a technique known in the vision area as visual words is employed. This technique allows a system to quantize a dense feature vector into a set of discrete visual words, which are essentially a clustering of similar feature vectors into clusters, using a joint k-means algorithm. 
The visual words are then used to narrow down a set of candidates from billions to several millions.”) and determining a target image set of the product webpage with the largest number of product images from the at least one image set (Refer to para [085]; “After the matching process 720, a stage of multilevel ranking 725 is entered. In various embodiments, a Lambda-mart algorithm is used as a ranker of candidate index images. A Lambda-mart algorithm is a known algorithm that is a multivariate regression tree model with a ranking loss function. Various features may be used to train the ranker. These features can include multiple product quantization (PQ) features based on different training data, network structure, and loss functions. The features used in a PQ procedure can be derived from multiple DNN feature trainings using one or more of different DNN network structures, different loss functions, and different training data. The set of features can include category matching, color matching, matching face-related features. The set of features can also include text matching features, such as but not limited to, a BRQ query and matching a document stream.”) determining an average product feature vector of the target image set according to product feature vectors of the product images in the target image set of the product webpage (Refer to para [087]; “A high-dimensional vector may be decomposed into many low-dimensional sub-vectors to form a PQ vector. A calculation of a sub-vector with a cluster codebook is used to generate a nearest centroid of a number of elements, where a codebook is a set of codewords. After quantization is complete, distances between the query-image and result-image vectors are calculated. A Euclidean distance calculation can be conducted. 
However, in various embodiments, instead of using a conventional Euclidean distance calculation, a table lookup against a set of pre-calculated values is performed to accelerate the search process.”) classifying target image sets of the product webpages according to the average product feature vector to obtain at least one type of image set (Refer to para [088-090 and 094]; “For example, a target is defined to assign 25 bytes for each 100-dimensional DNN encoder from the index images. In a first step of a training algorithm, each 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. In another step of the training algorithm, for each four-dimensional vector, a k-means clustering algorithm is run, and 256 codebooks are generated. For new 100-dimensional DNN encoders, each new 100-dimensional DNN encoder is divided into 25 four-dimensional vectors. For each four-dimensional vector, the nearest codebook identification (ID) is determined. Each DNN encoder can be represented by 25 codebook IDs of 25 bytes forming a PQ vector. In some example embodiments, conducting an image search includes receiving a query image followed by the generation of features from the query image. The features include information from text associated with the query image and a visual appearance of the image. Generating features from the query image can include applying the query image to a deep neural network to extract a set of deep neural network features from the query image. The deep neural network may be realized by a number of different types of deep neural networks. Further, a set of visual words representing the query image is generated from the generated features and the visual words of the query image are compared with visual words of index images. The visual words of the query image can be compared with visual words of index images of an image index database by comparing DNN vectors of index images with a DNN vector of the query image. 
Further, a set of candidate images is generated from the index images resulting from matching one or more visual words in the comparison. An object detection model 802 is a trained data model (or models) implementing a state-of-the-art framework for object detection that is configured to execute processing operations related to detection and classification of objects within an image. State-of-the-art object detection networks depend on regional proposed algorithms to hypothesize object locations, object bounds and the nature of objects at positions within image content. An example object detection model is an underlying detection model for visual search processing that enhances processing efficiency of visual search processing by utilizing categorical object classifications to identify contextually relevant content for a detected object. Objects may relate to any visible content including: physical objects, and nouns/pronouns such as people, animals, places, things, languages, etc. As an example, the object detection data model 802 may be a trained neural network model (e.g., artificial neural network (ANN), convolutional neural network (CNN), DNN) or other types of adaptive or deep machine-learning processing. Methods and processing for building, training and adapting deep learning models including building of feature maps.”) and generating the product training images according to the at least one type of image set (Refer to para [111-114]; “an example object detection model 802 may be configured to propagate detected information including layers of output feature maps for multi-modal ranking training of a ranker utilized for visual search processing. In one example, the object detection model 802 may be applied to both a query image as well as indexed images to extract both object categories and feature vectors that represents the object in the detected bounding box. 
Feature vectors from query-side image content as well as indexed image content may be fed into ranker learning to tailor a visual search ranker for object classification evaluation. This may enable visual search processing to identify and output visual search results 806 that are more contextually relevant to a detected object as well as provide richer representations of image content (as compared with general image classification processing), among other technical advantages. In present examples, since metadata is stored at different (object classification) levels of hierarchy, object detection category matching can be applied to different levels of classification during ranking processing. For example, categorical object classification may be applied as BRQ to match page text and metadata. Alternatively, categorical object classification may be used as a filter set and L1/L2 ranking may be applied to further filter out semantically irrelevant documents and enhancing ranking results relevance. Further, candidate for visual search results 806 may be ranked not only based on relevance to a detected object but also relevance to the image content as a whole. Preliminary empirical research indicates example ranking processing shows greater gains in accuracy and relevance (e.g., as measured by Discounted Cumulative Gain (DCG) or the like). The visual search model 804 is configured to output a ranked listing of visual search (image) results 806. Example visual search results 806 comprise one or more visually similar images for a detected object, where visually similar images may be surfaced as visual search results based on ranking processing executed by the visual search model 804. Any number of results may be selected for output from the visual search results 806, for example, based on application/service processing, available display space, etc. 
Image content in visual search results 806 is contextually relevant for one or more detected objects within example image content. Visual search results 806 may vary depending on detected objects within image content as well as determined intent associated with the image content (e.g., from a query, user-signal data, device signal data). For instance, if the user is looking for outfit inspiration in a search engine, processing described may be utilized to predict the search/shopping intent of users, automatically detect several objects of user interests and marks them so users don't have to manipulate a bounding box associated with the object as in existing techniques, execute further queries, etc. Furthermore, in some instances, the visual search model 804 may be further configured to generate a representation of a detected object (or objects). In other examples, the visual search model 804 is configured to propagate visual search results 806 and other associated data to an example application/service (e.g., productivity service) for generation of a representation of one or more detected objects through a user interface of the application/service. A representation of a detected object may comprise one or more of: visual identification/tagging of a detected object (e.g., categorical classification(s) for a detected object), presentation of contextually relevant visual search results or suggestions for a detected object and/or surfacing of an example bounding box for a detected object, among other examples.”).
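
For illustration only, the product quantization (PQ) encoding and table-lookup distance described in the passages quoted from para [087-090] can be sketched as below. This is a minimal sketch and is not taken from either reference: the function names are hypothetical, and the codebooks here are random placeholders rather than the result of the k-means training the quoted passage describes. The sizes follow the quoted example (a 100-dimensional vector split into 25 four-dimensional sub-vectors, each stored as a one-byte ID, on the assumption that the "256 codebooks" of the passage are 256 centroids per sub-vector).

```python
import random


def split(vec, m):
    """Split vec into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Return the nearest-centroid ID of each sub-vector, one byte each."""
    subs = split(vec, len(codebooks))
    return bytes(
        min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
        for sub, cb in zip(subs, codebooks)
    )


def decode(code, codebooks):
    """Rebuild the approximate vector by concatenating the chosen centroids."""
    return [x for cid, cb in zip(code, codebooks) for x in cb[cid]]


def distance_table(query, codebooks):
    """Pre-calculate squared distances from each query sub-vector to every centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, cw) for cw in cb] for sub, cb in zip(subs, codebooks)]


def approx_sq_dist(table, code):
    """Approximate query-to-item distance using table lookups only."""
    return sum(table[j][cid] for j, cid in enumerate(code))


# ASSUMPTION: placeholder codebooks (25 sub-spaces x 256 centroids x 4 dims).
# A real system would obtain these by k-means clustering of index vectors.
rng = random.Random(0)
codebooks = [[[rng.random() for _ in range(4)] for _ in range(256)]
             for _ in range(25)]
index_vector = [rng.random() for _ in range(100)]
pq_code = encode(index_vector, codebooks)  # 25 bytes
```

Under these assumptions, each indexed vector is stored as 25 bytes, and a query is compared against it by summing 25 pre-calculated table entries instead of computing a full 100-dimensional Euclidean distance.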

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Liao by adding the “methods, systems, and computer programs directed to identifying the brand and model of products embedded within an image” as taught by Hu and discussed above.

The suggestion/motivation for combining the teachings of Liao and Hu would have been to enhance the processor for “analyzing the image to determine the location within the image of one or more products further includes training a machine-learning program to generate feature maps for feature extraction, object detection, and object classification.” (at para [130], Hu).

Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Liao and Hu in order to obtain the specified claimed elements of Claim 15. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.
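
For illustration only, the sequence of steps recited in Claim 15 (dividing the product images of each webpage into image sets by feature vector, selecting the largest set as the target image set, averaging its feature vectors, and classifying the target image sets across webpages by that average) may be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce the method of the application, Liao, or Hu: the greedy cosine-similarity grouping, the 0.9 threshold, and all function names are hypothetical choices made only to render the steps concrete.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_by_similarity(vectors, threshold):
    """ASSUMPTION: greedy grouping; a vector joins the first group whose
    first member is at least `threshold` similar, else starts a new group."""
    groups = []  # each group is a list of indices into `vectors`
    for i, v in enumerate(vectors):
        for g in groups:
            if cosine(vectors[g[0]], v) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def target_set_average(page_vectors, threshold=0.9):
    """Divide one page's images into sets, take the largest set as the
    target image set, and return its average feature vector."""
    groups = group_by_similarity(page_vectors, threshold)
    target = max(groups, key=len)
    return mean_vector([page_vectors[i] for i in target])


def classify_pages(pages, threshold=0.9):
    """Group webpages whose target-set average vectors are similar.
    `pages` maps a page name to the feature vectors of its product images."""
    names = list(pages)
    averages = [target_set_average(pages[n], threshold) for n in names]
    groups = group_by_similarity(averages, threshold)
    return [[names[i] for i in g] for g in groups]
```

In this sketch each returned group of page names stands for one "type of image set"; how training images would then be drawn from each type is left open, since the claim language quoted above does not specify it.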
Allowable Subject Matter
Claims 3-5 and 10-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20220027609 A1
US 20210117484 A1
US 20210279514 A1
US 20130268504 A1
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIA M THOMAS whose telephone number is (571)270-1583. The examiner can normally be reached M-Th 8:30am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward (Ed) Urban, can be reached at 571-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MIA M. THOMAS
Primary Examiner
Art Unit 2665



/MIA M THOMAS/
Primary Examiner
Art Unit 2665