DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the Amendment filed on 8/19/2022.
Claims 1-20 are pending. Claims 1, 9, 10, and 18 have been amended.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-7, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Trinh et al. (US 20210201373 A1, hereinafter Trinh) in view of Sacheti et al. (US 20190258895 A1, hereinafter Sacheti).
Regarding Claim 1, Trinh teaches a non-transitory computer readable medium for presenting clustered images, the non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to (Trinh, Paragraph [0078], code embodied on a machine-readable medium or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. One or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations):
present, via a graphical user interface at a user client device (Trinh, Fig. 1, Element 110 Client Device; Paragraph [0027], the client device 110 may comprise a display module to display information (e.g., in the form of user interfaces)), curated images displaying a product (Trinh, Fig. 2, Element 200; Paragraph [0027], the networked system 102 comprises a network-based marketplace (also referred to as "online marketplace") that responds to requests for product listings, publishes publications comprising item listings of products);
extract, utilizing a machine learning model, feature vectors from the curated images (Trinh, Paragraph [0016], [0022], the input layer of a machine learning model used by the listing generating system accesses the input images and generates features that represent various aspects of the input images; the machine learning models, the listing generating system can abstract the image into a feature vector (hereinafter also an “embedding” or a “vector”) associated with the image);
extract, utilizing the machine learning model, feature vectors from a plurality of user-submitted images displaying the product (Trinh, Paragraph [0017], [0023], a selection, by the user (e.g., the seller), of the one or more user interface elements of the user interface allows the user to edit or update the automatically generated listing with additional images or additional textual description. Using features in deep learning models to analyze and understand the input provided by the user);
[[ determine a sub-set of the user-submitted images for each curated image that show the product in a similar view to a given curated image by ]] comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors);
[[ determine an additional sub-set of user-submitted images that show the product in a view not included in the curated images by ]] comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors);
receive, via the graphical user interface, a user selection of a curated image (Trinh, Paragraph [0025], a selection, by the user (e.g., the seller), of the one or more user interface elements of the user interface allows the user to edit or update the automatically generated listing with additional images or additional textual description);
present, via the graphical user interface and based on the user selection of the curated image, [[ the sub-set of ]] the user-submitted images that show the product in a view similar to the curated image (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale.);
receive, via the graphical user interface, a user selection of an additional views element; [[ and present, via the graphical user interface, and based on the user selection of the additional views element, one or more of the user-submitted images in the additional sub-set of user-submitted images show the product in a view not included in the curated images.]] 
Trinh does not explicitly disclose the following limitations: determine a sub-set of the user-submitted images for each curated image that show the product in a similar view to a given curated image [[ by comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images ]];
determine an additional sub-set of user-submitted images that show the product in a view not included in the curated images by [[ comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images ]];
receive, via the graphical user interface, a user selection of an additional views element;
and present, via the graphical user interface, and based on the user selection of the additional views element, one or more of the user-submitted images in the additional sub-set of user-submitted images show the product in a view not included in the curated images.
However, Sacheti teaches determine a sub-set of the user-submitted images for each curated image that show the product in a similar view to a given curated image (Sacheti, Paragraph [0026], The visual search model 104 may comprise access to one or more visual indexes (e.g., databases) that are utilized to match image content (or portions <read on sub-set of images> thereof) to existing image content) by comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images (Sacheti, Paragraphs [0025]-[0026], “the object detection model 102 is applied to both the image content (e.g., query image) and index images (associated with one or more indices of the object detection model 102) to extract both the object categories (i.e. categorical object classifications) and feature vectors that represents the object in the detected bounding box”; “The visual search model 104 may comprise access to one or more visual indexes (e.g., databases) that are utilized to match image content (or portions thereof) to existing image content”);
determine an additional sub-set of user-submitted images that show the product in a view not included in the curated images (Sacheti, Paragraph [0050], method 200 proceeds to processing operation 218, where additional representation(s) for detected object(s) <read on additional sub-set of images> are surfaced. Processing operation 218 may comprise surfacing of an additional representation for the detected object that corresponds with the received selection. Fig. 2, Surface Additional Representation for Detected Object(s)) by comparing the feature vectors of the plurality of user-submitted images with the feature vectors of the curated images (Sacheti, Paragraph [0012], Compared with existing solutions for visual search ranker training, features extracted by an exemplary object detection model (or models) contain more accurate shape and location information of the object);
and present, via the graphical user interface, one or more of the user-submitted images (Sacheti, Paragraph [0015], a user may have uploaded image content for searching through a search engine application/service) in the additional sub-set of user-submitted images show the product in a view not included in the curated images (Sacheti, Paragraph [0015], in other instances, access to image recognition processing may not rely on an active usage of the image content by a user).
Sacheti and Trinh are analogous art because both are directed to using neural networks to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and using the trained model to adjust and fine-tune the image data with a neural network. Sacheti provides a way, when training on image data, of specifying regions to include and/or exclude so as to incrementally adjust the image data when using a neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the region of interest taught by Sacheti into the invention of Trinh such that, when modeling training data with a neural network, the system can dynamically adjust the training data by adjusting the modeled region in order to create a more reliable and well-trained machine learning or prediction model.

Regarding Claim 2, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches instructions that, when executed by the at least one processor, cause the computing device to extract the feature vectors from the curated images by generating object descriptors for the product in the curated images (Trinh, Paragraph [0019], [0023], The two or more images depict two or more items, and the two or more descriptions pertain to the two or more items. The listing generating system matches one or more images of the two or more images to a description of the two or more descriptions. Finding similarities among images of items, and among the images and descriptions of the items is a complex problem, especially in a system where there could be large number of images and descriptions. In machine learning, categorical features are those features that may have a value from a finite set of possible values).

Regarding Claim 3, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches instructions that, when executed by the at least one processor, cause the computing device to extract the feature vectors from the plurality of user-submitted images displaying the product by (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors):
Trinh does not explicitly disclose, but Sacheti teaches, generating object bounding boxes and labels for objects in the plurality of user-submitted images, wherein the object bounding boxes comprise product bounding boxes and product labels corresponding to the product (Sacheti, Paragraph [0025], the object detection model 102 is applied to both the image content (e.g., query image) and index images (associated with one or more indices of the object detection model 102) to extract both the object categories (i.e. categorical object classifications) and feature vectors that represents the object in the detected bounding box);
cropping the product bounding boxes; and extracting feature vectors from the product bounding boxes (Sacheti, Paragraph [0028], These newly generated object images are used to enhance precision and relevance when the search query is also an object, especially in instances where portions of image content (e.g. regions of image content that may be associated with detected objects) are being matched with cropped visually similar image content).

As explained in the rejection of claim 1 above, the rationale for combining the region-of-interest teaching of Sacheti with Trinh applies equally here.
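For illustration only, and not as a characterization of either reference, the cropping limitation of claim 3 discussed above can be sketched as follows. The function name `crop`, the (left, top, right, bottom) box convention, and the toy 3x4 image are assumptions made for this sketch; detection of the bounding box and the subsequent feature extraction are omitted.

```python
def crop(image, box):
    """Return the region of a row-major image inside box.

    box is (left, top, right, bottom), with right and bottom exclusive.
    This convention is an assumption of the sketch, not taken from the record.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 3x4 "image" whose pixel values are simply their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(3)]

# Hypothetical product bounding box covering columns 1-2 of rows 0-1.
patch = crop(image, (1, 0, 3, 2))
```

Under these assumptions, a feature extractor would then be applied to `patch` rather than to the full image.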

Regarding Claim 4, the combination of Trinh and Sacheti teaches the invention in claim 3.
The combination further teaches wherein generating the object bounding boxes and the labels comprises utilizing a region proposal neural network to identify the object bounding boxes and the labels (Sacheti, Paragraph [0016], [0029], data propagated by the object detection model is used to enhance content retrieval and filtering of image content through visual search processing. Categorical object classifications, provided by neural network image classifiers. Object detection provides not only more accurate localization of an exemplary bounding box for a detected object. Context data may be in the form of metadata that is directly associated with the image content (properties, tagging, fields, storage location (e.g., folders, labels)), capture of the image content).
As explained in the rejection of claim 1 above, the rationale for combining the region-of-interest teaching of Sacheti with Trinh applies equally here.

Regarding Claim 6, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches instructions that, when executed by the at least one processor, cause the computing device to [[ determine the sub-set of the user-submitted images that are similar to the curated image ]] (Trinh, Paragraph [0078], code embodied on a machine-readable medium or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. One or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations) by mapping the feature vector from the curated image and the feature vectors from the plurality of user-submitted images in a vector space (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors); determining distances between the feature vector from the curated image and each of the feature vectors from the plurality of user-submitted images in the vector space; and determining that distances between the feature vectors of the sub-set of the user-submitted images and the feature vector from the curated image are within a threshold distance (Trinh, Paragraph [0016], The classifying of similar images into a cluster of images may be based on identifying feature vectors that are similar. Two feature vectors may be similar if a computed distance value between the two feature vectors does not exceed (e.g., is equal to or is less than) a certain threshold value).
Trinh does not explicitly disclose, but Sacheti teaches, instructions that, when executed by the at least one processor (Sacheti, Paragraph [0055], examples of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors), cause the computing device to determine the sub-set of the user-submitted images that are similar to the curated image (Sacheti, Abstract, The visually similar images for contextual relevance to the object is filtered (212) based on the propagated multiple categorical classifications).
As explained in the rejection of claim 1 above, the rationale for combining the region-of-interest teaching of Sacheti with Trinh applies equally here.
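For illustration only, the threshold-distance determination recited in claim 6 and described in Trinh, Paragraph [0016] (distance "does not exceed" a threshold) can be sketched as follows. The Euclidean metric, the function name `within_threshold`, and the sample vectors are assumptions of this sketch; neither the claim nor the cited passage specifies a particular distance measure.

```python
import math


def within_threshold(curated_vec, user_vecs, threshold):
    """Return indices of user vectors whose distance to curated_vec
    does not exceed threshold (Euclidean distance is assumed)."""
    subset = []
    for i, vec in enumerate(user_vecs):
        if math.dist(curated_vec, vec) <= threshold:
            subset.append(i)
    return subset


# Hypothetical 2-D feature vectors for one curated image and three user images.
curated = [0.0, 0.0]
users = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.5]]

# Distances are 0.1, 5.0 and 0.5, so only the first and third fall within 1.0.
subset = within_threshold(curated, users, threshold=1.0)
```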

Regarding Claim 7, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches instructions that, when executed by the at least one processor, cause the computing device to [[ determine the additional sub-set of user-submitted images that show the product in a view not included in the curated images ]] by mapping the feature vectors from the curated images and the feature vectors from the plurality of user-submitted images in a vector space; determine distances between the feature vectors from the curated images and each of the feature vectors from the plurality of user-submitted images in the vector space; determine that distances between the feature vectors from the curated images and the feature vectors from one or more user-submitted images of the plurality of user-submitted images exceed a threshold distance; and generate a new cluster comprising the one or more user-submitted images (Trinh, Paragraph [0016], The listing generating system then trains one or more machine learning models to classify similar images into a cluster of images. The classifying of similar images into a cluster of images may be based on identifying feature vectors that are similar. Two feature vectors may be similar if a computed distance value between the two feature vectors does not exceed (e.g., is equal to or is less than) a certain threshold value).
Trinh does not explicitly disclose, but Sacheti teaches, determine the additional sub-set of user-submitted images that show the product in a view not included in the curated images (Sacheti, Paragraph [0015], a user may be actively accessing the image content through a camera application/service of a mobile computing device, uploading the image content for an image search through a search engine service. Access to image recognition processing may not rely on an active usage of the image content by a user).
As explained in the rejection of claim 1 above, the rationale for combining the region-of-interest teaching of Sacheti with Trinh applies equally here.
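For illustration only, the "exceed a threshold distance" and "generate a new cluster" limitations of claim 7 can be sketched as follows. The sketch assumes Euclidean distance and reads "exceed" as exceeding the threshold with respect to every curated image (that is, the nearest curated vector is farther than the threshold); both are assumptions, and the function name and sample vectors are hypothetical.

```python
import math


def novel_view_cluster(curated_vecs, user_vecs, threshold):
    """Return indices of user vectors that are farther than threshold
    from every curated vector; these form the new cluster."""
    new_cluster = []
    for i, vec in enumerate(user_vecs):
        nearest = min(math.dist(vec, c) for c in curated_vecs)
        if nearest > threshold:
            new_cluster.append(i)
    return new_cluster


# Hypothetical feature vectors: two curated views and three user images.
curated_vecs = [[0.0, 0.0], [10.0, 0.0]]
user_vecs = [[0.2, 0.0], [5.0, 5.0], [9.5, 0.0]]

# Only the second user image is far from both curated views.
new_cluster = novel_view_cluster(curated_vecs, user_vecs, threshold=1.0)
```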

Regarding Claim 9, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches wherein the curated image comprises a product image created by a seller of the product (Trinh, Fig. 2, Element 202, Seller provided item titles; Paragraph [0036], the listing generating system 300 accesses (e.g., receives or obtains) input provided by a seller of multiple items. The seller provided input includes descriptions (e.g., titles) of the items for sale, and a set (e.g., a number) of unordered and unlabeled images (e.g., photographs) of the items for sale).

Claims 5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Trinh et al. (US 20210201373 A1, hereinafter Trinh) in view of Sacheti et al. (US 20190258895 A1, hereinafter Sacheti) as applied to Claim 1 above, and further in view of Zhou et al. (US 20170193545 A1, hereinafter Zhou).
Regarding Claim 5, the combination of Trinh and Sacheti teaches the invention in claim 3.
The combination does not explicitly disclose, but Zhou teaches, instructions that, when executed by the at least one processor, cause the computing device to generate confidence scores corresponding to the labels (Zhou, Paragraph [0070], For each of the detectable objects, the Flickr classifiers output a confidence score corresponding to the probability that the object is represented in the image).
Zhou and Trinh are analogous art because both are directed to using neural networks to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and using the trained model to adjust and fine-tune the image data with a neural network. Zhou provides a way, when training on image data, of using confidence scores to incrementally adjust the image data when using a neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the confidence score taught by Zhou into the modified invention of Trinh such that, when modeling training data with a neural network, the system can dynamically adjust the training data by using the confidence score in order to create a more reliable and well-trained machine learning or prediction model.

Regarding Claim 8, the combination of Trinh and Sacheti teaches the invention in claim 1.
The combination further teaches instructions that, when executed by the at least one processor, cause the computing device to present the sub-set of the user-submitted images by (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale.):
generating [[ aesthetic ]] values for each user-submitted image of the sub-set of the user-submitted images (Sacheti, Paragraph [0026], The visual search model 104 may comprise access to one or more visual indexes (e.g., databases) that are utilized to match image content (or portions <read on sub-set of images> thereof) to existing image content); [[ ordering the sub-set of the user-submitted images based on the aesthetic values ]]; and presenting the ordered sub-set of the user-submitted images (Sacheti, Paragraph [0015], a user may have uploaded image content for searching through a search engine application/service).
As explained in the rejection of claim 1 above, the rationale for combining the region-of-interest teaching of Sacheti with Trinh applies equally here.
However, the combination of Trinh and Sacheti does not explicitly disclose generating aesthetic values for images and ordering the images based on the aesthetic values.
Zhou teaches generating aesthetic values for images and ordering the images based on the aesthetic values (Zhou, Paragraph [0057], [0070], [0071], classifiers output a confidence score corresponding to the probability that the object is represented in the image. Some example quality factors are: Aesthetic appeal, Product, Brand, Trustworthiness, Clarity, and Layout (this list is provided in descending order of importance in accordance with one embodiment). To further capture the underlying semantics of the image, richer visual descriptions may be obtained from the CNN (Convolutional Neural Networks)-based Flickr classifiers).
Zhou and Trinh are analogous art because both are directed to using neural networks to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and using the trained model to adjust and fine-tune the image data with a neural network. Zhou provides a way of sorting and ordering image data by different factors, such as aesthetic appeal, when presenting the image data in connection with a neural network training process. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the sorting and ordering by different factors taught by Zhou into the modified invention of Trinh such that the system can present the image data in different sort orders, allowing users to view the data according to their preferences, which increases flexibility and provides a more user-friendly environment.
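For illustration only, the ordering limitation of claim 8 (ordering a sub-set of images by aesthetic value before presentation) can be sketched as follows. The score values, file names, function name, and the choice of descending order are assumptions of this sketch; how an aesthetic value is computed is not shown.

```python
def order_by_aesthetic(images, scores):
    """Return image identifiers sorted from highest to lowest aesthetic score.

    images and scores are parallel lists; descending order is an assumption.
    """
    ranked = sorted(zip(images, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


# Hypothetical sub-set of user-submitted images with made-up aesthetic values.
ordered = order_by_aesthetic(["a.jpg", "b.jpg", "c.jpg"], [0.4, 0.9, 0.7])
```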


Claims 10-13, 15, 17-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Trinh et al. (US 20210201373 A1, hereinafter Trinh) in view of Sutherland (US 20210241035 A1).

Regarding Claim 10, Trinh teaches a system comprising (Trinh, Paragraph [0016], the listing generating system utilizes the images of items as input to train one or more machine learning models to generate features that represent various aspects of particular images):
at least one non-transitory computer readable medium storing (Trinh, Paragraph [0078], code embodied on a machine-readable medium or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. One or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations) a curated image displaying a product and a plurality of user-submitted images displaying the product;
at least one server (Trinh, Fig. 1, Element 140 Application Server) configured to cause the system to:
present, via a graphical user interface at a user client device, the curated image displaying the product in a first view (Trinh, Paragraph [0027], the client device 110 may comprise a display module to display information; the networked system 102 comprises a network-based marketplace (also referred to as “online marketplace”) that responds to requests for product listings);
extract, utilizing a machine learning model, a scale and rotation invariant feature vector from the curated image (Trinh, Paragraph [0016], [0022], the input layer of a machine learning model used by the listing generating system accesses the input images and generates features that represent various aspects of the input images; the machine learning models, the listing generating system can abstract the image into a feature vector (hereinafter also an “embedding” or a “vector”) associated with the image);
extract, utilizing the machine learning model, [[ scale and rotation ]] invariant feature vectors from the plurality of user-submitted images displaying the product in one or more views (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors; [0023], [0024], a selection, by the user (e.g., the seller), of the one or more user interface elements of the user interface allows the user to edit or update the automatically generated listing with additional images. Small screens tend to need data and functionality divided into many layers or views);
map the [[ scale and rotation ]] invariant feature vector from the curated image and the scale and rotation invariant feature vectors from the plurality of user-submitted images in a vector space (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors);
cluster a [[ sub-set of ]] the user-submitted images that display the product in the first view with the curated image by determining that [[ scale and rotation ]] invariant feature vectors from the [[ sub-set ]] of the user-submitted images are within a threshold distance of the [[ scale and rotation ]] invariant feature vector from the curated image (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale. Paragraph [0016], The classifying of similar images into a cluster of images may be based on identifying feature vectors that are similar. Two feature vectors may be similar if a computed distance value between the two feature vectors does not exceed (e.g., is equal to or is less than) a certain threshold value);
receive, via the graphical user interface, a user selection of the curated image (Trinh, Paragraph [0023], a selection, by the user (e.g., the seller), of the one or more user interface elements of the user interface allows the user to edit or update the automatically generated listing with additional images or additional textual description);
and present, via the graphical user interface and based on the user selection of the curated image, the [[ subset ]] of user-submitted images (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale.).
Trinh does not explicitly disclose scale and rotation invariant feature vectors and a sub-set of the user-submitted images that display the product in the first view with the curated image.
However, Sutherland teaches extract, utilizing the machine learning model, scale and rotation invariant feature vectors from the plurality of user-submitted images displaying the product in one or more views (Sutherland, Paragraph [0037], the machine learning system 101 can introduce small perturbations (e.g., changes in size, rotation, color, etc.) to each image);
map the scale and rotation invariant feature vector from the curated image and the scale and rotation invariant feature vectors from the plurality of user-submitted images in a vector space (Sutherland, Paragraph [0028], the images can be collected by a computer vision system 107 over a communication network 109 as a part of a digital map making pipeline to generate a geographic database 111 of the found features/objects);
cluster a sub-set of the user-submitted images that display the product in the first view with the curated image (Sutherland, Paragraph [0005], [0029], The apparatus is also caused to provide data for presenting a bulk arrangement of at least one subset of the plurality of images. The bulk arrangement 201 is an 8x8 grid with each grid cell displaying a different image of a found feature) by determining that scale and rotation invariant feature vectors from the sub-set of the user-submitted images (Sutherland, Paragraph [0037], the machine learning system 101 can introduce small perturbations (e.g., changes in size <read on scale>, rotation, color, etc.) to each image. [0045], the machine learning system 101 can process the at least one subset to normalize a size, a position, a visual characteristic, or a combination thereof of the feature of interest in the plurality of image);
and present, via the graphical user interface and based on the user selection of the curated image, the subset of user-submitted images (Sutherland, Paragraph [0027], the system 100 provides a user interface or data for generating a user interface to enable a human annotator to view multiple found objects).
Sutherland and Trinh are analogous art because both deal with using a neural network to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and of using the trained model to adjust and fine-tune the image data with a neural network. Sutherland provides a way, when dealing with training image data, of specifying regions to include and/or exclude in order to incrementally adjust the image data when using a neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the region of interest taught by Sutherland into the invention of Trinh such that, when modeling training data with a neural network, the system is able to dynamically adjust the training data by adjusting the modeling region in order to create a more reliable and well-trained machine learning or prediction model.

Regarding Claim 11, the combination of Trinh and Sutherland teaches the invention in claim 10.
The combination further teaches wherein the at least one server is further configured to cause the system to extract the scale and rotation invariant feature vector from the curated image by generating object descriptors for the product in the curated image. (Trinh, Paragraph [0019], [0023], The two or more images depict two or more items, and the two or more descriptions pertain to the two or more items. The listing generating system matches one or more images of the two or more images to a description of the two or more descriptions. Finding similarities among images of items, and among the images and descriptions of the items is a complex problem, especially in a system where there could be large number of images and descriptions. In machine learning, categorical features are those features that may have a value from a finite set of possible values).

Regarding Claim 12, the combination of Trinh and Sutherland teaches the invention in claim 10.
The combination further teaches wherein the at least one server is further configured to cause the system to extract the scale and rotation invariant feature vectors from the plurality of user-submitted images by: (Trinh, Paragraph [0075], matching module 304 matches the attribute vector and a descriptor vector of the two or more descriptor vectors using a bipartite matching algorithm. The bipartite matching algorithm performs pairwise comparisons among two or more attribute vectors associated with the two or more items, and the two or more of description vectors):
Trinh does not explicitly disclose but Sutherland teaches generating object bounding boxes and labels for objects in the plurality of user-submitted images, wherein the object bounding boxes comprise product bounding boxes and product labels corresponding to the product (Sutherland, Paragraph [0034], The annotation or labeling includes any means for indicating the found feature or object including but not limited to tagging the image with a label, indicating the feature as a bounding box in the image, and/or the like; [0038], trained feature detection model 103 ingests this different body of images to extract relevant feature data to predict or classify whether one or more of the images contain the feature or object of interest. In one embodiment, the trained feature detection model 103 can output a bounding box around the feature in a corresponding image, label the bound box with the feature or object of interest);
cropping the product bounding boxes; and extracting scale and rotation invariant feature vectors from the product bounding boxes (Sutherland, Paragraph [0054], on the terminating of the displaying or presenting of the bulk arrangement to end a filtering round, the machine learning system 101 effectively excludes the portion of the plurality of images that has not been presented in the bulk arrangement from the training data; it is noted that excluding the portion of the data that will not be presented at the end is the same as cropping the data).
	As explained in the rejection of claim 10, the rationale for combining the image-exclusion teaching of Sutherland with Trinh is provided above.
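
The cropping step recited in Claim 12 can be illustrated with a minimal sketch. It assumes an image stored as a list of pixel rows and a box given as (top, left, bottom, right) with exclusive bottom and right edges. The function name crop_box and this box convention are hypothetical and are not taken from Trinh or Sutherland.

```python
def crop_box(image, box):
    """Return the sub-image inside a bounding box.

    image: list of rows, each row a list of pixel values (assumed layout).
    box:   (top, left, bottom, right); bottom and right are exclusive
           (an assumed convention, not one stated in either reference).
    """
    top, left, bottom, right = box
    # Validate the rows first so an empty image never reaches image[0].
    if not (0 <= top < bottom <= len(image)):
        raise ValueError("box rows fall outside the image")
    if not (0 <= left < right <= len(image[0])):
        raise ValueError("box columns fall outside the image")
    # Slice the selected rows, then the selected columns of each row.
    return [row[left:right] for row in image[top:bottom]]
```

A feature extractor would then be applied to the returned region rather than to the full image.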

Regarding Claim 13, the combination of Trinh and Sutherland teaches the invention in claim 12.
The combination further teaches wherein generating the object bounding boxes and the labels comprises utilizing a region proposal neural network (Sutherland, Paragraph [0038], [0065], the trained feature detection model 103 can output a bounding box around the feature in a corresponding image, label the bound box with the feature or object of interest. the machine learning system 101 includes a neural network or other equivalent machine learning model (e.g., Support Vector Machines, Random Forest, etc.) to detect features or objects).
	As explained in the rejection of claim 10, the rationale for combining the image-exclusion teaching of Sutherland with Trinh is provided above.

Regarding Claim 15, the combination of Trinh and Sutherland teaches the invention in claim 10.
The combination further teaches wherein the at least one server (Trinh, Fig. 1, Element 140 Application Server) is further configured to cause the system to:
 determine that distances between [[ the scale and rotation ]] invariant feature vector from the curated image and [[ scale and rotation ]] invariant feature vectors from one or more user-submitted images of the plurality of user-submitted images exceed the threshold distance (Trinh, Paragraph [0016], The classifying of similar images into a cluster of images may be based on identifying feature vectors that are similar. Two feature vectors may be similar if a computed distance value between the two feature vectors does not exceed (e.g., is equal to or is less than) a certain threshold value);
 generate a new cluster comprising the one or more user-submitted images (Trinh, Paragraph [0016], The listing generating system then trains one or more machine learning models to classify similar images into a cluster of images. The classifying of similar images into a cluster of images may be based on identifying feature vectors that are similar. Two feature vectors may be similar if a computed distance value between the two feature vectors does not exceed (e.g., is equal to or is less than) a certain threshold value);
 and present, via the graphical user interface, the one or more user-submitted images (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale.).
	Trinh does not explicitly disclose the scale and rotation invariant feature vector from the curated image.
	However, Sutherland teaches extract, utilizing the machine learning model, scale and rotation invariant feature vectors from the plurality of user-submitted images displaying the product in one or more views (Sutherland, Paragraph [0037], the machine learning system 101 can introduce small perturbations ( e.g., changes in size, rotation, color, etc.) to each image);
As explained in the rejection of claim 10, the rationale for combining the image-exclusion teaching of Sutherland with Trinh is provided above.
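
The threshold-distance test discussed for Claim 15, and described in Trinh, Paragraph [0016], can be illustrated with a minimal sketch. It assumes Euclidean distance and plain Python lists as feature vectors. The function names and the choice of distance metric are hypothetical, since neither reference specifies an implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_by_threshold(curated_vec, user_vecs, threshold):
    """Partition user-image vectors by distance to the curated-image vector.

    Returns (within, beyond): indices of vectors whose distance does not
    exceed the threshold, and indices of vectors whose distance exceeds it.
    The second group corresponds to images that would seed a new cluster.
    """
    within, beyond = [], []
    for index, vec in enumerate(user_vecs):
        if euclidean(curated_vec, vec) <= threshold:
            within.append(index)
        else:
            beyond.append(index)
    return within, beyond
```

Under this sketch, an image whose vector lies exactly at the threshold is treated as similar, matching the "does not exceed (e.g., is equal to or is less than)" wording quoted from Trinh.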

Regarding Claim 17, the combination of Trinh and Sutherland teaches the invention in claim 10.
The combination further teaches wherein the at least one server is further configured to cause the system to: present, via the graphical user interface, an additional views element  (Trinh, Paragraph [0023], a selection, by the user (e.g., the seller), of the one or more user interface elements of the user interface allows the user to edit or update the automatically generated listing with additional images or additional textual description);
receive, via the graphical user interface, a user selection of the additional views element; and present, via the graphical user interface and based on the user selection of the additional views element, one or more user-submitted images comprising different views of the product than the curated image (Sutherland, Paragraph [0030], the user interface 200 presents an instruction 205 to instruct the human annotator to “select the outlier image/false positive image to filter from training data.” [0054], on the terminating of the displaying or presenting of the bulk arrangement to end a filtering round, the machine learning system 101 effectively excludes the portion of the plurality of images that has not been presented in the bulk arrangement from the training data [0055], The retrained feature detection model can be used to classify the previously classified images and/or additional images).
As explained in the rejection of claim 10, the rationale for combining the image-exclusion teaching of Sutherland with Trinh is provided above.

Regarding Claim 18, Trinh teaches in a digital medium environment for storing and displaying digital images, a computer-implemented method for generating clusters of digital images comprising (Trinh, Paragraph [0059], [0078], a method for automatic listing generation for multiple items based on joint matching of images of the items and descriptions of the items; code embodied on a machine-readable medium) or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner one or more hardware modules of a computer system ( e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations):
 performing a step for identifying [[ a sub-set of ]] user-submitted images of a product that have a similar orientation and view as curated image of the product;
 providing a graphical user interface displaying a plurality of curated images of the product (Trinh, Fig. 1, Element 110 Client Device; Paragraph [0027], the client device 110 may comprise a display module to display information (e.g., in the form of user interfaces. Fig. 2, Element 200, Paragraph [0027], the networked system 102 comprises a network-based marketplace (also referred to as "online marketplace") that responds to requests for product listings, publishes publications comprising item listings of products);
 receiving a selection of the curated image from the plurality of curated images of the product;
 and providing, via the graphical user interface, [[ the sub-set of ]] the user-submitted images of the product that have the similar orientation and view as the curated image (Trinh, Paragraph [0016], the listing generating system, using a plurality of feature vectors associated with a plurality of images, identifies a number of images that are similar. The listing generating system then trains one or more machine learning models to classify similar images into a cluster of images).
	Trinh does not explicitly disclose a sub-set of the user-submitted images of the product.
	However, Sutherland teaches performing a step for identifying a sub-set of the user-submitted images of a product that have a similar orientation and view as curated image of the product (Sutherland, Paragraph [0042], the machine learning system 101 provides data for presenting a bulk arrangement of at least one subset of the plurality of images (e.g., the body of images received from the feature detection model 103 of the step 405 of the process 400). the subset includes multiple images (e.g., at least two images) of the received body of images);
and providing, via the graphical user interface, the sub-set of the user-submitted images of the product that have the similar orientation and view as the curated image (Sutherland, Paragraph [0054], on the terminating of the displaying or presenting of the bulk arrangement to end a filtering round, the machine learning system 101 effectively excludes the portion of the plurality of images that has not been presented in the bulk arrangement from the training data).
Sutherland and Trinh are analogous art because both deal with using a neural network to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and of using the trained model to adjust and fine-tune the image data with a neural network. Sutherland provides a way, when dealing with training image data, of specifying a subset of the images in order to incrementally adjust the image data when using a neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the subset taught by Sutherland into the invention of Trinh such that, when modeling training data with a neural network, the system is able to dynamically adjust the training data by adjusting the subset of the modeling region in order to create a more reliable and well-trained machine learning or prediction model.

Regarding Claim 19, it recites limitations similar in scope to the limitations of Claim 17 and therefore is rejected under the same rationale.

Claim 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Trinh et al. (US 20210201373 A1, hereinafter Trinh) in view of Sutherland (US 20210241035 A1) as applied to Claim 10 above and further in view of Zhou et al. (US 20170193545 A1, hereinafter Zhou)
Regarding Claim 14, the combination of Trinh and Sutherland teaches the invention in claim 12.
The combination does not explicitly disclose but Zhou teaches wherein the at least one server is further configured to cause the system to generate confidence scores corresponding to the labels (Zhou, Paragraph [0070], For each of the detectable objects, the Flickr classifiers output a confidence score corresponding to the probability that the object is represented in the image).
Zhou and Trinh are analogous art because both deal with using a neural network to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and of using the trained model to adjust and fine-tune the image data with a neural network. Zhou provides a way, when dealing with training image data, of using a confidence score to incrementally adjust the image data when using a neural network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the confidence score taught by Zhou into the modified invention of Trinh such that, when modeling training data with a neural network, the system is able to dynamically adjust the training data by using the confidence score in order to create a more reliable and well-trained machine learning or prediction model.

Claims 16, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Trinh et al. (US 20210201373 A1, hereinafter Trinh) in view of Sutherland (US 20210241035 A1) as applied to Claims 10 and 18 above, respectively, and further in view of Zhou et al. (US 20170193545 A1, hereinafter Zhou)

Regarding Claim 16, the combination of Trinh and Sutherland teaches the invention in claim 10.
The combination further teaches wherein the at least one server is further configured to cause the system to present the sub-set of the user-submitted images by (Trinh, Paragraph [0047], At operation 216, the listing generating system 300 automatically generates item listings based on the automatically identified matches between image clusters and titles. The listing generating system 300 may include the automatically identified attribute values as keywords in the item listings to describe various aspects of the items for sale.):
 generating [[ aesthetic ]] values for each user-submitted image of the sub-set of the user-submitted images; ordering the sub-set of the user-submitted images based on the [[ aesthetic values ]]; and presenting the ordered sub-set of the user-submitted images (Sutherland, Paragraph [0038], the trained feature detection model 103 can output a bounding box around the feature in a corresponding image, label the bound box with the feature or object of interest, and calculate detection confidence data for the images. The detection confidence data, for instance, includes a calculated confidence indicating the probability determined by the feature detection model 103 that an image depicts a feature or object of interest. [0029], the bulk arrangement 201 can sort or present the found features in an order based on detection confidence of the found feature in each image.).
	As explained in the rejection of claim 10, the rationale for combining the image-exclusion teaching of Sutherland with Trinh is provided above.
	However, the combination does not explicitly disclose generating aesthetic values for images and ordering the images based on the aesthetic values.
	Zhou teaches generating aesthetic values for images and ordering the images based on the aesthetic values (Zhou, Paragraph [0057], [0070], [0071], classifiers output a confidence score corresponding to the probability that the object is represented in the image. Some example quality factors are: Aesthetic appeal, Product, Brand, Trustworthiness, Clarity, and Layout (this list is provided in descending order of importance in accordance with one embodiment). To further capture the underlying semantics of the image, richer visual descriptions may be obtained from the CNN (Convolutional Neural Networks)-based Flickr classifiers).
Zhou and Trinh are analogous art because both deal with using a neural network to train on and display image data. Trinh provides a way of tracking and extracting feature vectors from product images and of using the trained model to adjust and fine-tune the image data with a neural network. Zhou provides a way, when dealing with training image data, of sorting and ordering by different factors, such as aesthetic appeal, when presenting the image data during the neural network training process. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the sorting and ordering by different factors taught by Zhou into the modified invention of Trinh such that, when modeling training data with a neural network, the system is able to dynamically adjust the training data by presenting the image data in different sort orders, so that users see the data according to their preferences, which increases flexibility and provides a more user-friendly environment.
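
The ordering step discussed for Claim 16 can be illustrated with a minimal sketch. It assumes each image already has a numeric score (for example, a detection confidence or an aesthetic value) and that higher scores are presented first. The function name order_by_score and the descending default are hypothetical and are not drawn from Trinh, Sutherland, or Zhou.

```python
def order_by_score(image_ids, scores, descending=True):
    """Return image identifiers ordered by their associated scores.

    image_ids: identifiers for the images in the sub-set.
    scores:    one numeric score per image, in the same order.
    """
    if len(image_ids) != len(scores):
        raise ValueError("each image needs exactly one score")
    # sorted() is stable, so images with equal scores keep their input order.
    paired = sorted(zip(image_ids, scores), key=lambda p: p[1], reverse=descending)
    return [image_id for image_id, _ in paired]
```

The same routine serves for any per-image value, since only the source of the scores changes.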

Regarding Claim 20, it recites limitations similar in scope to the limitations of Claim 16 and therefore is rejected under the same rationale.

Response to Arguments
Applicant’s arguments filed on 8/19/2022 with respect to the rejection of claim 1 under 35 USC § 103, namely that the prior art does not teach the limitation "receiv[ing] ... a user selection of an additional views element; and ... present[ing], via the graphical user interface and based on the user selection of the additional views element, one or more of the user-submitted images in the additional sub-set of user-submitted images that show the product in a view not included in the curated images," have been considered but are moot in view of the new ground(s) of rejection.
Applicant's assertion that the prior art fails to teach the limitation of Claim 10 "cluster a sub-set of the user-submitted images that display the product in the first view with the curated image by determining that scale and rotation invariant feature vectors from the sub-set of the user-submitted images are within a threshold distance of the scale and rotation invariant feature vector from the curated image" has been considered but is not persuasive.
In response to the argument, Trinh teaches in Paragraphs [0016], [0023], and [0047] classifying user-submitted images that display the product, and a selection by the user to update the automatically generated listing with additional images. Sutherland teaches in Paragraphs [0028], [0029], and [0037] that the system provides rotation, resizing, and other actions for those user-provided images and extracts characteristic information from a subset of the plurality of images. By combining the resizing and extraction of Sutherland with the image updating of Trinh, the combination of the prior art teaches all the limitations. Therefore, applicant's remark cannot be considered persuasive.
Applicant's assertion that the examiner failed to conduct a 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, analysis of Claim 18 has been considered but is not persuasive.
In response to the argument, Claim 18 was properly analyzed by the examiner in the previous office action, and there is no issue with step-plus-function or other claim language that could trigger 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Hence no 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, interpretation was given in the previous office action, and since no amendment has been made to the claim, there is no need for further analysis. Therefore, applicant's remark cannot be considered persuasive.
Regarding claims 2-9, 11-17, and 19-20, they depend directly or indirectly on independent Claims 1, 10, and 18, respectively. Applicant does not argue anything other than independent claims 1, 10, and 18. The limitations in those claims, in conjunction with the combination, are taught as previously established and explained.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUJANG TSWEI whose telephone number is (571)272-6669. The examiner can normally be reached 8:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached on (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/YuJang Tswei/Primary Examiner, Art Unit 2619