Detailed Action
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	
2.	The amendment filed 6/18/21 has been entered.

3.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 5-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.  Independent claims 5 and 14 each recite “receive a gesture input modifying a target product portrayed within a digital canvas to include a visual product feature”; however, it is not clear how the gesture modifies the actual target product itself, as opposed to modifying its digital/virtual representation within the digital canvas.  The language conflates the concept of modifying the actual target product with the concept of modifying its digital representation, and this lack of clarity renders the claims indefinite.

4.	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 



As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.  Claim 1, line 6, recites “a step for utilizing an input-classifier matching model to determine…” and thereby invokes interpretation under 35 U.S.C. 112(f).

5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Siddique (US 2013/0215116) in view of Kaur (US 2020/0167556).
6.	Regarding claim 1, Siddique shows: in a digital medium environment for conducting digital searches for products, a computer-implemented method of identifying products based on free flow inputs (para 130, 134 – note the digital interface for searching products based on free flow user inputs, i.e., inputs that are flexible and freely applied), the computer-implemented method comprising: identifying, via a digital canvas, gesture input indicating a visual product feature of a target product (para 271, 285, 339, 352-354 – note the various gesture inputs which indicate a particular product feature); and providing, for display, the one or more target products that include the visual product feature (Figures 22, 29A, para 113-114, 120 – note how the display shows the products which include the particular selected/indicated features).  Siddique also shows utilizing an input-classifier matching model (para 145, 160, 167 – note the classifier matching model) to determine one or more target products that include the visual product feature identified via the digital canvas (para 123, 130, 167, 168 – again, the classifier matching model helps to determine particular features and the products containing them).  Nevertheless, Siddique does not go into the exact details of using the model for gesture inputs that make the indications per se.  Kaur, however, does show using machine learning matching models to determine gesture inputs for making indications (para 31-34 show the machine learning matching models to determine gesture inputs – note how these gestures then make indications).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique, because it would provide an efficient way to utilize machine learning matching models in a system that determines gesture inputs for indicating the particular feature.

7.	Regarding claim 2, Siddique provides the digital canvas for display within a product search interface of a website together with one or more digital images of the one or more target products (see Figures 22, 29A, para 165 which show the website and images of the product).

8.	Regarding claim 3, the digital canvas comprises a product template illustrating the target product, wherein the product template is modifiable by gesture inputs (Kaur para 271, 276, 279 for example show modifying the template via different options including through gesture inputs).

9.	Regarding claim 4, the visual product feature comprises one or more of a presence of an object within the target product, a location of an object within the target product, a shape of an object within the target product, a size of an object within the target product, or a rotation of an object within the target product (note the alternative language and that only one of these need be shown; Siddique shows at least a particular feature which would constitute an additional object and its location within the product, or a size, in Figures 31, para 187, 211, 235 for example). 

10.	Claims 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Siddique (US 2013/0215116) in view of Kaur (US 2020/0167556) and Anusha et al. (US 2019/0236677).

11.	Regarding claim 5, Siddique shows a non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computer device to: utilize a plurality of digital image classification models (Figure 6A, para 145, 160, 167 – note the classifier matching model) to generate a plurality of product sets corresponding to respective visual product features (para 123, 130, 167, 168 – the classifier matching model helps to determine particular features and the products containing them); receive a gesture input modifying a digital canvas that has an image of a product to include a visual product feature (see also the 112 rejection and interpretation for this claim feature as explained above; para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature and modify the digital canvas with the image of the product and product feature accordingly); and provide, for display, one or more products from the first product set corresponding to the visual product feature (Figures 22, 29A, para 113-114, 120 – note how the display shows the products which include the particular selected/indicated features).  Siddique does determine, via an input-classifier matching model, a first product set from the plurality of product sets corresponding to a first digital image classification model trained to identify the visual product feature (para 123, 130, 167, 168 – again, the classifier matching model helps to determine particular features and the products containing them), but Siddique does not go into the specific details of how the determination corresponds to the gesture input per se.  Kaur, however, does show using machine learning matching models to determine indications corresponding to gesture inputs (para 31-34 show the machine learning matching models to determine gesture inputs – note how these gestures then make indications).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique, because it would provide an efficient way to utilize machine learning matching models in a system that determines gesture inputs for indicating the particular feature.  Neither Siddique nor Kaur goes into the exact details wherein each of the plurality of digital image classification models is trained to identify a unique visual product feature regarding such product sets corresponding to respective visual product features.  Anusha, however, does show that each of a plurality of digital image classification models is trained to identify a unique visual product feature regarding image sets corresponding to respective visual product features (Figures 4A-B, para 66 – note that there are different classification models and each is trained to identify a particular visual feature).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique, especially as modified by Kaur, because it would provide an efficient way to identify visual features in a system that generates product sets corresponding to visual product features.

12.	Regarding claim 6, Siddique modifies the canvas having an image of the product by receiving gesture input indicating, within the digital canvas, a location for an object depicted on the target product (Siddique shows input to indicate a particular feature that would constitute an additional object and its location within the product in Figures 31, para 187, 211, 235, for example; para 271, 285, 339, 352-354 also show how the inputs may be gesture inputs).

13.	Regarding claim 7, Siddique shows instructions that, when executed by the at least one processor, cause the computer device to determine the first product set by comparing the target product including the visual product feature with representative digital images corresponding to the plurality of product sets (para 125, 145, 165, 223 – the feature is added to the product, which is then searched among the representative digital images of products to produce the product set).  Note that the input indicating the feature may be a gesture (para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).  See also Kaur para 47 and 49, which compare features indicated by the gesture input among digital images.

14.	Regarding claim 8, comparing the target product with the representative digital images corresponding to the plurality of product sets comprises: analyzing the representative digital images of the plurality of product sets to generate confidence scores for the plurality of product sets (Siddique para 130, 157 show the confidence scores based on analyzing the representative images), the confidence scores indicating probabilities that the plurality of product sets correspond to the visual product feature (Siddique para 157-158 show that the scores indicate probabilities that the sets correspond to the indicated feature); and identifying, from among the representative digital images of the plurality of product sets and based on the confidence scores, a representative digital image of the first digital image classification model that depicts the visual product feature (Siddique para 157-158, 165 show identifying the image showing the feature).  Note that the input indicating the feature may be a gesture (para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).  See also Kaur para 47-49, which compare features indicated by the gesture input among digital images using confidence scores indicating that images of a model correspond to the gesture.

15.	Regarding claim 9, Siddique shows instructions that, when executed by the at least one processor, cause the computer device to receive a second gesture input further modifying the canvas with the image of the target product portrayed including a modified visual product feature (see the 112 rejection; also, para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature – note how more than one gesture can be made to indicate yet another feature).
16.	Regarding claim 10, in addition to what is mentioned for claim 9, Siddique does determine, via an input-classifier matching model, a second product set from the plurality of product sets corresponding to a second digital image classification model trained to identify the second (i.e., thus modified) visual product feature (para 123, 130, 167, 168 – the classifier matching model helps to determine particular features and the products containing them, and note that it does so for each new gesture), but Siddique does not go into the specific details of how the determination corresponds to the gesture input per se.  Kaur, however, does show using machine learning matching models to determine indications corresponding to gesture inputs (para 31-34 show the machine learning matching models to determine gesture inputs – note how these gestures then make indications).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique for the second gesture input, because it would provide an efficient way to utilize machine learning matching models in a system that determines gesture inputs for indicating the particular new or modified feature.
17.	Regarding claim 11, Siddique receives, via a text input element, a text-based search query (see para 130-131 and 136, which show receiving the text search query); and determines, via the input-classifier matching model, a third product set from the plurality of product sets corresponding to the text-based search query (para 123, 130-131, 136, 167, 168 – the classifier matching model helps to determine particular features and the products containing them, and note that it does so for each new input, including the text search) by: determining a product type from the text-based search query (again, para 130-131, 136 show determining the product type from the text search query); and determining a third digital image classification model from the plurality of digital image classification models trained to identify the visual product feature depicted on products of the product type (para 123, 130, 167, 168 – the classifier matching model helps to determine particular features and the products containing them).  Siddique does this for each new gesture, but does not go into the specific details of how the determination corresponds to the gesture input per se.  Kaur, however, does show using machine learning matching models to determine indications corresponding to gesture inputs (para 31-34 show the machine learning matching models to determine gesture inputs – note how these gestures then make indications).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique for the second gesture input, because it would provide an efficient way to utilize machine learning matching models in a system that determines gesture inputs and text searches in tandem for indicating the particular new or modified feature.

18.	Regarding claim 12, the computer device trains the plurality of digital image classification models to classify digital images into product sets based on ground truth product sets and training digital images depicting visual product features (Siddique para 123, 130, 167, 168 – the classifier matching model helps to determine particular features and the products containing them; the product sets confirm the product and feature and thus serve as ground truth sets).

19.	Regarding claim 13, the computer device generates the training digital images by, for each of the plurality of digital image classification models: generating a training set of plain digital images depicting a product (Siddique para 123, 129-130, 167 - note the general set of plain digital images before a feature is selected); and adding one or more visual product features to the plain digital images of the training set (Siddique para 125, 145, 165, 223 – the feature is added to the product which is then searched among the representative digital images of products, to produce the product set).  
20.	Regarding claim 14, Siddique shows a system comprising: at least one processor; and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to: receive a gesture input modifying a target product portrayed within a digital canvas to include a visual product feature (see also the 112 rejection and interpretation for this claim feature as explained above; para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature and modify the digital canvas with the image of the product and product feature accordingly).  Siddique does determine, via an input-classifier matching model, a first product set from the plurality of product sets corresponding to a first digital image classification model trained to identify the visual product feature (Figure 6A, para 123, 130, 167, 168 – again, the classifier matching model helps to determine particular features and the products containing them), but Siddique does not go into the specific details of how the determination corresponds to the gesture input per se.  Kaur, however, does show using machine learning matching models to determine indications corresponding to gesture inputs (para 31-34 show the machine learning matching models to determine gesture inputs – note how these gestures then make indications).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique, because it would provide an efficient way to utilize machine learning matching models in a system that determines gesture inputs for indicating the particular feature.
Siddique utilizes the input-classifier matching model to determine, for a plurality of digital image classification models corresponding to a plurality of product sets, confidence scores indicating probabilities that the digital image classification models correspond to the visual product feature corresponding to the gesture input (Siddique para 130, 157 show the confidence scores based on analyzing the representative images); and selects, based on the confidence scores for the plurality of digital image classification models, a product set corresponding to a digital image classification model that classifies digital images depicting the visual product feature corresponding to the gesture input (Siddique para 157-158 show that the scores indicate probabilities that the sets correspond to the indicated feature, and para 157-158, 165 show identifying the image showing the feature).  Note that the input indicating the feature may be a gesture (para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).  See also Kaur para 47-49, which compare features indicated by the gesture input among digital images using confidence scores indicating that images of a model correspond to the gesture.  Siddique provides, for display, one or more digital images from the product set that includes products depicting the visual product feature (Figures 22, 29A, para 113-114, 120 – note how the display shows the products which include the particular selected/indicated features).  Neither Siddique nor Kaur goes into the exact details wherein each of the plurality of digital image classification models is trained to identify a unique visual product feature regarding such product sets corresponding to respective visual product features.
Anusha, however, does show that each of a plurality of digital image classification models is trained to identify a unique visual product feature regarding image sets corresponding to respective visual product features (Figures 4A-B, para 66 – note that there are different classification models and each is trained to identify a particular visual feature).  It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to include this in Siddique, especially as modified by Kaur, because it would provide an efficient way to identify visual features in a system that generates product sets corresponding to visual product features.

21.	Regarding claim 15, the system utilizes the plurality of digital image classification models to generate the plurality of product sets corresponding to respective visual product features, wherein each of the plurality of digital image classification models is trained to identify a unique visual product feature (Siddique para 123, 130, 167, 168  - the classifier matching model helps to determine particular features and the products containing them).

22.	Regarding claim 16, the system determines the product set corresponding to the visual product feature by comparing the target product including the visual product feature indicated by the gesture input with representative digital images corresponding to the plurality of product sets (Siddique para 125, 145, 165, 223 – the feature is added to the product, which is then searched among the representative digital images of products to produce the product set).  Note that the input indicating the feature may be a gesture (Siddique para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).  See also Kaur para 47 and 49, which compare features indicated by the gesture input among digital images.

23.	Regarding claim 17, the system utilizes the input-classifier matching model to determine the confidence scores (Siddique para 130, 157 show the confidence scores based on analyzing the representative images) by determining, based on comparing the target product including the visual product feature indicated by the gesture input with the representative digital images from the plurality of product sets, probabilities that the product sets include digital images that depict product types corresponding to the target product and that include the visual product feature indicated by the gesture input (Siddique para 157-158 show that the scores indicate probabilities that the sets correspond to the indicated feature).  Note that the input indicating the feature may be a gesture (para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).  See also Kaur para 47-49, which compare features indicated by the gesture input among digital images using confidence scores indicating that images of a model correspond to the gesture.
24.	Regarding claim 18, Siddique shows the system provides the digital canvas for display within a product search interface of a website together with the one or more digital images from the product set that includes products depicting the visual product feature (see Figures 22, 29A, para 165 which show the website and images of the product.  See also para 113-114, 120 – note how the display shows the products which include the particular selected/indicated features).  
25.	Regarding claim 19, the gesture input indicating the target product comprises an indication of a location for an object within the target product (Siddique shows a particular feature which would constitute an additional object and its location within the product, in Figures 31, para 187, 211, 235 for example). 

26.	Regarding claim 20, the system determines the confidence scores for the plurality of digital image classification models based on the location for the object within the target product (Siddique para 130-131, 157-158 show the confidence scores based on the new input; Siddique para 187, 211, 235 show that the new input may be placing the feature at a particular location.  Note that the input indicating the feature may be a gesture; para 271, 285, 339, 352-354 show various gesture inputs which indicate a particular product feature).

27.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
a) Salokhe et al. (US 2020/0142978) shows a classification model that has been trained using image data to identify specific features associated with apparel.
b) Roh et al. (US 2019/0146458) uses a model trained to identify particular objects and their locations within a subset of images.

28.	Applicant's arguments have been fully considered but they are not persuasive.
1) Applicant states that, while the 112(f) invocation of claim 1 is acknowledged, the “structure disclosed in the specification” has nevertheless been disregarded in the broadest reasonable interpretation of the claim.  However, applicant does not explain which structure has been disregarded or how applicant is interpreting the claim.  The Action’s interpretation of claim 1 as invoking 112(f) still stands for the reasons given.
2) Applicant’s arguments regarding the feature that “each of the plurality of digital image classification models is trained to identify a unique visual product feature” are moot in view of the new ground of rejection, which relies on Anusha to show this feature.
3) Regarding applicant’s arguments on the recited portion “receive a gesture input modifying a target product portrayed within a digital canvas to include a visual product feature,” please see the 112 rejection and the attached Interview Summary.  In addition, Siddique does show modifying the canvas and image via user input, and such input may include gestures, as explained in the Action.
4) Regarding applicant’s arguments on the feature “determine, via an input classifier matching model and based on the gesture input, a first product set from the plurality of product sets corresponding to a first digital image classification model trained to identify the visual product feature corresponding to the gesture input,” please note that the Action does not rely on either Siddique or Kaur individually to show all of that feature.  Rather, it is the combination of references, including Anusha, that fully brings out the claim features.  Note in particular that Siddique does show using a classifier model to generate a product set of images, in which features may be input by a gesture, and Kaur shows how features are trained based on a gesture input, as explained in the Action above.
Furthermore, Anusha shows each of the plurality of digital image classification models is trained regarding such product sets corresponding to respective visual product features, as explained in the Action above.  

29.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
30.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN PAUL SAX whose telephone number is (571)272-4072. The examiner can normally be reached Monday - Friday, 9:30 - 6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached on 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STEVEN P SAX/           Primary Examiner, Art Unit 2174