DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation under 35 USC § 112(f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function.
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.
 	As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
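(A)       the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)       the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)       the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.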
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
 classifier, trainer, pre-processor, selection engine in claims 26-38.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 26-45 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1,3,4,1,1,6,5,7-12,1,4+12,1,1,6,9,11, respectively, of U.S. Patent No. 10,970,550. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are obvious variants of the corresponding claims of U.S. Patent No. 10,970,550. Furthermore, the scope of the claims of the instant application is also met and encompassed by the corresponding claims of U.S. Patent No. 10,970,550.
 	The apparent differences between the conflicting claims mainly arise from the style of limitation recitation and the relative placement of the conflicting elements within the bodies of the claims.
 	Although claims 39-45 are directed to a non-transitory, computer-readable medium, the subject matter recited therein is considered substantively equivalent to the corresponding conflicting system claims of U.S. Patent No. 10,970,550.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
 
Claims 26-29, 31-34, 36-41, 43, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2020/0019628; hereinafter Chen) in view of Feris et al. (US 2017/0154212; hereinafter Feris).
 	Regarding claim 26, Chen discloses a system (any one or more systems of 100 through 1500 shown in figs. 1-15, ¶0001, claim 10 and dependents, ¶0033) comprising:
a classifier (The pre-triggering capability of the visual intent classifier extends well beyond such a narrow use case – ¶0027. Visual intent classification 204 receives a source image 202 and utilizes a machine learning model, such as the multilabel classifier 206 to identify subjects in the source image 202 – ¶0041, fig. 2. Likewise, other visual intent and multilabel classifiers in figs. 3-15) configured to:
            generate a segment of image data (As an example, the bounding boxes and/or the classification labels can allow a user to trigger a visual search on only a portion of an image – ¶0037);
            associate the segment with one of a plurality of triggers (Examples classifier, Table 1, e.g. sub-categories Dog, cat, bird, horse and so forth are classified under Taxonomical category of animal. Abstract. Visual intent classification 112 receives a source image (e.g., from the application 106) and produces one or more taxonomy categories/sub-categories (hereinafter classification labels) that describe the subject(s) of an image – ¶0035); and
            determine that the image data includes a match to the one of the plurality of triggers, the match comprising an object, when the association between the segment and the one of the plurality of triggers satisfies a match condition (For example, a source image containing a packaged consumer product gives rise to a different set of user cases than a source image containing a landmark or natural object. Thus, the categories can be used, along with other information, to identify different possible scenarios that the user may wish to trigger ….Thus, visual intent classification can be used in such use cases as helping trigger a particular user scenario based on the content of an image, detecting and recognizing everyday objects in order to help a user better formulate a query that will match what the user desires to find, and/or helping improve the user experience – ¶0027-0029.
Visual intent classification receives a source image and returns multiple classification labels that exist in the source image. For example, a source image may include multiple subjects (items that are included in the source image) such as a vehicle, an animal, a person, and so forth. The visual intent classification model evaluates the source image and returns taxonomy categories that correspond to the subjects in the source image – ¶0025.
Thus, visual intent classification can be used in such use cases as helping trigger a particular user scenario based on the content of an image, detecting and recognizing everyday objects in order to help a user better formulate a query that will match what the user desires to find, and/or helping improve the user experience – ¶0029.
Because the images in the image data store 510 have associated classification labels generated by the same trained multilabel classifier, the classification labels can be used as a pre-screening of the images in the data store 510. Thus, those images with the same classification labels as the labels from the source image can be considered for matching – ¶0066); and
a trainer configured to train the system using training data in response to, in part, the determination by the classifier that the image data includes the match (The visual intent classification 112 utilizes a machine learning model that is trained offline (e.g., offline process 120) to identify classification labels for subjects in a source image… The classification labels can be returned to the application 106, which can use the classification labels to make pre-triggering decisions as described in embodiments presented herein. Additionally, or alternatively, the service can use classification labels to make pre-triggering decisions. In a representative example, the classification labels of a source image can be used by the service 108 and/or the application 106 to help the user more clearly formulate a query that can be passed to the search engine 114– ¶0036.
Because the images in the image data store 510 have associated classification labels generated by the same trained multilabel classifier, the classification labels can be used as a pre-screening of the images in the data store 510. Thus, those images with the same classification labels as the labels from the source image can be considered for matching  – ¶0066.
As described in conjunction with other embodiments of the present disclosure, the classification labels 1008 along with other information (in some embodiments) can be used by scenario selection, pre-triggering logic, and so forth, represented in FIG. 10 by the pre-triggering logic/decisions 1010 to identify further processing that should be performed. One option is that the pre-triggering logic/decisions 1010 can determine to invoke the visual intent detection process 1012. Such may be possible, for example, when the classification labels 1008 are not sufficient to allow the system to help the user formulate a query and it would be helpful to show the user the bounding boxes associated with the subjects in the source image 1002 – ¶0106).
 	Chen does not explicitly disclose that: the training data is generated, using a pose estimation model corresponding to the classifier, by tracking the object in a video stream; and the video stream includes the image data.
However, Feris discloses a pose-aware feature learning system that includes an object tracker which tracks an object on a subject in a plurality of video frames, a pose estimator which estimates a pose of the subject in a track of the plurality of video frames, an image pair generator which extracts a plurality of image pairs from the track of the plurality of video frames, and labels the plurality of image pairs with the estimated pose and as depicting the same or different object, and a neural network trainer which trains a neural network based on the labeled plurality of image pairs, to predict whether an image pair depicts the same or different object and a pose difference for the image pair.
Similar features are elaborated in Feris at ¶0004, ¶0007-0008, ¶0036, ¶0053, ¶0056, ¶0074, and claim 1 and its dependents.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the invention of Chen with the teaching of the pose-aware feature learning system of Feris, where training data is generated by estimating the pose of a tracked object in a sequence of video frames, to obtain the claimed features wherein: the training data is generated, using a pose estimation model corresponding to the classifier, by tracking the object in a video stream; and the video stream includes the image data, because combining prior art elements according to known methods to yield predictable results is obvious. Furthermore, such a combination would enhance the versatility of the overall system.
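For illustration only, the kind of training-data generation attributed to Feris above (tracking an object across video frames, estimating a pose per frame, and labeling image pairs as depicting the same or a different object together with a pose difference) may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Feris or Chen: the names `Frame` and `make_pairs`, the use of a single scalar pose angle, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Frame:
    track_id: int    # hypothetical identifier assigned by an object tracker
    frame_no: int    # position of the frame within the video stream
    pose_deg: float  # hypothetical scalar pose estimate, in degrees


def make_pairs(frames):
    """Label every pair of frames as (frame_a, frame_b, same_object, pose_difference)."""
    pairs = []
    for a, b in combinations(frames, 2):
        same_object = a.track_id == b.track_id
        pose_difference = abs(a.pose_deg - b.pose_deg)
        pairs.append((a, b, same_object, pose_difference))
    return pairs


# Hypothetical tracked frames: two from one track, one from another.
frames = [
    Frame(track_id=1, frame_no=0, pose_deg=0.0),
    Frame(track_id=1, frame_no=1, pose_deg=30.0),
    Frame(track_id=2, frame_no=0, pose_deg=90.0),
]
pairs = make_pairs(frames)
```

In this sketch the labeled pairs would serve as the training data; a trainer would then consume `same_object` and `pose_difference` as the two prediction targets.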
Regarding claim 27, Chen in view of Feris discloses the system of claim 26, wherein the classifier comprises a convolutional neural network (Chen: ¶0092, ¶0093, ¶0123-0124).  
Regarding claim 28, Chen in view of Feris discloses the system of claim 26, wherein the classifier is configured to use keypoint matching to determine whether the image data includes the match (The visual intent detector 1220 comprises a plurality of feature extractors 1204 that extract features from the source image 1202. The extracted features are presented to a plurality of multi-layer predictors 1206 which make predictions about the various features and which are important to recognition of the subjects in the source image 1202. The resultant predictions are used to both classify the subjects using a multi-way classifier 1208 and identify a bounding box for the subject using a bounding box regression analysis 1210 – ¶0120).  
 	Regarding claim 29, Chen in view of Feris discloses the system of claim 26, wherein: the classifier comprises a secondary classifier (The visual intent classification 112 utilizes a machine learning model that is trained offline (e.g., offline process 120) to identify classification labels for subjects in a source image… The classification labels can be returned to the application 106, which can use the classification labels to make pre-triggering decisions as described in embodiments presented herein. Additionally, or alternatively, the service can use classification labels to make pre-triggering decisions. In a representative example, the classification labels of a source image can be used by the service 108 and/or the application 106 to help the user more clearly formulate a query that can be passed to the search engine 114– ¶0036) and
a hierarchical classifier (visual intent classifier 204, fig. 2, ¶0027-0030, ¶0041-0042. For example, a source image containing a packaged consumer product gives rise to a different set of user cases than a source image containing a landmark or natural object. Thus, the categories can be used, along with other information, to identify different possible scenarios that the user may wish to trigger ….Thus, visual intent classification can be used in such use cases as helping trigger a particular user scenario based on the content of an image, detecting and recognizing everyday objects in order to help a user better formulate a query that will match what the user desires to find, and/or helping improve the user experience – ¶0027-0029); and
training the system comprises training the hierarchical classifier using the training data (Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy – abstract.
FIG. 8 is a representative diagram illustrating training 800 of a representative visual intent classification model according to some aspects of the present disclosure. As noted herein, the visual intent classification model is formulated as a multilabel classification problem. In other words, the task is to identify classification labels for subjects that exist in a source image. In one embodiment, the multilabel classifier 806 is a MobileNet classifier trained according to a unique training methodology described herein – ¶0091.
To train the classifier, the taxonomy is used to label image training data 804. In one embodiment both images from the web and images from an image capture device are used in the training data. The training data is used to train the model 806 by inputting a selected training image into the model and evaluating the output of the model to ascertain whether the model identified the proper classification labels as indicated by 808. Feedback 810 in the form of an error function adjusts the weights in the classifier until the model converges 812 – ¶0095).
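For illustration only, the training procedure quoted from Chen ¶0095 (input a labeled training image, compare the model output with the expected classification labels, and feed back an error function that adjusts the weights until the model converges) may be sketched as follows. This is a minimal sketch under stated assumptions: it substitutes a tiny single-layer multilabel logistic model for the MobileNet classifier named in Chen, and the feature vectors, label meanings, learning rate, and tolerance are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    # One independent sigmoid output per classification label (multilabel).
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train(samples, n_labels, lr=0.5, tol=1e-4, max_epochs=5000):
    """Adjust weights from the prediction error until the epoch loss stops changing."""
    n_features = len(samples[0][0])
    weights = [[0.0] * n_features for _ in range(n_labels)]
    prev_loss = float("inf")
    eps = 1e-12  # guards against log(0)
    for _ in range(max_epochs):
        loss = 0.0
        for x, y in samples:
            p = predict(weights, x)
            for k in range(n_labels):
                # Binary cross-entropy for label k, accumulated over the epoch.
                loss -= y[k] * math.log(p[k] + eps) + (1 - y[k]) * math.log(1 - p[k] + eps)
                error = p[k] - y[k]  # feedback signal
                for j in range(n_features):
                    weights[k][j] -= lr * error * x[j]
        if abs(prev_loss - loss) < tol:  # hypothetical convergence criterion
            break
        prev_loss = loss
    return weights


# Hypothetical features [bias, f1, f2] with labels [animal, vehicle].
samples = [
    ([1, 1, 0], [1, 0]),
    ([1, 0, 1], [0, 1]),
    ([1, 1, 1], [1, 1]),
    ([1, 0, 0], [0, 0]),
]
weights = train(samples, n_labels=2)
```

Each label has its own output, so one image can receive several classification labels at once, which is the multilabel formulation described in Chen ¶0091.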
 	Regarding claim 31, Chen in view of Feris discloses the system of claim 29, wherein: the hierarchical classifier comprises specific classifiers (Examples classifier, Table 1, e.g. sub-categories Dog, cat, bird, horse and so forth are classified under Taxonomical category of animal. Abstract. Visual intent classification 112 receives a source image (e.g., from the application 106) and produces one or more taxonomy categories/sub-categories (hereinafter classification labels) that describe the subject(s) of an image – ¶0035); and
the trainer is configured to train the specific classifiers using a reinforcement learning model, the reinforcement learning model trained to generate classifier hyperparameters using a reward function based on classifier accuracy (Chen: ¶0128; Huang (US 2019/0236487): ¶0069, ¶0077. The tuning job and/or learning curve measurement is understood as a reward function. Huang is incorporated by reference in Chen at ¶0128 as application no. 15/883,686).
 	Regarding claim 32, Chen in view of Feris discloses the system of claim 26, wherein the training data is generated using the pose estimation model, at least in part, by segmenting the object in a frame of the image data (Feris: The object tracker 110 tracks an object on a subject in a plurality of video frames. In particular, the object tracker 110 may receive a video frame (e.g., a single video frame) in which the object has been detected (e.g., a video frame including a "bounding box" formed around the object (e.g., clothing region) being tracked), and output a sequence of corresponding image regions in subsequent frames of a plurality of video frames – ¶0037).
 	Regarding claim 33, Chen in view of Feris discloses the system of claim 26, wherein: the system further comprises a pre-processor configured to automatically determine whether to select and apply a preprocessing technique to the image data before providing the image data to the classifier (feature extractor 1204 and multilayer predictor 1206 are understood as pre-processing techniques applied to the image 1202 before providing the image data to the hierarchical classifier 1208, fig. 12, ¶0120. Feature extractor 1204 and predictor 1206 can also be implemented in training 1300 of fig. 13. E.g. All the sets of layers act as feature extractors and the outputs are tapped off where shown 1332, 1334, 1336, 1338, 1340, 1342 after the last layer in each set and fed into a plurality of detector layers 1328 – ¶0126. Training proceeds by presenting training data to the visual intent detector 1306, evaluating the output against what should have been produced 1308 and then adjusting the weights 1310 according to one or more error functions until the model converges 1312 – ¶0127. Since the loop continues until the convergence criterion is met, each convergence step is understood as functioning as a pre-processing step for the succeeding one. Chen uses the "Machine Learning Hyperparameter Tuning Tool" (incorporated by reference therein) as an efficient way to tune various parameters for training the visual intent detector.
Huang: …a hyperparameter set can be selected, such as a hyperparameter set in a job that exhibits the lowest loss and/or the greatest precision gain during the tuning. This selection may be received as user input after the results comparisons are presented, or it may be provided as an automated identification and selection – ¶0076. Thus, the limitation of “pre-processor configured to automatically determine whether to select and apply a preprocessing technique” is understood to be met, since the optimization/learning model is fine-tuned automatically until the desired loss and precision gain are attained.).
 	Regarding claim 34, Chen in view of Feris discloses the system of claim 33, wherein: the pre-processor comprises a reinforcement learning model (Huang: title, … such as a hyperparameter set in a job that exhibits the lowest loss and/or the greatest precision gain during the tuning…. – ¶0076. Here, attaining the lowest loss and greatest precision gain is understood as reinforcement); and
Huang:
exhibits the lowest loss and/or the greatest precision gain during the tuning - ¶0076. Here attaining lowest loss and greatest precision is understood as reward function.
The tuning tool may handle training job failures (i.e., failures of the tuning jobs for the different hyperparameter sets) by monitoring job status and re-trying failed jobs automatically – ¶0018. For example, a status indicator may indicate that the job is not yet started, that it is running (and if so, how much progress has been made), that it has failed (and if so, how much progress was made before the failure), or that it has successfully completed – ¶0051. Also see, Chen:  Abstract, ¶0029).  
 	Regarding claim 36, Chen in view of Feris discloses the system of claim 26, wherein the system further comprises a selection engine configured to select, based on the match and in response to the determination that the image data includes the match, a processing engine to generate an output from the image data (Chen: ¶0056-¶0060.
The visual intent detection process 1012 can then operate in one of the two modes previously described to produce a resultant image 1014 with the relevant subject(s) identified by bounding boxes and classification labels. The user can then make a selection 1016 which can be passed to the search engine 1018 to execute the associated query across the appropriate data store(s) 1020. Results can then be returned and displayed to the user 1022 – ¶0107).
 	Regarding claim 37, Chen in view of Feris discloses the system of claim 36, wherein the trainer is further configured to train the selection engine using a reward function based on a Chen: As another example, the scenario selection process 410 can utilize one or more trained machine learning models to invoke scenarios based on the input classification labels 408 and/or other information. As a particular implementation example, personal digital assistants are designed to ascertain what a user desires and then invoke processing to accomplish the task(s) the user is likely to want accomplished. Thus, the classification labels and/or other user information can be presented to either a digital assistant or to a similar entity to identify what the likely use scenario is given the classification labels 408 and/or other information – ¶0056).
 	Regarding claim 38, Chen in view of Feris discloses the system of claim 26, wherein the system is configured to store the image data when the classifier determines that the image data includes the match (see ¶0029, 0039).
Regarding claim 39, Chen discloses a non-transitory, computer-readable medium containing instructions that, when executed by at least one processor of a system, cause the system to perform operations (¶0134-139, fig. 15) comprising:
generating a segment of image data;
associating the segment with one of a plurality of triggers;
determining that the image data includes a match to the one of the plurality of triggers, the match comprising an object, when the association between the segment and the one of the plurality of triggers satisfies a match condition; and

the training data is generated, using a pose estimation model, by tracking the object in a video stream; and the video stream includes the image data (see substantively equivalent claim 26 rejection above).  
 	Regarding claim 40, Chen in view of Feris discloses the non-transitory, computer-readable medium of claim 39, wherein the operations further comprise: determining, using keypoint matching, whether the image data includes the match (The visual intent detector 1220 comprises a plurality of feature extractors 1204 that extract features from the source image 1202. The extracted features are presented to a plurality of multi-layer predictors 1206 which make predictions about the various features and which are important to recognition of the subjects in the source image 1202. The resultant predictions are used to both classify the subjects using a multi-way classifier 1208 and identify a bounding box for the subject using a bounding box regression analysis 1210 – ¶0120); and storing the image data in response to the determination that the image data includes the match (In either case, the output of the visual intent detection 116 can be passed back to the user device (e.g., the application 106) and/or passed to the search engine 114 for visual and/or non-visual search of the data store 118 – ¶0039.
selecting a subset of images from the data store, each image in the subset having at least one associated classification label that matches the at least one classification label associated with the image – ¶0293).  

 	Regarding claim 45, Chen in view of Feris discloses the non-transitory, computer-readable medium of claim 39, wherein the operations further comprise:
            generating, based on the match and in response to the determination that the image data includes the match, an output using the image data (Chen: ¶0056-¶0060.
The visual intent detection process 1012 can then operate in one of the two modes previously described to produce a resultant image 1014 with the relevant subject(s) identified by bounding boxes and classification labels. The user can then make a selection 1016 which can be passed to the search engine 1018 to execute the associated query across the appropriate data store(s) 1020. Results can then be returned and displayed to the user 1022 – ¶0107);  
            providing the output to a user device (¶0056-¶0060, ¶0107); and
            additionally training the system using a reward function based on a degree of engagement of with the provided output (Chen: As another example, the scenario selection process 410 can utilize one or more trained machine learning models to invoke scenarios based on the input classification labels 408 and/or other information. As a particular implementation example, personal digital assistants are designed to ascertain what a user desires and then invoke processing to accomplish the task(s) the user is likely to want accomplished. Thus, the classification labels and/or other user information can be presented to either a digital assistant or to a similar entity to identify what the likely use scenario is given the classification labels 408 and/or other information – ¶0056).
Claims 35 and 44 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Feris and further in view of Milton (US 2020/0017117).
 	Regarding claim 35, Chen in view of Feris discloses the system of claim 34, but does not explicitly disclose that the reward function is further based on a time required to identify the match.
 	However, Milton discloses a neural network and machine learning based classification system (abstract, ¶0038, ¶0057), in which reinforcement/reward-based functions are used during learning and in which the reward function can be based on an accuracy parameter and/or a response time parameter (¶0121).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the invention of Chen in view of Feris to include the teaching of Milton of tuning the learning engine based on a reward function that optimizes parameters based on response time in addition to accuracy, to obtain the claimed feature wherein the reward function is further based on a time required to identify the match, because combining prior art elements according to known methods to yield predictable results is obvious. Furthermore, adding Milton’s approach to the existing accuracy-based solution finding of Chen enhances the robustness of the overall system.
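For illustration only, a reward function of the kind discussed above, based on both a success or failure in identifying the match and the time required to identify it, may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Milton, Chen, or the claims: the function name `reward`, the +1/-1 base values, and the `time_weight` parameter are hypothetical.

```python
def reward(correct: bool, seconds: float, time_weight: float = 0.1) -> float:
    """Hypothetical reward: +1 for a correct match, -1 otherwise, minus a time penalty."""
    base = 1.0 if correct else -1.0
    return base - time_weight * seconds


# Hypothetical outcomes: a fast correct match, a slow correct match, a fast miss.
fast_hit = reward(correct=True, seconds=1.0)
slow_hit = reward(correct=True, seconds=3.0)
fast_miss = reward(correct=False, seconds=1.0)
```

Under these assumed weights a slower correct match still scores above a fast miss, so accuracy dominates and response time acts as a secondary criterion.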
 
Regarding claim 44, Chen in view of Feris discloses the non-transitory, computer-readable medium of claim 39, wherein: the operations further comprise automatically determining whether to select and apply a preprocessing technique to the image data (feature extractor 1204 and multilayer predictions 1206 is understood as pre-processing techniques applied on the image 1202 before providing the image data to the hierarchical classifier 1208, fig. 12, ¶0120. Feature extractor 1204 and predictor 1206 can also be implemented in training 1300 of fig. 13. E.g. All the sets of layers act as feature extractors and the outputs are tapped off where shown 1332, 1334, 1336,1338, 1340, 1342 after the last layer in each set and fed into a plurality of detector layers 1328 – ¶0126. Training proceeds by presenting training data to the visual intent detector 1306, evaluating the output against what should have been produced 1308 and then adjusting the weights 1310 according to one or more error functions until the model converges 1312 – ¶0127. Since the loop continues until convergence criterion is met, each convergence step is understood functioning as a pre-processing step from the succeeding one. Chen uses "Machine Learning Hyperparameter Tuning Tool" (incorporated herein by reference) as an efficient way to tune various parameters for training the visual intent detector.
Huang: …a hyperparameter set can be selected, such as a hyperparameter set in a job that exhibits the lowest loss and/or the greatest precision gain during the tuning. This selection may be received as user input after the results comparisons are presented, or it may be provided as an automated identification and selection – ¶0076. Thus, the limitation of “pre-processor configured to automatically determine whether to select and apply a preprocessing technique” is understood to be met, since the optimization/learning model is fine-tuned automatically until the desired loss and precision gain are attained.); and the system is trained using a reward function, the reward function based on: a success or failure in identifying the match (Huang: exhibits the lowest loss and/or the greatest precision gain during the tuning – ¶0076. Here, attaining the lowest loss and the greatest precision is understood as the reward function.
The tuning tool may handle training job failures (i.e., failures of the tuning jobs for the different hyperparameter sets) by monitoring job status and re-trying failed jobs automatically – ¶0018. For example, a status indicator may indicate that the job is not yet started, that it is running (and if so, how much progress has been made), that it has failed (and if so, how much progress was made before the failure), or that it has successfully completed – ¶0051. Also see Chen: Abstract, ¶0029), and a time required to identify the match (see the substantively similar claim 35 rejection above).
Allowable Subject Matter
Claims 30 and 42 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. They must also overcome the double patenting rejection above.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 30, the prior art of record, taken alone or in combination, fails to reasonably disclose or suggest,
wherein the trainer is further configured to train the hierarchical classifier using the training data, in response to: a determination by the secondary classifier that the image data includes the match, and a determination by the hierarchical classifier that the image data does not include the match.  
Regarding claim 42, the prior art of record, taken alone or in combination, fails to reasonably disclose or suggest,
. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NURUN N FLORA whose telephone number is (571)272-5742. The examiner can normally be reached M-F 9:30 am -5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman, can be reached at (571)272-7653. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) 





/NURUN N FLORA/Primary Examiner, Art Unit 2619