DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to applicant’s amendment/arguments filed on 4/26/2021. This action is made FINAL.

Response to Arguments
Applicant’s arguments with respect to claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aguilar et al. (USPN 2011/0158510) in view of Wnuk et al. (USPN 2015/0023602).
Consider claim 1, Aguilar discloses a device for recognizing an object included in an input image, the device comprising: a memory in which at least one program is stored; a camera configured to capture an environment around the device (read as “natural imagery that contains complex evolving visual elements”); and at least one processor configured to execute the at least one program to recognize the object included in the input image, wherein the at least one program comprises instructions to: obtain the input image by controlling the camera; obtain information about the environment around the device that obtains the input image; determine, based on the information about the environment, a standard for using a plurality of feature value sets in a combined way, the plurality of feature value sets being used to recognize the object in the input image (read as “L1 to L5 features represented in the universal feature vector to recognize a plurality of object types”); and recognize the object included in the input image, by using the plurality of feature value sets based on the determined standard for using the plurality of feature value sets in the combined way (see figs. 2 and 3; [0008; 0050]; 
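[0008] The extraction of metadata that provides image understanding from complex visual environments e.g. natural imagery that contains complex evolving visual elements is difficult. Most vision systems are modeled on how a computer sees the world, rather than the human visual system, and are subject to one or more constraints or limitations in order to provide useful metadata. The image data is typically segmented in a supervised procedure to identify certain segments of the image for consideration. Supervised segmentation is not a practical constraint in many applications. Systems are typically not robust to changes in viewing conditions. The metadata may be limited to provide semantic information only at one level. The system may not be scalable to complex scenes or broad classes of scenes. In many cases, the extraction of scene descriptors is application specific, not universal.
[0050] Referring now to FIGS. 12a and 12b, metadata is constructed at multiple semantic levels (step 52) by iterating through each ROI (step 190) to present each universal feature vector 192 to multiple classifiers 194 trained to extract semantic information at different levels of a scene understanding hierarchy in the form of scene semantic descriptors 195 (step 196), assembling the scene semantic descriptors into a metadata vector 198 (step 200), converting the metadata vector into a structured description (step 202) and outputting the structure description as metadata for the ROI (step 204). Each classifier is suitably an ARTMAP classifier which was designed to effectively classify high-dimensional feature vectors. The ARTMAP classifier is described by Carpenter, G. A., Grossberg, S., & Reynolds, J. H. "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network", Neural Networks (Publication), 4, 565-588, 1991 and Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B., "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-713, 1992, which are hereby incorporated by reference. The classifiers are suitably configured and trained to detect different patterns of the L1 to L5 features represented in the universal feature vector to recognize a plurality of object types, object-to-object relationships, activities and scene situations at different levels of the scene understanding hierarchy.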

).
However, Aguilar does not explicitly disclose a camera to capture.
Nevertheless, Aguilar discloses image of environment.
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to implement a camera in order to capture images and yield predictable results of image capture and object detection.
However, Aguilar does not explicitly disclose at least one sensor.
In the related field of endeavor, Wnuk discloses at least one sensor (see figs. 1 and 2; [0047]; [0047] From the perspective of a device or apparatus 120 (e.g., a cell phone, a tablet, a kiosk, an appliance, a vehicle, a game console, etc.) operating as recognition engine 130 in the field, apparatus 120 can, optionally, include at least one sensor 122 configured to obtain digital representation 140 of a plurality of objects in a scene 110. Example sensors 122 can include GPS, hall probes, cameras, RFID reader, near field radios, microphones, biometric sensors, touch screens, accelerometers, magnetometers, gyroscopes, spectrometers, strain or stress gauges, pulse oximeters, seisometer, galvanometers, Radar, LIDAR, infra red sensor, flow sensor, anemometer, Geiger counter, scintillator, barometer, piezoelectric sensor, or other types of sensors. In view that the sensors 122 can cover a broad spectrum of data acquisition devices one should appreciate digital representation 140 can comprise a broad spectrum of data modalities and could include one or more of the following types of data: image data, text data, audio data, video data, biometric data, game data, shopping or product data, weather data, or other types of data. The discussion herein presents the inventive subject matter from the perspective of image or video data for clarity purposes only without limiting the scope of the inventive subject matter. One should appreciate that the inventive subject matter is considered to include leveraging the disclosed techniques to quickly recognize objects across many different data modalities.)
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to incorporate the sensors of Wnuk with the camera imaging of Aguilar in order to more accurately recognize objects.

Consider claim 2 as applied to respective claim, Aguilar discloses the plurality of feature value sets comprise a first feature value set and a second feature value set, and the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, the standard for using the first feature value set and the second feature value set (see fig. 2; L1 thru L5 feature extraction and universal feature vector).
Consider claim 3 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, use frequencies of the first feature value set and the second feature value set to be different from each other (see [0011]; hierarchy of features).
Consider claim 4 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, a weight of each of the first feature value set and the second feature value set, and the recognizing of the object included in the input image comprises: respectively applying the weight of the first feature value set and the weight of the second feature value set to a first object recognition result obtained by using the first feature value set and a second object recognition result obtained by using the second feature value set; and recognizing the object included in the input image, based on an object recognition result 
Consider claim 5 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises: recognizing, by using the first feature value set and the second feature value set, an object included in at least one previous input image obtained during a certain time before the input image is obtained; comparing an object recognition rate based on the first feature value set with an object recognition rate based on the second feature value set; and determining, based on a result of the comparison, the standard for using each of the first feature value set and the second feature value set (see fig. 2; L1 thru L5 feature extraction and universal feature vector).
Consider claim 6 as applied to respective claim, Aguilar discloses the information about the environment comprises at least one of information about a time when the input image is captured, information about the weather when the input image is captured, and information about a place where the input image is captured (see [0011]; hierarchy of features).
Consider claim 7 as applied to respective claim, Aguilar discloses the recognizing of the object included in the input image comprises: comparing the plurality of feature value sets with a feature value extracted from the input image, based on the determined standard for using the plurality of feature value sets in the combined way; and recognizing an object having a highest degree of similarity as the object, based on a result of the comparison.
Consider claim 8 as applied to respective claim, Aguilar discloses the feature value sets comprise at least one of information about an outline of the object, information about a brightness of the object, and information about a color of the object (see [0025]; hierarchy of multiple salient regions).
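For illustration only, and not as a characterization of Aguilar, Wnuk, or applicant's disclosure, the weighting limitation recited in claim 4 together with the highest-similarity selection of claim 7 can be sketched as follows. The environment labels, the feature value set names ("color", "outline"), the weight values, and the function names are all hypothetical assumptions chosen for the example.

```python
from typing import Dict

# Hypothetical table mapping environment information to a weight per
# feature value set. The values are assumptions for illustration only.
WEIGHTS_BY_ENVIRONMENT: Dict[str, Dict[str, float]] = {
    "day": {"color": 0.7, "outline": 0.3},
    "night": {"color": 0.2, "outline": 0.8},
}


def determine_standard(environment: str) -> Dict[str, float]:
    """Return the weight of each feature value set for the given environment."""
    return WEIGHTS_BY_ENVIRONMENT[environment]


def recognize(results_by_set: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> str:
    """Apply each set's weight to its recognition result and pick the best label."""
    combined: Dict[str, float] = {}
    for set_name, scores in results_by_set.items():
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + weights[set_name] * score
    # The label with the highest combined similarity is taken as the object.
    return max(combined, key=combined.get)


# Hypothetical per-set recognition results (similarity per candidate object).
results = {
    "color": {"car": 0.9, "truck": 0.1},
    "outline": {"car": 0.3, "truck": 0.7},
}
```

With these assumed values, the same two recognition results yield a different recognized object depending on which weights the environment information selects.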

Examiner Note: See detailed rejection of claim 1. Similar reasoning applies.

Consider claim 9, Aguilar discloses a method, performed by a device, of recognizing an object included in an input image, the method comprising: obtaining an input image by capturing an environment around the device; obtaining information about the environment around the device that obtains the input image; determining, based on the information about the environment, a standard for using a plurality of feature value sets in a combined way, the plurality of feature value sets being used to recognize the object in the input image; and recognizing at least one object from the input image, by using the plurality of feature value sets based on the determined standard for using the plurality of feature value sets in the combined way (see figs. 2 and 3; [0008; 0050]; 
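[0008] The extraction of metadata that provides image understanding from complex visual environments e.g. natural imagery that contains complex evolving visual elements is difficult. Most vision systems are modeled on how a computer sees the world, rather than the human visual system, and are subject to one or more constraints or limitations in order to provide useful metadata. The image data is typically segmented in a supervised procedure to identify certain segments of the image for consideration. Supervised segmentation is not a practical constraint in many applications. Systems are typically not robust to changes in viewing conditions. The metadata may be limited to provide semantic information only at one level. The system may not be scalable to complex scenes or broad classes of scenes. In many cases, the extraction of scene descriptors is application specific, not universal.
[0050] Referring now to FIGS. 12a and 12b, metadata is constructed at multiple semantic levels (step 52) by iterating through each ROI (step 190) to present each universal feature vector 192 to multiple classifiers 194 trained to extract semantic information at different levels of a scene understanding hierarchy in the form of scene semantic descriptors 195 (step 196), assembling the scene semantic descriptors into a metadata vector 198 (step 200), converting the metadata vector into a structured description (step 202) and outputting the structure description as metadata for the ROI (step 204). Each classifier is suitably an ARTMAP classifier which was designed to effectively classify high-dimensional feature vectors. The ARTMAP classifier is described by Carpenter, G. A., Grossberg, S., & Reynolds, J. H. "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network", Neural Networks (Publication), 4, 565-588, 1991 and Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B., "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-713, 1992, which are hereby incorporated by reference. The classifiers are suitably configured and trained to detect different patterns of the L1 to L5 features represented in the universal feature vector to recognize a plurality of object types, object-to-object relationships, activities and scene situations at different levels of the scene understanding hierarchy.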

).
However, Aguilar does not explicitly disclose a camera to capture.
Nevertheless, Aguilar discloses image of environment.
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to implement a camera in order to capture images and yield predictable results of image capture and object detection.
However, Aguilar does not explicitly disclose at least one sensor.
In the related field of endeavor, Wnuk discloses at least one sensor (see figs. 1 and 2; [0047]; [0047] From the perspective of a device or apparatus 120 (e.g., a cell phone, a tablet, a kiosk, an appliance, a vehicle, a game console, etc.) operating as recognition engine 130 in the field, apparatus 120 can, optionally, include at least one sensor 122 configured to obtain digital representation 140 of a plurality of objects in a scene 110. Example sensors 122 can include GPS, hall probes, cameras, RFID reader, near field radios, microphones, biometric sensors, touch screens, accelerometers, magnetometers, gyroscopes, spectrometers, strain or stress gauges, pulse oximeters, seisometer, galvanometers, Radar, LIDAR, infra red sensor, flow sensor, anemometer, Geiger counter, scintillator, barometer, piezoelectric sensor, or other types of sensors. In view that the sensors 122 can cover a broad spectrum of data acquisition devices one should appreciate digital representation 140 can comprise a broad spectrum of data modalities and could include one or more of the following types of data: image data, text data, audio data, video data, biometric data, game data, shopping or product data, weather data, or other types of data. The discussion herein presents the inventive subject matter from the perspective of image or video data for clarity purposes only without limiting the scope of the inventive subject matter. One should appreciate that the inventive subject matter is considered to include leveraging the disclosed techniques to quickly recognize objects across many different data modalities.)
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to incorporate the sensors of Wnuk with the camera imaging of Aguilar in order to more accurately recognize objects. 
Consider claim 10 as applied to respective claim, Aguilar discloses the plurality of feature value sets comprise a first feature value set and a second feature value set, and the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, the standard for using the first feature value set and the second feature value set (see fig. 2; L1 thru L5 feature extraction and universal feature vector).
Consider claim 11 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, use frequencies of the first feature value set and the second feature value set to be different from each other (see [0011]; hierarchy of features).
Consider claim 12 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises determining, based on the information about the environment, a weight of each of the first feature value set and the second feature value set, and the recognizing of the object included in the input image comprises: respectively applying the weight of the first feature value set and the weight of the second feature value set to a first object recognition result obtained by using the first feature value set and a second object recognition result obtained by using the second feature value set; and recognizing the object included in the input image, based on an object recognition result determined based on the first object recognition result to which the weight of the first 
Consider claim 13 as applied to respective claim, Aguilar discloses the determining of the standard for using the plurality of feature value sets in the combined way comprises: recognizing, by using the first feature value set and the second feature value set, an object included in at least one previous input image obtained during a certain time before the input image is obtained; comparing an object recognition rate based on the first feature value set with an object recognition rate based on the second feature value set; and determining, based on a result of the comparison, the standard for using each of the first feature value set and the second feature value set (see fig. 2; L1 thru L5 feature extraction and universal feature vector).
Consider claim 14 as applied to respective claim, Aguilar discloses the information about the environment comprises at least one of information about a time when the input image is captured, information about the weather when the input image is captured, and information about a place where the input image is captured (see [0011]; hierarchy of features).
Consider claim 15 as applied to respective claim, Aguilar discloses the recognizing of the object included in the input image comprises: comparing the plurality of feature value sets with a feature value extracted from the input image, based on the determined standard for using the plurality of feature value sets in the combined way; and recognizing an object having a highest degree of similarity as the object, based on a result of the comparison (see [0025]; hierarchy of multiple salient regions).
Consider claim 16 as applied to respective claim, Aguilar discloses the feature value sets comprise at least one of information about an outline of the object, information about a brightness of the object, and information about a color of the object (see fig. 2; L1 thru L5 feature extraction and universal feature vector).
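For illustration only, and not as a characterization of either reference or of applicant's disclosure, the limitation of claim 13 (comparing object recognition rates obtained on previous input images and determining the standard from the comparison) can be sketched as follows. The function names and the choice to make each weight proportional to its recognition rate are assumptions; the claim language does not specify how the comparison result is converted into a standard.

```python
from typing import Sequence, Tuple


def recognition_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of previous input images in which the object was recognized."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def standard_from_history(first_outcomes: Sequence[bool],
                          second_outcomes: Sequence[bool]) -> Tuple[float, float]:
    """Compare the two recognition rates and return a weight for each set.

    Assumption: weights are proportional to the recognition rates, with an
    even split when neither set recognized anything.
    """
    first_rate = recognition_rate(first_outcomes)
    second_rate = recognition_rate(second_outcomes)
    total = first_rate + second_rate
    if total == 0:
        return (0.5, 0.5)
    return (first_rate / total, second_rate / total)


# Hypothetical outcomes over four previous input images for each set.
weights = standard_from_history([True, True, True, False],
                                [True, False, False, False])
```

Under these assumed outcomes the first feature value set, which recognized the object more often in the previous images, receives the larger weight.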
Consider claim 17, Aguilar discloses a non-transitory computer readable recording medium having recorded thereon at least one instruction which, when executed by a processor, causes the processor to: obtain an input image by capturing an environment around the device; obtain information about the environment around the device that obtains the input image; determining, based on the information about the environment, a standard for using a plurality of feature value sets in a combined way, the plurality of feature value sets being used to recognize the object in the input image; and recognizing at least one object from the input image, by using the plurality of feature value sets based on the determined standard for using the plurality of feature value sets in the combined way (see figs. 2 and 3; [0008; 0050]; 
[0008] The extraction of metadata that provides image understanding from complex visual environments e.g. natural imagery that contains complex evolving visual elements is difficult. Most vision systems are modeled on how a computer sees the world, rather than the human visual system, and are subject to one or more constraints or limitations in order to provide useful metadata. The image data is typically segmented in a supervised procedure to identify certain segments of the image for consideration. Supervised segmentation is not a practical constraint in many applications. Systems are typically not robust to changes in viewing conditions. The metadata may be limited to provide semantic information only at one level. The system may not be scalable to complex scenes or broad classes of scenes. In many cases, the extraction of scene descriptors is application specific, not universal.
[0050] Referring now to FIGS. 12a and 12b, metadata is constructed at multiple semantic levels (step 52) by iterating through each ROI (step 190) to present each universal feature vector 192 to multiple classifiers 194 trained to extract semantic information at different levels of a scene understanding hierarchy in the form of scene semantic descriptors 195 (step 196), assembling the scene semantic descriptors into a metadata vector 198 (step 200), converting the metadata vector into a structured description (step 202) and outputting the structure description as metadata for the ROI (step 204). Each classifier is suitably an ARTMAP classifier which was designed to effectively classify high-dimensional feature vectors. The ARTMAP classifier is described by Carpenter, G. A., Grossberg, S., & Reynolds, J. H. "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network", Neural Networks (Publication), 4, 565-588, 1991 and Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B., "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-713, 1992, which are hereby incorporated by reference. The classifiers are suitably configured and trained to detect different patterns of the L1 to L5 features represented in the universal feature vector to recognize a plurality of object types, object-to-object relationships, activities and scene situations at different levels of the scene understanding hierarchy.

).
However, Aguilar does not explicitly disclose a camera to capture.
Nevertheless, Aguilar discloses image of environment.
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to implement a camera in order to capture images and yield predictable results of image capture and object detection.
However, Aguilar does not explicitly disclose at least one sensor.
In the related field of endeavor, Wnuk discloses at least one sensor (see figs. 1 and 2; [0047]; [0047] From the perspective of a device or apparatus 120 (e.g., a cell phone, a tablet, a kiosk, an appliance, a vehicle, a game console, etc.) operating as recognition engine 130 in the field, apparatus 120 can, optionally, include at least one sensor 122 configured to obtain digital representation 140 of a plurality of objects in a scene 110. Example sensors 122 can include GPS, hall probes, cameras, RFID reader, near field radios, microphones, biometric sensors, touch screens, accelerometers, magnetometers, gyroscopes, spectrometers, strain or stress gauges, pulse oximeters, seisometer, galvanometers, Radar, LIDAR, infra red sensor, flow sensor, anemometer, Geiger counter, scintillator, barometer, piezoelectric sensor, or other types of sensors. In view that the sensors 122 can cover a broad spectrum of data acquisition devices one should appreciate digital representation 140 can comprise a broad spectrum of data modalities and could include one or more of the following types of data: image data, text data, audio data, video data, biometric data, game data, shopping or product data, weather data, or other types of data. The discussion herein presents the inventive subject matter from the perspective of image or video data for clarity purposes only without limiting the scope of the inventive subject matter. One should appreciate that the inventive subject matter is considered to include leveraging the disclosed techniques to quickly recognize objects across many different data modalities.)
Therefore, it would have been obvious to one of ordinary skill in the art at a time before the effective filing date of the claimed subject matter to incorporate the sensors of Wnuk with the camera imaging of Aguilar in order to more accurately recognize objects. 
Consider claim 18 as applied to respective claim, Aguilar as modified by Wnuk discloses the information about the environment is obtained independently of information that is included in the obtained image (see [0047]; “sensors 122” “weather data”).
Consider claim 19 as applied to respective claim, Aguilar as modified by Wnuk discloses the information about the environment is obtained independently of information that is included in the obtained image (see [0047]; “sensors 122” “weather data”).
Consider claim 20 as applied to respective claim, Aguilar as modified by Wnuk discloses the information about the environment is obtained independently of information that is included in the obtained image (see [0047]; “sensors 122” “weather data”).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
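The two fixed dates in the preceding paragraph can be approximated with the short sketch below. It assumes the mailing date is the May 6, 2021 date appearing on this action, which the text does not confirm, and it does not account for weekends, federal holidays, or the advisory-action rule described above. The helper name add_months is hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day of the month `months` later, clamped to month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Assumption: the mailing date equals the date shown on this action.
mailing_date = date(2021, 5, 6)

shortened_period_end = add_months(mailing_date, 3)  # THREE MONTHS
statutory_limit = add_months(mailing_date, 6)       # SIX MONTHS maximum

print("Shortened statutory period ends:", shortened_period_end.isoformat())
print("Latest possible reply date:", statutory_limit.isoformat())
```

Any reply filed after the first date and on or before the second would require an extension of time under 37 CFR 1.136(a).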
Any response to this Office Action should be faxed to (571) 273-8300 or mailed to:
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

Hand-delivered responses should be brought to 
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314                                                                                                                                                                           

Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Fayyaz Alam whose telephone number is (571) 270-1102. The Examiner can normally be reached on Monday-Friday from 9:30am to 7:00pm.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free) or 703-305-3028.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist/customer service whose telephone number is (571) 272-2600.

Fayyaz Alam


May 6, 2021

/FAYYAZ ALAM/
Primary Examiner, Art Unit 2662