DETAILED ACTION


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


1.	Claims 1-3, 8-10 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Ferstl (US 10691943 B1, DATE FILED: 1-31-2018) in view of OGALE (US 20190279005 A1).
		Re Claim 1, Ferstl discloses an object detection model visual analysis system (see Ferstl: e.g., --to train the object detection models, however, the imaging data must be annotated, or labeled, to identify the portions of the imaging data depicting one or more objects of interest. …such systems, machine learning algorithms such as deep neural networks (e.g., artificial neural networks having multiple hidden layers provided between an input layer and an output layer) process massive amounts of sensor data and associated ground truth that depict one or more objects of interest…. train an object detection model to recognize an object of interest to a sufficiently high degree of confidence, and at high rates of speed, necessarily requires large volumes of annotated images or video files, as well as sufficient processing power for training a model based on such files…. images and video files are annotated based on the contents of the images or video files alone. For example, images are currently labeled based on the identification of objects therein, e.g., by humans, the automatic detection of objects therein, e.g., by machine learning tools trained to recognize such objects therein, or based on variations in temporal context of the images.--, in line 51, col. 5 through line 57, col. 6) comprising:
		an interface device including a display device (see Ferstl: e.g.,  --The computers, servers, devices and the like described herein have the necessary electronics, software, memory, storage, databases, firmware, logic/state machines, microprocessors, communication links, displays or other visual or audio user interfaces, --, in lines 45-58, col. 20);
		and a computer including a communication interface configured to communicate with the interface device (see Ferstl: e.g.,  --The computers, servers, devices and the like described herein have the necessary electronics, software, memory, storage, databases, firmware, logic/state machines, microprocessors, communication links, displays or other visual or audio user interfaces, --, in lines 45-58, col. 20);
		a memory including an object detection model visual analysis platform and a first object detection model for autonomous driving (see Ferstl: e.g.,  --imaging devices for which the trained object detection model are to be utilized are in motion, such as when the imaging devices are operated aboard an unmanned aerial vehicle or another autonomous mobile system (e.g., a robot). In such embodiments, the object detection model is preferably trained to recognize objects to sufficiently high degrees of confidence, at high rates of speed. The need to properly train an object detection model to recognize an object of interest to a sufficiently high degree of confidence, and at high rates of speed, necessarily requires large volumes of annotated images or video files, as well as sufficient processing power for training a model based on such files.--, in lines 37-58, col. 6, and in lines 16-34, col. 19; also see: -- the aerial vehicle 210 may further include one or more control systems having one or more electronic speed controls, power supplies, navigation systems and/or payload engagement controllers for controlling the operation of the aerial vehicle 210… the control system 220 may be integrated with one or more of the processors 212, the memory components 214 and/or the transceivers 216--, in line 59, col. 13 through line 18, col. 14);
		and an electronic processor communicatively connected to the memory (see Ferstl: e.g.,  --imaging devices for which the trained object detection model are to be utilized are in motion, such as when the imaging devices are operated aboard an unmanned aerial vehicle or another autonomous mobile system (e.g., a robot). In such embodiments, the object detection model is preferably trained to recognize objects to sufficiently high degrees of confidence, at high rates of speed. The need to properly train an object detection model to recognize an object of interest to a sufficiently high degree of confidence, and at high rates of speed, necessarily requires large volumes of annotated images or video files, as well as sufficient processing power for training a model based on such files.--, in lines 37-58, col. 6, and in lines 16-34, col. 19; also see: -- the aerial vehicle 210 may further include one or more control systems having one or more electronic speed controls, power supplies, navigation systems and/or payload engagement controllers for controlling the operation of the aerial vehicle 210… the control system 220 may be integrated with one or more of the processors 212, the memory components 214 and/or the transceivers 216--, in line 59, col. 13 through line 18, col. 14),
		the electronic processor is configured to extract object information from image data with the first object detection model (see Ferstl: e.g., --Information and/or data regarding features or objects expressed in imaging data, including colors, textures or outlines of the features or objects, may be extracted from the data in any number of ways. For example, colors of image pixels, or of groups of image pixels, in a digital image may be determined and quantified according to one or more standards, e.g., the RGB color model,… textures or features of objects expressed in a digital image may be identified using one or more computer-based methods, such as by identifying changes in intensities within regions or sectors of the image, or by defining areas of an image corresponding to specific surfaces.
		Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects of any types, or portions of objects of such types, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools. The objects or portions of objects may be stationary or in motion, and may be identified at single, finite periods of time, or over one or more periods or durations. Such algorithms or tools may be directed to recognizing and marking transitions (e.g., the edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects or portions thereof) within the digital images as closely as possible, in a manner that minimizes noise and disruptions, and does not create false transitions. Some detection algorithms or techniques that may be utilized in order to recognize characteristics of objects or portions thereof in digital images--, in lines 3-64, col. 11);
		extract characteristics of objects from metadata associated with the image data, generate a summary of the object information and the characteristics that are extracted (see Ferstl: e.g., -- a plurality of visual images 150-1 through 150-n are provided to an object detection algorithm operating on the server 180, the cloud-based environment 190 and/or the processors 112 as training inputs, and a plurality of annotations 165-1 through 165-n of such visual images 150-1 through 150-n are provided to the object detection algorithm as training outputs. In some embodiments, the training inputs may further include any other metadata associated with the visual images 150-1 through 150-n, including but not limited to identifiers of prevailing environmental conditions at times at which the visual images 150-1 through 150-n were captured--, in lines 24-60, col. 5 {herein, the training input is a generated summary of the object information and the characteristics that are extracted}, also see in lines 3-64, col. 11),
		generate coordinated visualizations based on the summary and the characteristics that are extracted (see Ferstl: e.g., -- a plurality of visual images 150-1 through 150-n are provided to an object detection algorithm operating on the server 180, the cloud-based environment 190 and/or the processors 112 as training inputs, and a plurality of annotations 165-1 through 165-n of such visual images 150-1 through 150-n are provided to the object detection algorithm as training outputs. In some embodiments, the training inputs may further include any other metadata associated with the visual images 150-1 through 150-n, including but not limited to identifiers of prevailing environmental conditions at times at which the visual images 150-1 through 150-n were captured--, in lines 24-60, col. 5, also see in lines 3-64, col. 11 {herein, as discussed above, the training input is a generated summary of the object information and the characteristics that are extracted, and, correspondingly, the annotations of the visual images are the generated coordinated visualizations}),
		although Ferstl discloses that a user can manually annotate the visual images (see Ferstl: e.g.,  --…. train an object detection model to recognize an object of interest to a sufficiently high degree of confidence, and at high rates of speed, necessarily requires large volumes of annotated images or video files, as well as sufficient processing power for training a model based on such files…. images and video files are annotated based on the contents of the images or video files alone. For example, images are currently labeled based on the identification of objects therein, e.g., by humans, the automatic detection of objects therein, e.g., by machine learning tools trained to recognize such objects therein, or based on variations in temporal context of the images.--, in line 51, col. 5 through line 57, col. 6), Ferstl, however, does not explicitly disclose outputting the coordinated visualizations for display on the display device;
		Ogale teaches output the coordinated visualizations for display on the display device (see Ogale: e.g., -- the user interface subsystem 138 can generate a user interface presentation having image or video data containing a representation of the regions of space that are likely to be occupied by vehicles. An on-board display device can then display the user interface presentation for passengers of the vehicle 122.--, in [0033]-[0034], and, -- the object property neural network classifies the object that is likely centered at each selected location. For example, possible classifications include “car,” “pedestrian,” “bicycle,” “road marking,” and “road sign.” Based on its training and the three inputs, the object property neural network can select one of those classifications. The object property neural network can also define a bounding box for each predicted object. A bounding box is a box that identifies the boundaries or edges of an object. The bounding box can be two-dimensional or three-dimensional. A display interface of a vehicle can display such a bounding box to the driver of a semi-autonomous vehicle. The neural network subsystem can also provide the bounding box to the planning system of the vehicle for use in navigation of the vehicle. In some implementations, the object property neural network can predict a “mask” for each object. A mask differs from a bounding box in that it is form-fitted to a respective object. In other words, it more closely identifies the edges of the object. The mask can mark portions of the input sensor data that define the object.--, in [0059]);
		Ferstl and Ogale are combinable as they are in the same field of endeavor: to detect and classify objects using image processing techniques, particularly both using the convolutional neural network as the detection and classification tool for autonomous driving. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Ferstl’s system using Ogale’s teachings by adding, to Ferstl’s user interface, the output of the coordinated visualizations for display on the display device, in order to generate a user interface presentation having image or video data containing a representation of the regions of space that are likely to be occupied by vehicles (see Ogale: e.g., in [0033]-[0034] and [0059]);
		Ferstl as modified by Ogale further discloses receive a first one or more user inputs selecting a portion of information that is included in the coordinated visualizations (see Ogale: e.g., -- the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form--, in [0111]),
		generate a recommendation graphical user interface element based on the coordinated visualizations and the first one or more user inputs, the recommendation graphical user interface element summarizing one or more potential weaknesses of the first object detection model, output the recommendation graphical user interface element for display on the display device (see Ogale: e.g., -- The training neural network subsystem 114 can generate, for each training example 123, one or more object predictions 135, where each object prediction comprises an object detection and properties for each detected object. A training engine 116 analyzes the object predictions 135 and compares the object predictions to the labels in the training examples 123. If the two differ, an error is indicated. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique. For example, the model parameters might be updated by calculating the gradient of the error with respect to an individual model parameter. To decrease the error contribution, a value derived from the gradient can be subtracted from or added to the current value of the parameter. This is known as stochastic gradient descent with backpropagation.--, in [0040]; also see Ferstl: e.g., -- outputs are received from the trained classifier, and at box 880, locations of the objects of interest within the second imaging data are identified based on the outputs. The outputs may specify not only portions of a given image frame of the second imaging data that depict one or more of the objects of interest but also include a confidence level or interval (e.g., a percentage or number of standard deviations from the mean, or a margin of error above or below the mean) associated with a probability or likelihood that such portions actually depict one or more of the objects of interest. 
At box 885, the second imaging data is annotated with the locations of the objects of interest identified based on the outputs.--, in lines 37-53, col. 30);
		output the recommendation graphical user interface element for display on the display device (see Ogale: e.g., -- The training neural network subsystem 114 can generate, for each training example 123, one or more object predictions 135, where each object prediction comprises an object detection and properties for each detected object. A training engine 116 analyzes the object predictions 135 and compares the object predictions to the labels in the training examples 123. If the two differ, an error is indicated. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique. For example, the model parameters might be updated by calculating the gradient of the error with respect to an individual model parameter. To decrease the error contribution, a value derived from the gradient can be subtracted from or added to the current value of the parameter. This is known as stochastic gradient descent with backpropagation.--, in [0040], -- the user interface subsystem 138 can generate a user interface presentation having image or video data containing a representation of the regions of space that are likely to be occupied by vehicles. An on-board display device can then display the user interface presentation for passengers of the vehicle 122.--, in [0033]-[0034], and, -- the object property neural network classifies the object that is likely centered at each selected location. For example, possible classifications include “car,” “pedestrian,” “bicycle,” “road marking,” and “road sign.” Based on its training and the three inputs, the object property neural network can select one of those classifications. The object property neural network can also define a bounding box for each predicted object. A bounding box is a box that identifies the boundaries or edges of an object. The bounding box can be two-dimensional or three-dimensional. 
A display interface of a vehicle can display such a bounding box to the driver of a semi-autonomous vehicle. The neural network subsystem can also provide the bounding box to the planning system of the vehicle for use in navigation of the vehicle. In some implementations, the object property neural network can predict a “mask” for each object. A mask differs from a bounding box in that it is form-fitted to a respective object. In other words, it more closely identifies the edges of the object. The mask can mark portions of the input sensor data that define the object.--, in [0059]);
		receive a second user input selecting one or more individual objects from the object information that is extracted and included in the one or more potential weaknesses (see Ogale: e.g., --To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.--, in [0111]),
		output an image based on the image data and the second user input, the image highlighting the one or more individual objects (see Ogale: e.g., --Each location in the output map corresponds to a point in the projected sensor data and is associated with a numerical score representing the likelihood that the center of an object is located at a corresponding location in the environment. For example, the center prediction neural network can generate an output map with scores ranging from zero to one, where zero indicates a low likelihood that an object is centered at a particular location in the output map, and where one indicates a high likelihood that an object is centered at a particular location in the output map.--, in [0048]);
		receive a third user input classifying the one or more individual objects as an actual weakness with respect to the first object detection model (see Ogale: e.g., --To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.--, in [0111]), and 
		update the first object detection model based at least in part on the classification of the one or more individual objects as the actual weakness with respect to the first object detection model to generate a second object detection model for autonomous driving (see Ogale: e.g., --Each location in the output map corresponds to a point in the projected sensor data and is associated with a numerical score representing the likelihood that the center of an object is located at a corresponding location in the environment. For example, the center prediction neural network can generate an output map with scores ranging from zero to one, where zero indicates a low likelihood that an object is centered at a particular location in the output map, and where one indicates a high likelihood that an object is centered at a particular location in the output map.--, in [0048]).

		Re Claim 2, Ferstl as modified by Ogale further discloses the electronic processor is further configured to extract visual features based on the image data, and generate the coordinated visualizations based on the summary and the visual features that are extracted (see Ferstl: e.g., --Information and/or data regarding features or objects expressed in imaging data, including colors, textures or outlines of the features or objects, may be extracted from the data in any number of ways. For example, colors of image pixels, or of groups of image pixels, in a digital image may be determined and quantified according to one or more standards, e.g., the RGB color model,… textures or features of objects expressed in a digital image may be identified using one or more computer-based methods, such as by identifying changes in intensities within regions or sectors of the image, or by defining areas of an image corresponding to specific surfaces.
		Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects of any types, or portions of objects of such types, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools. The objects or portions of objects may be stationary or in motion, and may be identified at single, finite periods of time, or over one or more periods or durations. Such algorithms or tools may be directed to recognizing and marking transitions (e.g., the edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects or portions thereof) within the digital images as closely as possible, in a manner that minimizes noise and disruptions, and does not create false transitions. Some detection algorithms or techniques that may be utilized in order to recognize characteristics of objects or portions thereof in digital images--, in lines 3-64, col. 11).

		Re Claim 3, Ferstl as modified by Ogale further discloses wherein the visual features include a feature type, a feature size, a feature texture, and a feature shape (see Ferstl: e.g., --Information and/or data regarding features or objects expressed in imaging data, including colors, textures or outlines of the features or objects, may be extracted from the data in any number of ways. For example, colors of image pixels, or of groups of image pixels, in a digital image may be determined and quantified according to one or more standards, e.g., the RGB color model,… textures or features of objects expressed in a digital image may be identified using one or more computer-based methods, such as by identifying changes in intensities within regions or sectors of the image, or by defining areas of an image corresponding to specific surfaces.
		Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects of any types, or portions of objects of such types, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools. The objects or portions of objects may be stationary or in motion, and may be identified at single, finite periods of time, or over one or more periods or durations. Such algorithms or tools may be directed to recognizing and marking transitions (e.g., the edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects or portions thereof) within the digital images as closely as possible, in a manner that minimizes noise and disruptions, and does not create false transitions. Some detection algorithms or techniques that may be utilized in order to recognize characteristics of objects or portions thereof in digital images--, in lines 3-64, col. 11).

		
		Re Claims 8-10, claims 8-10 are the method claims corresponding to claims 1-3, respectively. Claims 8-10 are thus rejected for reasons similar to those given for claims 1-3. See the above discussion of claims 1-3, respectively. Further, Ferstl as modified by Ogale further discloses a non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations (see Ogale: e.g., -- The training neural network subsystem 114 can generate, for each training example 123, one or more object predictions 135, where each object prediction comprises an object detection and properties for each detected object. A training engine 116 analyzes the object predictions 135 and compares the object predictions to the labels in the training examples 123. If the two differ, an error is indicated. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique. For example, the model parameters might be updated by calculating the gradient of the error with respect to an individual model parameter. To decrease the error contribution, a value derived from the gradient can be subtracted from or added to the current value of the parameter. This is known as stochastic gradient descent with backpropagation.--, in [0040]; also see Ferstl: e.g., -- outputs are received from the trained classifier, and at box 880, locations of the objects of interest within the second imaging data are identified based on the outputs. The outputs may specify not only portions of a given image frame of the second imaging data that depict one or more of the objects of interest but also include a confidence level or interval (e.g., a percentage or number of standard deviations from the mean, or a margin of error above or below the mean) associated with a probability or likelihood that such portions actually depict one or more of the objects of interest. 
At box 885, the second imaging data is annotated with the locations of the objects of interest identified based on the outputs.--, in lines 37-53, col. 30).
		

		Re Claims 15-16, claims 15-16 are the medium claims corresponding to claims 1-2, respectively. Claims 15-16 are thus rejected for reasons similar to those given for claims 1-2. See the above discussion of claims 1-2, respectively. Further, Ferstl as modified by Ogale further discloses a non-transitory computer-readable medium comprising instructions that, when executed by an electronic processor, causes the electronic processor to perform a set of operations (see Ferstl: e.g., Fig. 1, and, --The computers, servers, devices and the like described herein have the necessary electronics, software, memory, storage, databases, firmware, logic/state machines, microprocessors, communication links, displays or other visual or audio user interfaces, --, in lines 45-58, col. 20).


2.	Claims 4-7, 11-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ferstl as modified by Ogale, and further in view of MISRA (US 20200394425 A1, DATE FILED: 12-13-2019).
		Re Claim 4, Ferstl as modified by Ogale further discloses the coordinated visualizations include a size distribution visualization, an IoU distribution visualization, an Area Under Curve (AUC) visualization, a score distribution, and a class distribution (see Ferstl: e.g., -- Moreover, textures or features of objects expressed in a digital image may be identified using one or more computer-based methods, such as by identifying changes in intensities within regions or sectors of the image, or by defining areas of an image corresponding to specific surfaces. Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects of any types, or portions of objects of such types, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools. The objects or portions of objects may be stationary or in motion, and may be identified at single, finite periods of time, or over one or more periods or durations. Such algorithms or tools may be directed to recognizing and marking transitions (e.g., the edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects or portions thereof)--, in lines 3-64, col. 11; also see Ogale: e.g., -- The soft-max layer 690 is trained to generate, for each object, a probability distribution of object classifications from zero to one. For example, the soft-max layer 690 might determine that a particular object is a pedestrian with 90% confidence and a street sign with 10% confidence. 
Those confidences can be provided to a planning subsystem of a vehicle for use by the vehicle in making autonomous driving decisions.--, in [0101]-[0102], and --To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.--, in [0111]),
		Ferstl as modified by Ogale does not explicitly disclose an IoU distribution visualization or an Area Under Curve (AUC) visualization.
		MISRA teaches an IoU distribution visualization, an Area Under Curve (AUC) visualization {in object detection for autonomous driving} (see MISRA: e.g., -- detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90}. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates. In this respect, the <no_parking> class has the best detection record, and it was followed (in order) by the <parked_car>, <parked_auto_rickshaw>, and <parked_motor_bike> categories. This observation remains consistent across all IoUs, which establishes the reliability of the detection model.
[0111] FIGS. 9A and 9B illustrate precision recall curves for four object classes at IoU of thresholds of {0.5, 0.70, 0.90} in accordance with an example embodiment. Referring to FIG. 9A performance of the disclosed parking violation detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90} is illustrated. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates.--, in [0110]-[0120]);
		Ferstl (as modified by Ogale) and MISRA are combinable as they are in the same field of endeavor: detecting and classifying objects using image processing techniques, and in particular both use a convolutional neural network as the detection and classification tool for autonomous driving. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the system of Ferstl as modified by Ogale using MISRA’s teachings by including an IoU distribution visualization and an Area Under Curve (AUC) visualization in Ferstl’s user interface in order to establish the reliability of the detection model (see MISRA: e.g., in [0110]-[0120]).

		Re Claim 5, Ferstl as modified by Ogale and MISRA further disclose wherein the first one or more user inputs include a size range input to the size distribution visualization and an IoU value range input to the IoU distribution visualization, and wherein the AUC visualization is based on the size range input and the IoU value range input (see Ferstl: e.g., -- Moreover, textures or features of objects expressed in a digital image may be identified using one or more computer-based methods, such as by identifying changes in intensities within regions or sectors of the image, or by defining areas of an image corresponding to specific surfaces. (36) Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects of any types, or portions of objects of such types, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools. The objects or portions of objects may be stationary or in motion, and may be identified at single, finite periods of time, or over one or more periods or durations. Such algorithms or tools may be directed to recognizing and marking transitions (e.g., the edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects or portions thereof)--,  in lines 3-64, col. 11; also see Ogale: e.g., -- The soft-max layer 690 is trained to generate, for each object, a probability distribution of object classifications from zero to one. For example, the soft-max layer 690 might determine that a particular object is a pedestrian with 90% confidence and a street sign with 10% confidence. 
Those confidences can be provided to a planning subsystem of a vehicle for use by the vehicle in making autonomous driving decisions.--, in [0101]-[0102], and --To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.--, in [0111]; further see MISRA: e.g., -- detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90}. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates. 
In this respect, the <no_parking> class has the best detection record, and it was followed (in order) by the <parked_car>, <parked_auto_rickshaw>, and <parked_motor_bike> categories. This observation remains consistent across all IoUs, which establishes the reliability of the detection model. [0111] FIGS. 9A and 9B illustrate precision recall curves for four object classes at IoU of thresholds of {0.5, 0.70, 0.90} in accordance with an example embodiment. Referring to FIG. 9A performance of the disclosed parking violation detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90} is illustrated. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates.--, in [0110]-[0120]).

		Re Claim 6, Ferstl as modified by Ogale and MISRA further disclose wherein one individual object of the one or more individual objects is one example of the one or more potential weaknesses of the first object detection model (see Ogale: e.g., -- The training neural network subsystem 114 can generate, for each training example 123, one or more object predictions 135, where each object prediction comprises an object detection and properties for each detected object. A training engine 116 analyzes the object predictions 135 and compares the object predictions to the labels in the training examples 123. If the two differ, an error is indicated. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique. For example, the model parameters might be updated by calculating the gradient of the error with respect to an individual model parameter. To decrease the error contribution, a value derived from the gradient can be subtracted from or added to the current value of the parameter. This is known as stochastic gradient descent with backpropagation.--, in [0040]; also see Ferstl: e.g., -- outputs are received from the trained classifier, and at box 880, locations of the objects of interest within the second imaging data are identified based on the outputs. The outputs may specify not only portions of a given image frame of the second imaging data that depict one or more of the objects of interest but also include a confidence level or interval (e.g., a percentage or number of standard deviations from the mean, or a margin of error above or below the mean) associated with a probability or likelihood that such portions actually depict one or more of the objects of interest. (104) At box 885, the second imaging data is annotated with the locations of the objects of interest identified based on the outputs.--, in lines 37-53, col. 30).
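		For illustration only, the parameter update described in the Ogale passage quoted above (subtracting a value derived from the gradient of the error from the current parameter value) can be sketched as follows. This is a minimal sketch, not Ogale's implementation: the single-parameter linear model, the squared-error measure, the function name sgd_step, and the learning rate are all assumptions introduced here.

```python
def sgd_step(w, x, y, learning_rate):
    """Apply one gradient-descent update to a single parameter w.

    Assumed toy model (not from Ogale): prediction = w * x,
    error = (prediction - y) ** 2.
    """
    prediction = w * x
    # Gradient of the squared error with respect to w.
    gradient = 2.0 * (prediction - y) * x
    # Subtract a value derived from the gradient to decrease the error.
    return w - learning_rate * gradient


if __name__ == "__main__":
    w = 0.0
    for _ in range(3):
        w = sgd_step(w, x=1.0, y=2.0, learning_rate=0.25)
        print(w)
```

		With these assumed values, each step moves w closer to the label value 2.0, which is the sense in which the update decreases the error contribution.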

		Re Claim 7, Ferstl as modified by Ogale and MISRA further disclose the object information includes detection results, scores, and Intersection over Union (IoU) values (see MISRA: e.g., -- detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90}. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates. In this respect, the <no_parking> class has the best detection record, and it was followed (in order) by the <parked_car>, <parked_auto_rickshaw>, and <parked_motor_bike> categories. This observation remains consistent across all IoUs, which establishes the reliability of the detection model. [0111] FIGS. 9A and 9B illustrate precision recall curves for four object classes at IoU of thresholds of {0.5, 0.70, 0.90} in accordance with an example embodiment. Referring to FIG. 9A performance of the disclosed parking violation detection model for the four classes at the IoU thresholds of {0.5, 0.70, 0.90} is illustrated. A large area under the PR curve (AUC) represents the joint condition of high recall (i.e., low false positive rate) and high precision (i.e., low false negative rate), and therefore, a high AUC implies that the classifier is not only returning accurate results but also a majority of these are positive estimates.--, in [0110]-[0120]).
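		For illustration only, the relationship among detection scores, IoU values, and the area under the precision-recall (PR) curve at a chosen IoU threshold can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Ferstl, Ogale, or MISRA: the box format (x_min, y_min, x_max, y_max), the greedy score-ordered matching, the trapezoidal integration, and the names iou and pr_auc are all hypothetical choices made here for a single image and a single class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pr_auc(detections, ground_truth, iou_threshold):
    """Area under the PR curve for one class.

    detections: list of (score, box); ground_truth: list of boxes.
    A detection is a true positive if it overlaps a not-yet-matched
    ground-truth box with IoU >= iou_threshold; otherwise a false positive.
    """
    if not ground_truth:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = set()
    tp = fp = 0
    points = [(0.0, 1.0)]  # (recall, precision), starting at recall 0
    for _score, box in ranked:
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
        points.append((tp / len(ground_truth), tp / (tp + fp)))
    # Trapezoidal integration of precision over recall.
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(points, points[1:]))


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 28))]
    for threshold in (0.5, 0.70, 0.90):
        print(threshold, round(pr_auc(dets, gt, threshold), 4))
```

		In this hypothetical data, the third detection overlaps its ground-truth box with an IoU of 0.8, so it counts as a true positive at the 0.5 and 0.70 thresholds but as a false positive at 0.90, which lowers the area under the PR curve at the strictest threshold.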
		
		Re Claims 11-14, claims 11-14 are the method claims corresponding to claims 4-7, respectively.  Claims 11-14 are thus rejected for reasons similar to those given for claims 4-7. See the above discussions with regard to claims 4-7, respectively. Further, Ferstl as modified by Ogale and MISRA further disclose non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations (see Ogale: e.g., -- The training neural network subsystem 114 can generate, for each training example 123, one or more object predictions 135, where each object prediction comprises an object detection and properties for each detected object. A training engine 116 analyzes the object predictions 135 and compares the object predictions to the labels in the training examples 123. If the two differ, an error is indicated. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique. For example, the model parameters might be updated by calculating the gradient of the error with respect to an individual model parameter. To decrease the error contribution, a value derived from the gradient can be subtracted from or added to the current value of the parameter. This is known as stochastic gradient descent with backpropagation.--, in [0040]; also see Ferstl: e.g., -- outputs are received from the trained classifier, and at box 880, locations of the objects of interest within the second imaging data are identified based on the outputs. The outputs may specify not only portions of a given image frame of the second imaging data that depict one or more of the objects of interest but also include a confidence level or interval (e.g., a percentage or number of standard deviations from the mean, or a margin of error above or below the mean) associated with a probability or likelihood that such portions actually depict one or more of the objects of interest. 
(104) At box 885, the second imaging data is annotated with the locations of the objects of interest identified based on the outputs.--, in lines 37-53, col. 30).
		
		Re Claims 17-20, claims 17-20 are the medium claims corresponding to claims 4-7, respectively.  Claims 17-20 are thus rejected for reasons similar to those given for claims 4-7. See the above discussions with regard to claims 4-7, respectively. Further, Ferstl as modified by Ogale and MISRA further disclose a non-transitory computer-readable medium comprising instructions that, when executed by an electronic processor, cause the electronic processor to perform a set of operations (see Ferstl: e.g., Fig. 1, and, --The computers, servers, devices and the like described herein have the necessary electronics, software, memory, storage, databases, firmware, logic/state machines, microprocessors, communication links, displays or other visual or audio user interfaces, --, in lines 45-58, col. 20).





Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIWEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached Monday-Friday, 8:30am-4:30pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WEI WEN YANG/Primary Examiner, Art Unit 2667