DETAILED ACTION
Response to Amendment
The amendment was received 12/21/20. Claims 1-20 are pending.
Claim Objections
Claims 16-18 are objected to because of the following informalities:  
Regarding claim 16, line 19’s “the initial provisional model” is not consistent with claim 10, line 21’s “the provisional model”.
Regarding claim 17, claim 17 is objected to for the same reason as claim 16: it is inconsistent, in three instances, with claim 10, line 21’s “the provisional model”.
Regarding claim 18, claim 18 is objected to for the same reason as claim 16: it is inconsistent, in two instances, with claim 10, line 21’s “the provisional model”.

Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. Here, 35 U.S.C. 112(f) is NOT invoked. Accordingly, the broadest reasonable interpretation is made in view of MPEP 2111.01 III, "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph:
It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).

Accordingly:
The claimed “differentiating” (as in “a provisional model capable of at least provisionally differentiating the specific single person from the rest of the different persons” in claim 10, lines 16-18) is interpreted in light of applicant’s disclosure (US 2020/0042781 A1) such as, emphasis added:
“[0365] FIG. 13D illustrates one embodiment of iteratively generating and using models 4-model-21, 4-model-22, 4-model-23 in an attempt to detect multiple appearances of a certain person 1-ped-5 out of a large plurality of representations 1-ped-5-des-p3, 1-ped-8-des-q4, 1-ped-5-des-q5, 1-ped-5-des-r1, 1-ped-6-des-r2, 1-ped-9-des-s6, 1-ped-7-des-s7 associated with various different persons 1-ped-5, 1-ped-6, 1-ped-7, 1-ped-8, 1-ped-9.  At first, one of the representation 1-ped-5-des-p3, which represents an appearance of person 1-ped-5 as captured in imagery data 4-visual-p3 by vehicle 10p in conjunction with geo-temporal tag 10-loc-7a-T31, is either selected at random, or is selected as a result or an occurrence of interest associated with geo-temporal tag 10-loc-7a-T31, in which such selection can be made in a server 95-server after receiving at least some of the representations from the vehicles, or it can be made by one of the vehicles.  The selected representation 1-ped-5-des-p3 represents pedestrian 1-ped-5 that is currently "unknown" to server 95-server in the sense that pedestrian 1-ped-5 cannot be distinguished by the server, or by the vehicles (collectively by the system), from the other pedestrians 1-ped-6, 1-ped-7, 1-ped-8, 1-ped-9 for which representations 1-ped-8-des-q4, 1-ped-5-des-q5, 1-ped-5-des-r1, 1-ped-6-des-r2, 1-ped-9-des-s6, 1-ped-7-des-s7 exist.  Server 95-server, which may be located off-board the vehicles, is now tasked with finding other occurrences (i.e., appearances) of pedestrian 1-ped-5, perhaps in order to track such pedestrian, or perhaps in order to arrive at any kind of conclusion or discovery regarding such pedestrian 1-ped-5 or associated activities.  Server 95-server therefore takes and uses representation 1-ped-5-des-p3 to generate an initial model 4-model-21 operative to detect further appearances of pedestrian 1-ped-5.  
For example, representation 1-ped-5-des-p3 may comprise a simple machine-based description of pedestrian 1-ped-5, such as a description of clothes worn by pedestrian 1-ped-5--e.g., representation 1-ped-5-des-p3 may simply state that pedestrian 1-ped-5 wore (at time T31 as indicated by geo-temporal tag 10-loc-7a-T31 associated with representation 1-ped-5-des-p3) a green shirt and blue jeans--and therefore initial model 4-model-21 may be a trivial model that is operative to simply detect any pedestrian wearing a green shirt and blue jeans.  In one embodiment, model 4-model-21 has to be trivial, as it was generated from a single appearance/representation 1-ped-5-des-p3 of pedestrian 1-ped-5, and a single appearance of any pedestrian is not sufficient to generate a more complex model.  In another embodiment, model 4-model-21 has to be trivial, as it was generated from a single or few appearances/representations of pedestrian 1-ped-5, which were derived from distant images taken by a single or few vehicles--e.g., when imagery data 4-visual-p3 was captured by vehicle 10p while being relatively distant from pedestrian 1-ped-5, and therefore only very basic features of such pedestrian were captured.  Since model 4-model-21 is trivial, it would be impossible to use it for detecting additional appearances of pedestrian 1-ped-5 over a large geo-temporal range, as there may be hundreds of different pedestrians wearing a green shirt and blue jeans when considering an entire city or an entire country.  However, 4-model-21 is good enough to be used for successfully finding other appearances of pedestrian 1-ped-5 if the search span could be restricted to a limited geo-temporal span.  
For example, model 4-model-21 is based on representation 1-ped-5-des-p3 having a geo-temporal tag 10-loc-7a-T31, meaning that pedestrian 1-ped-5 was spotted near location 10-loc-7a at time T31, so that the geo-temporal span of searching for other appearances of pedestrian 1-ped-5 using model 4-model-21 could be restricted to those of the representations that are within a certain range of 10-loc-7a and within a certain time-differential of T31, in which such certain range could be perhaps 100 meters, and such certain time-differential could be perhaps 60 seconds, meaning that the search will be performed only in conjunction with representations that were derived from imagery data that was captured within 100 meters of location 10-loc-7a and within 60 seconds of time T1.  When such a restricted search is applied, chances are that even a trivial model such as 4-model-21 can successfully distinguish between pedestrian 1-ped-5 and a relatively small number of other pedestrians 1-ped-6 found within said restricted geo-temporal span.  For example, representation 1-ped-6-des-r2 of pedestrian 1-ped-6, as derived from imagery data captured by vehicle 10r and having geo-temporal tag 10-loc-7b-T32, is associated with location 10-loc-7b (FIG. 13A) that is perhaps 70 meters away from 10-loc-7a, and with time T32 that is perhaps 40 seconds after T31, and therefore 1-ped-6-des-r2 falls within the geo-temporal span of the search, and consequently model 4-model-21 is used to decide whether 1-ped-6-des-r2 is associated with pedestrian 1-ped-5 or not. Since pedestrian 1-ped-6 did not wear a green shirt and blue jeans at time T32, server 95-server concludes, using 4-model-21, that 1-ped-6-des-r2 is not associated with pedestrian 1-ped-5.  Representation 1-ped-5-des-r1 of pedestrian 1-ped-5, as derived from imagery data captured by vehicle 10r and having geo-temporal tag 10-loc-7c-T32, is associated with location 10-loc-7c (FIG. 
13A) that is perhaps 90 meters away from 10-loc-7a, and with time T32 that is 40 seconds after T31, and therefore 1-ped-5-des-r1 also falls within the geo-temporal span of the search, and consequently model 4-model-21 is used to decide whether 1-ped-5-des-r1 is associated with pedestrian 1-ped-5 or not. Since pedestrian 1-ped-5 didn't change his clothing during the 40 second period between T31 and T32, then a green shirt and blue jeans are detected by server 95-server using 4-model-21, and it is therefore concluded that representation 1-ped-5-des-r1 is associated with pedestrian 1-ped-5.  Now, that model 4-model-21 was successfully used to detect another representation 1-ped-5-des-r1 of pedestrian 1-ped-5, a better model 4-model-22 can be generated to better detect yet other appearances of pedestrian 1-ped-5, but this time over a larger geo-temporal range.  For example, server 95-server can now combine the two representations 1-ped-5-des-p3, 1-ped-5-des-r1, or alternatively to combine the model 4-model-21 and the newly found representation 1-ped-5-des-r1, to generate the better model 4-model-22.  In one embodiment, model 4-model-22 is more complex than model 4-model-21, and can optionally be generated using machine learning (ML) techniques that uses the 
two appearances 1-ped-5-des-p3, 1-ped-5-des-r1 to train a model until producing model 4-model-22, which is now specifically trained to detect pedestrian 1-ped-5.  Server 95-server now uses the better model 4-model-22 to search again for additional appearances of pedestrian 1-ped-5, but this time over a much larger geo-temporal span, which in one embodiment may contain all of the geo-temporal tags 10-loc-7a-T31, 10-loc-9d-T35, 10-loc-9c-T35, 10-loc-7c-T32, 10-loc-7b-T32, 10-loc-9b-T34, 10-loc-9a-T34, and therefore include the representations 1-ped-8-des-q4, 1-ped-5-des-q5, 1-ped-9-des-s6, 1-ped-7-des-s7 that were not considered in the previous search and that represent various new pedestrians 1-ped-7, 1-ped-8, 1-ped-9 as well as pedestrian 1-ped-5.  Model 4-model-22 is good enough to successfully filter away representations 
1-ped-8-des-q4, 1-ped-9-des-s6, 1-ped-7-des-s7 belonging to pedestrians 1-ped-7, 1-ped-8, 1-ped-9, and to detect that only representation 1-ped-5-des-q5 is associated with pedestrian 1-ped-5.  With the new representation 1-ped-5-des-q5 just detected, the server 95-server can now construct a geo-temporal path 10-loc-7a-T31, 10-loc-7c-T32, 10-loc-9c-T35 via which pedestrian 1-ped-5 has walked.  Again, server 95-server can now generate an even more sophisticated and accurate model 4-model-23 of pedestrian 1-ped-5 using the three representations 1-ped-5-des-p3, 1-ped-5-des-r1, 1-ped-5-des-q5, or alternatively using the model 4-model-22 and the newly detected 
representation ped-5-des-q5.  It is noted that the representations 1-ped-5-des-p3, 1-ped-8-des-q4, 1-ped-5-des-q5, 1-ped-5-des-r1, 1-ped-6-des-r2, 1-ped-9-des-s6, 1-ped-7-des-s7, as shown in FIG. 13D, seem to be located outside server 95-server, but in one embodiment, at least some of the representations may be stored internally in server 95-server, after being received in the server from the vehicles.  In another embodiment, at least some of the representations 1-ped-5-des-p3, 1-ped-8-des-q4, 1-ped-5-des-q5, 
1-ped-5-des-r1, 1-ped-6-des-r2, 1-ped-9-des-s6, 1-ped-7-des-s7 are stored locally in the respective vehicles, and are accessed by the server as may be needed by the server.  It is noted that the detection of representations using the respective models 4-model-21, 4-model-22, as shown in FIG. 13D, seems to be occurring inside server 95-server, but in one embodiment, the detection of at least some of the representations may be done onboard the vehicles storing the representations, after receiving the respective models 4-model-21, 4-model-22 from the server.  In one embodiment, the detection of representations using the respective models 4-model-21, 4-model-22 is done inside server 95-server.”

wherein “distinguished” or “distinguish” is defined:

distinguish verb (used with object)
1	to mark off as different (often followed by from or by):
He was distinguished from the other boys by his height.
2	to recognize as distinct or different; recognize the salient or individual features or characteristics of:
It is hard to distinguish her from her twin sister.
3	to perceive clearly by sight or other sense; discern; recognize:
He could not distinguish many of the words.
4	to set apart as different; be a distinctive characteristic of; characterize:
It is his Italian accent that distinguishes him.
5	to make prominent, conspicuous, or eminent:
to distinguish oneself in battle.
6	to divide into classes; classify:
Let us distinguish the various types of metaphor.
7	Archaic. to single out for or honor with special attention.
verb (used without object)
8	to indicate or show a difference (usually followed by between).
9	to recognize or note differences; discriminate.

wherein the claimed “differentiating” is defined:
differentiate verb (used with object), dif·fer·en·ti·at·ed, dif·fer·en·ti·at·ing.
1	to form or mark differently from other such things; distinguish.
2	to change; alter.
3	to perceive the difference in or between.
4	to make different by modification, as a biological species.
5	Mathematics. to obtain the differential or the derivative of.

wherein “object” is defined:
7	Grammar. (in many languages, as English) a noun, noun phrase, or noun substitute representing by its syntactical position either the goal of the action of a verb or the goal of a preposition in a prepositional phrase, as ball in John hit the ball, Venice in He came to Venice, coin and her in He gave her a coin. Compare direct object, indirect object.






Thus, the claimed “differentiating” has the meaning of “distinguish” as a “verb (used with object)”, which is the meaning “taken” (per MPEP 2111.01 III, 3rd paragraph) under the broadest reasonable interpretation. Accordingly, any implicit selection, such as preferring or choosing in favor of one over the other as implied by “discriminate”, falls outside the broadest reasonable interpretation: the selection implied by “discriminate” belongs to a “verb (used without object)” meaning, whereas the broadest reasonable interpretation is directed to “verb (used with object)” meanings. In the claimed “differentiating the specific single person”, “differentiating” is the verb and “person” is the “object” of the “verb (used with object)”. In contrast, claims 1 and 19 respectively recite “select one” or “selecting one” in “verb (used with object)” form, and that selection occurs regardless of said “successfully distinguish between pedestrian”.












Switching from the meanings of “distinguish” in “verb (used with object)” form, which correspond to the disclosed “pedestrian…distinguished” and “distinguish between pedestrian” (both disclosed in “verb (used with object)” form), to “discriminate”, a “verb (used without object)”, appears to be an improper use of “the prior art” “dictionaries”. Such a switch results in an improper claim interpretation of the claimed “differentiating the specific single person” that forces, or improperly imports from applicant’s disclosure, a selection, represented in fig. 6F: “1-event-4” or fig. 13D: “1-ped-5-des-p3”, that results in “cannot be distinguished” (that is, the selection is performed without the person being distinguished), via said [0365]:
“At first, one of the representation 1-ped-5-des-p3, which represents an appearance of person 1-ped-5 as captured in imagery data 4-visual-p3 by vehicle 10p in conjunction with geo-temporal tag 10-loc-7a-T31, is either selected at random, or is selected as a result or an occurrence of interest associated with geo-temporal tag 10-loc-7a-T31, in which such selection can be made in a server 95-server after receiving at least some of the representations from the vehicles, or it can be made by one of the vehicles. The selected representation 1-ped-5-des-p3 represents pedestrian 1-ped-5 that is currently "unknown" to server 95-server in the sense that pedestrian 1-ped-5 cannot be distinguished by the server, or by the vehicles (collectively by the system), from the other pedestrians 1-ped-6, 1-ped-7, 1-ped-8, 1-ped-9 for which representations 1-ped-8-des-q4, 1-ped-5-des-q5, 1-ped-5-des-r1, 1-ped-6-des-r2, 1-ped-9-des-s6, 1-ped-7-des-s7 exist.”

That selection would be imported into the claimed “differentiating the specific single person”, which is in “verb (used with object)” form, in contrast to the “verb (used without object)” form, which comprises a selection, preference, or choice in favor of one over the other via said “discriminate”.





The claimed “geo-temporal tags” (as in “each of the representations is associated with a geo-temporal tag, in which each of the geo-temporal tags is a record of both a location and a time at which the respective imagery data was captured” in claim 20, lines 2-4) is interpreted in light of applicant’s disclosure and MPEP 2111.01 III, "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 1st and last paragraphs (emphasis added):
"[T]he ordinary and customary meaning of a claim term is the meaning that the term would have to a person of ordinary skill in the art in question at the time of the invention, i.e., as of the effective filing date of the patent application." Phillips v. AWH Corp., 415 F.3d 1303, 1313, 75 USPQ2d 1321, 1326 (Fed. Cir. 2005) (en banc); Sunrace Roots Enter. Co. v. SRAM Corp., 336 F.3d 1298, 1302, 67 USPQ2d 1438, 1441 (Fed. Cir. 2003); Brookhill-Wilk 1, LLC v. Intuitive Surgical, Inc., 334 F.3d 1294, 1298, 67 USPQ2d 1132, 1136 (Fed. Cir. 2003) ("In the absence of an express intent to impart a novel meaning to the claim terms, the words are presumed to take on the ordinary and customary meanings attributed to them by those of ordinary skill in the art.").

It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).
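
For illustration only, the claim 20 “geo-temporal tag” (a record of both a location and a time) and the restricted search described in said [0365] (within 100 meters and within 60 seconds of the tag of the selected representation) may be sketched as follows. This is a minimal sketch under stated assumptions and is not applicant’s implementation: the class and function names, the planar coordinates in meters, the clothing sets, and the location and time assigned to 1-ped-5-des-q5 are all hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class GeoTemporalTag:
    """Record of both a location and a time (hypothetical planar meters, seconds)."""
    x_m: float
    y_m: float
    t_s: float


@dataclass(frozen=True)
class Representation:
    """A representation of one appearance, with its tag and a trivial description."""
    name: str
    tag: GeoTemporalTag
    clothing: frozenset


def within_span(tag, anchor, max_range_m=100.0, max_dt_s=60.0):
    """True if tag lies within the restricted geo-temporal span around anchor."""
    distance_m = hypot(tag.x_m - anchor.x_m, tag.y_m - anchor.y_m)
    return distance_m <= max_range_m and abs(tag.t_s - anchor.t_s) <= max_dt_s


def restricted_search(seed, candidates, max_range_m=100.0, max_dt_s=60.0):
    """Apply a trivial clothing-match model only inside the restricted span."""
    return [
        c for c in candidates
        if c != seed
        and within_span(c.tag, seed.tag, max_range_m, max_dt_s)
        and c.clothing == seed.clothing  # trivial model: same clothes
    ]


green_blue = frozenset({"green shirt", "blue jeans"})

# Selected representation: tag taken as the origin at time zero (assumption).
seed = Representation("1-ped-5-des-p3", GeoTemporalTag(0.0, 0.0, 0.0), green_blue)

candidates = [
    # 70 meters away, 40 seconds later, different clothes (clothing is assumed).
    Representation("1-ped-6-des-r2", GeoTemporalTag(70.0, 0.0, 40.0),
                   frozenset({"red shirt", "black trousers"})),
    # 90 meters away, 40 seconds later, same clothes.
    Representation("1-ped-5-des-r1", GeoTemporalTag(0.0, 90.0, 40.0), green_blue),
    # Same clothes but far outside the span (distance and time are assumed).
    Representation("1-ped-5-des-q5", GeoTemporalTag(5000.0, 0.0, 400.0), green_blue),
]

matches = restricted_search(seed, candidates)
print([m.name for m in matches])
```

In this sketch only the representation that both falls inside the restricted span and matches the trivial description is returned; the distant representation with the same clothing is excluded by the span restriction, not by the model.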



 


Claim Review - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-9 are NOT rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, line 28’s term-of-degree phrase “model is at least sub-optimal” is provided with a “standard”, i.e., “an approved model”, namely the “an even more improved specific model 4-model-3 (FIG. 6G)” that is received “with approval”, via applicant’s disclosure (emphasis added):
“[0275]   In one embodiment, the system is further configured to: identify, using the improved specific model 4-model-2, yet additional appearances, or representations thereof 4-visual-kS, of said one of the persons 1-ped-4 in the corpus of visual data 4-visual collectively captured by the plurality of on-road vehicles 10i, 10j, 10k; and improve further said initial model using the yet additional appearances identified 4-visual-kS, thereby resulting in an even more improved specific model 4-model-3 (FIG. 6G) operative to even better detect and identify said one person 1-ped-4 specifically.”

wherein the meaning of “improved” is “taken” via MPEP 2111.01 III., 3rd paragraph and via definition 5:
improve
verb (used without object), im·proved, im·prov·ing.
5	to increase in value, excellence, etc.; become better:
The military situation is improving.






wherein “better” is defined via definition 3:
better
adjective, comparative of good, with best as superlative.
3	of superior suitability, advisability, desirability, acceptableness, etc.; preferable:
a better time for action.

wherein “acceptableness” is defined:
acceptable
adjective
1	capable or worthy of being accepted.

wherein “accepted” is defined:
accept
verb (used with object)
to take or receive (something offered); receive with approval or favor:
to accept a present; to accept a proposal.

; and

MPEP 2173.05(b) Relative Terminology [R-08.2017], emphasis added:

The use of relative terminology in claim language, including terms of degree, does not automatically render the claim indefinite under 35 U.S.C. 112(b)  or pre-AIA  35 U.S.C. 112, second paragraph. Seattle Box Co., Inc. v. Industrial Crating & Packing, Inc., 731 F.2d 818, 221 USPQ 568 (Fed. Cir. 1984). Acceptability of the claim language depends on whether one of ordinary skill in the art would understand what is claimed, in light of the specification.

I.    TERMS OF DEGREE
Terms of degree are not necessarily indefinite. "Claim language employing terms of degree has long been found definite where it provided enough certainty to one of skill in the art when read in the context of the invention." Interval Licensing LLC v. AOL, Inc., 766 F.3d 1364, 1370, 112 USPQ2d 1188, 1192-93 (Fed. Cir. 2014) (citing Eibel Process Co. v. Minnesota & Ontario Paper Co., 261 U.S. 45, 65-66 (1923) (finding ‘substantial pitch’ sufficiently definite because one skilled in the art ‘had no difficulty … in determining what was the substantial pitch needed’ to practice the invention)). Thus, when a term of degree is used in the claim, the examiner should determine whether the specification provides some standard for measuring that degree. Hearing Components, Inc. v. Shure Inc., 600 F.3d 1357, 1367, 94 USPQ2d 1385, 1391 (Fed. Cir. 2010); Enzo Biochem, Inc., v. Applera Corp., 599 F.3d 1325, 1332, 94 USPQ2d 1321, 1326 (Fed. Cir. 2010); Seattle Box Co., Inc. v. Indus. Crating & Packing, Inc., 731 F.2d 818, 826, 221 USPQ 568, 574 (Fed. Cir. 1984). If the specification does not provide some standard for measuring that degree, a determination must be made as to whether one of ordinary skill in the art could nevertheless ascertain the scope of the claim (e.g., a standard that is recognized in the art for measuring the meaning of the term of degree). For example, in Ex parte Oetiker, 23 USPQ2d 1641 (Bd. Pat. App. & Inter. 1992), the phrases "relatively shallow," "of the order of," "the order of about 5mm," and "substantial portion" were held to be indefinite because the specification lacked some standard for measuring the degrees intended.

wherein “standard” is defined:
standard
noun
1	something considered by an authority or by general consent as a basis of comparison; an approved model.

Thus, applicant’s “an even more improved specific model 4-model-3 (FIG. 6G)” that is received “with approval” is the “standard”, or “an approved model”, for measuring “sub-optimal”. Thus, the claimed “sub-optimal” is “a term of degree…used in the claim” that is “definite where it provided enough certainty to one of skill in the art when read in the context of the invention”.
Response to Arguments
Applicant’s arguments, see remarks, page 11:
“Claim 10 has been amended so as to now include the limitations: 

- "...a provisional model capable of at least provisionally differentiating the specific single person from the rest of the different persons... 
and

- "...improve said provisional model ... to ... better differentiating the specific single person from the rest of the different persons". 

It is noted that provisionally differentiating the specific single person and then better differentiating the specific single person (from the rest of the different persons) is a description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet is silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet.” 

, filed 12/21/20, with respect to the rejection(s) of claim(s) 19 and 10-12,17,18 or 10-12,17,19 and 17-19 in the Office action of 10/1/20, starting page 4, under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of:








A.	Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) that teaches fig. 3 corresponding to “control systems capable…to differentiate different…pedestrians” via pages 714,715:
“Autonomous vehicles or driverless cars specially can detect surrounding environment using a variety of techniques such as LiDAR, RADAR, odometry, GPS, and last but not the least -computer vision. According to SAE’s automated vehicle classification [2], starting from level zero in which the vehicle has no control over the automobile (but it may provide warning to the driver), and all the way up to level five, which is complete autonomous driving which means that other than starting the autonomous system and setting up the required destination and related settings for navigation, and no human intervention or human input is required. The autonomous vehicle or driverless vehicle can drive to any location where it is legal and possible to drive. Advanced and sophisticated control systems, software and algorithms interpret all the sensory data and information to identify and detect appropriate and right navigation paths, as well as obstacles, other subsystems and relevant signage information [3]. Autonomous or driver less vehicles have sophisticated control systems capable of taking in sensor data and analyzing the data to differentiate different objects in the surrounding environment and recognize vehicles, pedestrians and other obstacles in the surrounding environment which will be very helpful for later path planning to desired destination [4].”; and

B.	Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection) that explicitly teaches “differentiating” a walking person and a person on a bike from each other, as indicated in fig. 2, via page 269, right column, 2nd full paragraph:
“It's noted that traditional pedestrian or cyclist detection methods always consider pedestrians and cyclists separately [3], [4], although pedestrians and cyclists often appear in one picture. This often leads to scanning the input image several times and causing confused detection results, such as classifying cyclists as pedestrians, and vice versa, due to their similar appearance. In general, cyclists move faster than pedestrians, different attentions with pedestrians should be paid from ADAS or autonomous vehicles. Therefore, detecting pedestrians and cyclists concurrently and differentiating them clearly are urgently needed for the adaptive decision of ADAS and autonomous vehicles.”


Applicant's arguments, page 12:
“Claim 12 has been amended so as to now include the limitations: 

‘...the certain geographical area comprises at least a city; 
said different persons comprise at least one million different persons; and therefore: 
said training and or re-training of the provisional model into the improved model constitutes an iterative model building approach operative to facilitate said differentiation of the specific single person from the rest of the different persons when confronted with said geographical area comprising at least a city and said different persons comprising at least one million different persons ’. 

It is noted that iterative model building approach operative to facilitate said differentiation of the specific single person from the rest of the different persons is a further description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet is silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet.” 

, filed 12/21/20, with respect to the rejection(s) of claim(s) 19 and 10-12,17,18 or 10-12,17,19 and 17-19 in the Office action of 10/1/20, starting page 4, under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 as discussed above regarding claim 10 and further in view of:
intervening reference ZOU et al. (US Patent App. Pub. No.: US 2020/0065563 A1) that teaches a face database of three million images; and
Guo et al. (MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition) that also teaches a database of a million faces.







Applicant’s arguments, see remarks, pages 12,13:
“Claim 19 has been amended so as to now include the limitations: 

- ‘...said model is capable of differentiating the specific single person from the rest of the different persons’ 
and 
- ‘...improving said capability of differentiating the specific single person from the rest of the different persons’ 

It is noted that differentiating the specific single person from the rest of the different persons and then improving said capability of differentiating is a description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet is silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet. 

Therefore, the Applicant respectfully submits that independent claim 19 and depending claim 20 are in condition for allowance…

Claim 1 has been amended so as to now include the limitations: 

‘...said provisional model is at least sub-optimal for directly differentiating the specific single person from the rest of the many different persons, but is capable of accurately differentiating the specific single person from a sub-group of the many different persons’ 

It is noted that differentiating the specific single person from a sub-group of the many different persons is a description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet and Boghossian are silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet and Boghossian. 

Therefore, the Applicant respectfully submits that independent claim 1 and all depending claims are in condition for allowance.” 

, filed 12/21/20, with respect to the rejection of claims 19 and 20 and 1-3,5,6,8,9 or 1-3,5,6,8,9 and 20 in the Office action of 10/1/20 have been fully considered and are persuasive.  The 35 USC 103 rejection of claims 19 and 20 and 1-3,5,6,8,9 or 1-3,5,6,8,9 and 20 and claim 4 and claim 7 in the Office action of 10/1/20, has been withdrawn. Thus, the rejection of claims 10-18 is the only rejection that remains. 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 10, 11, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1) with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection). Claim 12, including claim 3, is given the filing date of 3/19/19 due to the claimed “million different persons”. 


Regarding claim 10, Farabet teaches a system operative to track persons by utilizing models generated using imagery data captured by a plurality of on-road vehicles, comprising: 
a plurality of on-road vehicles (fig. 4:50(1)-(N)) moving in a certain geographical area, in which each of the on-road vehicles is configured to use an onboard imagery sensor (fig. 28:72: “Surround Camera(s)”) to capture imagery data (via fig. 3:6000) of areas surrounding locations visited by the on-road vehicle, thereby resulting in a corpus of imagery data collectively captured by the plurality of on-road vehicles (said fig. 4:50(1)-(N)), in which various different persons (as indicated in fig. 12:8001: “(AI)”), such as different pedestrians and different drivers, appear in the corpus of imagery data, and in which each of at least some of the different persons appear more than once (via “Long-view stereo cameras (501)”, cited below: [00170]) in the corpus of imagery data and in conjunction with more than one location and time (via said “basic object tracking”) of being captured; 
wherein the system is configured to: 

use at least one of the appearances (as indicated in fig. 12:8001: “(AI)”), and/or a representation thereof, of a specific one (said via “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) of the different persons (as indicated in fig. 12:8001: “(AI)”) in the corpus of imagery data collectively captured by the plurality of on-road vehicles (fig. 4:50(1)-(N)), to generate a provisional model (via fig. 5:8000: “Model Validation”) capable of at least provisionally differentiating the specific single person (said via “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) from the rest of the different persons (said as indicated in fig. 12:8001: “(AI)”) and thereby to provisionally detect (via “cameras may be used to perform…pedestrian detection”, cited below:[00167]) and track (via said “basic object tracking”) said specific single person (as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) 

identify (via fig. 3:6040: “Stop Sign”), using the provisional model (said fig. 5:8000: “Model Validation”), additional appearances (in a “highly iterative”, cited below:[0074], way), and/or representations thereof, of said specific single person (as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) in the corpus of visual data collectively captured by the plurality of on-road vehicles (said fig. 4:50(1)-(N)), thereby tracking (via said “basic object tracking”) said specific single person (as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”); and 

improve (via fig. 5: back-arrow, pointing to fig. 5:5000) said provisional model (said fig. 5:8000: “Model Validation”) using the additional appearances (said in a “highly iterative” way) identified (via fig. 3:6040: “Stop Sign”), thereby resulting in an improved specific single person (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) by better differentiating the specific single person (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) from the rest of the different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective” via:
“[0057] The present invention includes a method suitable for training, updating, and deploying one or more neural networks used to recognize certain objects and features, including, without limitation, (1) LaneNet (for detecting lanes), (2) PoleNet (for detecting traffic poles), (3) WaitNet (for detecting wait conditions and intersections), (4) SignNet (for detecting traffic signs), (5) LightNet (for detecting traffic lights), (6) DriveNet (for detecting cars, pedestrians, cyclists and potentially other objects). In additional embodiments, the present invention includes a method for training and updating one or more neural networks used to perform in-cabin driver and passenger monitoring, including, without limitation, neural networks used to monitor state of driver, including gaze tracking, head pose tracking, drowsiness detection, sleepiness, eye openness, emotion detection, heart rate monitor, liveliness of driver, and driver impairment.”

wherein “certain” is defined via Dictionary.com:





certain
adjective
1	free from doubt or reservation; confident; sure:
I am certain he will come.
2	destined; sure to happen (usually followed by an infinitive):
He is certain to be there.
3	inevitable; bound to come:
They realized then that war was certain.
4	established as true or sure; unquestionable; indisputable:
It is certain that he tried.
5	fixed; agreed upon; settled:
on a certain day; for a certain amount.
6	definite or particular, but not named or specified:
A certain person phoned. He had a certain charm.
7	that may be depended on; trustworthy; unfailing; reliable:
His aim was certain.
8	some though not much:
a certain reluctance.
9	Obsolete. steadfast.

wherein “particular” is defined:
particular
adjective
1	of or relating to a single or specific person, thing, group, class, occasion, etc., rather than to others or all; special rather than general:
one's particular interests in books.
2	immediately present or under consideration; in this specific instance or place:
Look at this particular clause in the contract.
3	distinguished or different from others or from the ordinary; noteworthy; marked; unusual:
She sang with particular warmth at last evening's concert.
4	exceptional or especial:
Take particular pains with this job.
5	being such in an exceptional degree:
a particular friend of mine.
6	dealing with or giving details, as an account or description, of a person; detailed; minute.
7	exceptionally selective, attentive, or exacting; fastidious; fussy:
to be particular about one's food;






“[0074] The process of training the DNNs is highly iterative, comprising a plurality of workflows. A first workflow, illustrated in Figure 6, is the most basic, and illustrates the first few months of a DL project. This workflow essentially focuses on collecting large amounts of initial data from vehicles (50(1)-(N)), providing that data to a dataset store (5003) (a service that handles immutable datasets for further processing), labeling that data (5004), and training initial models (5005). The frames selected for labeling may be randomly selected in this early workflow or selected at regular time intervals. In a preferred embodiment, this workflow typically labels 300,000 to 600,000 frames. In preferred embodiments, the models are tested and verified by simulation or re-simulation (8000). The models are preferably pruned, optimized, and deployed in vehicles (50(1)-(N)). Model refinement and pruning (5006) may be accomplished as described in U.S. Application No. 62/630,445, incorporated by reference. After labeling hundreds of thousands of frames, the process leads diminishing returns. For example, if a DNN reaches 97% accuracy on a random data distribution, then 97% of all new data randomly collected is already well predicted, so labeling it adds no information to the existing training set.”;

“[00167] Figure 29 illustrates one example of camera types and locations, with 11 cameras (501)-(508). Front-facing cameras (501)-(505) help identify forward facing paths and obstacles, and provide information critical to making an occupancy grid and determining the preferred vehicle paths. Front facing cameras may be used to perform many of the same functions as LIDAR, including emergency braking, pedestrian detection, and collision avoidance. Front-facing cameras may also be used for ADAS functions and systems including Lane Departure Warnings ("LDW"), and Autonomous Cruise Control ("ACC"), and other functions such as traffic sign recognition.”;

“[00170] In preferred embodiments, a long-view stereo camera pair (501) can be used for depth-based object detection, especially for objects for which a neural network has not yet been trained. Long-view stereo cameras (501) may also be used for object detection and classification, as well as basic object tracking. In the embodiment shown in Figure 29, front-facing long-view cameras (501) has a 30 degree field of view. Stereo cameras for automotive applications may be obtained from Continental, LG, Bosch, DENSO, Hitachi and Fujitsu Ten. For example, a suitable stereo camera includes the Conti Multi-Function Stereo Camera MFS430 or the Bosch Stereo Video Camera, with two CMOS color imagers with a resolution of 1280 x 960 pixels. The Bosch Stereo Video Camera is designed to record a horizontal range of 50 degrees and offer a 3-D measurement range of more than 50 meters; it is designed for ASIL-B. The Bosch unit includes an integrated control unit comprising one scalable processing unit, which provides a programmable logic ("FPGA") and a dual core micro-processor with an integrated CAN or Ethernet interface on a single chip. The unit generates a precise 3-D map of the vehicle's environment, including a distance estimate for all the points in the image.”).



Thus, Farabet does not teach, as indicated in bold above, the claimed:
A.	“capable of at least provisionally differentiating the specific single person from the rest of the different persons”; and
B.	“by better differentiating the specific single person from the rest of the different persons”
	Accordingly, Vishnukumar teaches:
A.	capable of at least provisionally differentiating (said via “control systems capable…to differentiate different objects”) the specific single person (said comprised by “pedestrians”) from the rest of the different persons (said comprised by “pedestrians”); and
B.	by better (said via “efficient feedback” as shown in figures 1,2 and 3)  differentiating (said via “control systems capable…to differentiate different objects”) the specific single person (said comprised by “pedestrians”) from the rest of the different persons (said comprised by “pedestrians” via:…

page 714:
“I. INTRODUCTION

An autonomous car is a vehicle that is capable of sensing its environment and navigating without human input [1].

A. Autonomous Vehicles and its History

Autonomous vehicles or driverless cars specially can detect surrounding environment using a variety of techniques such as LiDAR, RADAR, odometry, GPS, and last but not the least -computer vision. According to SAE’s automated vehicle classification [2], starting from level zero in which the vehicle has no control over the automobile (but it may provide warning to the driver), and all the way up to level five, which is complete autonomous driving which means that other than starting the autonomous system and setting up the required destination and related settings for navigation, and no human intervention or human input is required. The autonomous vehicle or driverless vehicle can drive to any location where it is legal and possible to drive. Advanced and sophisticated control systems, software and algorithms interpret all the sensory data and information to identify and detect appropriate and right navigation paths, as well as obstacles, other subsystems and relevant signage information [3]. Autonomous or driver less vehicles have sophisticated control systems capable of taking in sensor data and analyzing the data to differentiate different objects in the surrounding environment and recognize vehicles, pedestrians and other obstacles in the surrounding environment which will be very helpful for later path planning to desired destination [4].”; and

page 718:
“In order to have efficient feedback and information flow between different stages of the V-model, and from real-world to laboratory test and validation, it is needed to find an efficient data mining/data sorting technique and a database system. The data here refers to the data logged from all the sensors of the vehicle under test (VUT) in real-world, including processed data such as sensor fusion in real-time, bus-data and calculations in real-time. The logged data also includes reaction of the VUT for the corresponding surrounding situation/environment. The advantage is that the data can be reused in laboratory tests and validation so that one can recreate real-world scenario in a simulation environment. Discussing about autonomous vehicles, we have an incredible number of eventualities and unpredictable situations in the real-world. Localizing the vehicle (localize is to bring in all the information of the surrounding to knowledge of the vehicle) in such real-world environment is not an easy task, and requires lot of sensors to achieve real-time and efficient localization, resulting in large amounts of data, but such huge amount of data [19] from different sensors is to be managed at a time (simultaneously) to facilitate localization of the vehicle (For instance, sensor fusion of Stereo-Camera [5], short-, medium- and long-range radar system, needs data from all the mentioned sensors at a time/ simultaneously). Such huge amount of data can also be termed as big-data [20]. Hence data mining is efficient to sort such huge amount of data, since we humans cannot cope up with it efficiently without data mining techniques anymore [19].”).

Thus, one of skill in the art of machine learning and validation thereof can modify Farabet’s teaching of said fig. 5:8000: “Model Validation” with Vishnukumar’s teaching of the control systems of figures 1-3 comprising the efficient feedback and recognize that the modification is predictable or looked forward to because the modification provides “efficient feedback…of…validation”, Vishnukumar, cited above.
	





 However, the combination does not teach, as indicated in bold above, the remaining claimed:
B.	“differentiating the specific single person from the rest of the different persons”.
	Accordingly, Li teaches:	 
B.	differentiating (or “differentiating”) the specific single person (comprised by “pedestrians and cyclists” one each is shown in the images of fig. 2) from the rest (or the remainder to be differentiated as well as indicated in the images of fig. 2) of the different persons (said or “pedestrians and cyclists” as shown in fig. 2 via page 269, right column, 2nd full paragraph:
“It's noted that traditional pedestrian or cyclist detection methods always consider pedestrians and cyclists separately [3], [4], although pedestrians and cyclists often appear in one picture. This often leads to scanning the input image several times and causing confused detection results, such as classifying cyclists as pedestrians, and vice versa, due to their similar appearance. In general, cyclists move faster than pedestrians, different attentions with pedestrians should be paid from ADAS or autonomous vehicles. Therefore, detecting pedestrians and cyclists concurrently and differentiating them clearly are urgently needed for the adaptive decision of ADAS and autonomous vehicles.”).

	Thus, one of ordinary skill in the art of machine learning can modify Farabet’s teaching of said fig. 5:8000: “Model Validation”, as already modified via the combination’s efficient validation differentiation, with Li’s teaching of “differentiating” “pedestrians and cyclists”, in particular Li’s teaching of “Algorithm 1: Training the UB-MPR detection proposal method” on page 274, and recognize that the modification is predictable or looked forward to because “detecting…and differentiating” “pedestrians and cyclists” “are urgently needed for…autonomous vehicles”, Li, cited above.
Regarding claim 11, Farabet as combined teaches the system of claim 10, wherein the system is further configured to: 
identify (said via fig. 3:6040: “Stop Sign”), using the improved and/or representations thereof, of said specific single person (as indicated in fig. 12:8001: “(AI)”) in the corpus of visual data collectively captured by the plurality of on-road vehicles  (said fig. 4:50(1)-(N))
improve (via said fig. 5: back-arrow, pointing to fig. 5:5000) further said specific single person by even better differentiating (said fig. 5:8000: “Model Validation” as modified via the combination) the specific single person from the rest of the different persons.





Regarding claim 17, Farabet as combined teaches the system of claim 10, wherein: 
the system further comprises a server (fig. 4:5000: “GPU Servers”); 
the server (said fig. 4:5000: “GPU Servers”) is configured to obtain said at least one of the appearances (said in a “highly iterative” way) from the respective on-road vehicle (said fig. 4:50(1)-(N)), in which said generation of the 
the server (said fig. 4:5000: “GPU Servers”) is further configured to distribute (via the arrows in fig. 4) the initial 
said identification (said via fig. 3:6040: “Stop Sign”), using the initial specific single person (as indicated in fig. 12:8001: “(AI)”) in the corpus of visual data, is done locally on-board (represented in fig. 4 as 1001: “DNNs” going into fig. 4:50(1)-(N)) the on-road vehicles (said fig. 4:50(1)-(N)).



Regarding claim 18, Farabet as combined teaches the system of claim 10, wherein: 
the system further comprises a server (said fig. 4:5000: “GPU Servers”); and 
the server (said fig. 4:5000: “GPU Servers”) is configured to collect at least some of the appearances (said in a “highly iterative” way) from the respective on-road vehicles  (said fig. 4:50(1)-(N)); 
in which: 
said generation of the initial 
said identification (said via fig. 3:6040: “Stop Sign”), using the initial single specific person (as indicated in fig. 12:8001: “(AI)”) in the corpus of visual data, is done (represented in fig. 4 as “DNNs” between fig. 4:5000 and fig. 4:8000) in the server (said fig. 4:5000: “GPU Servers”) using the appearances collected (said in a “highly iterative” way).





Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1) with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection) as applied above further in view of ZOU et al. (US Patent App. Pub. No.: US 2020/0065563 A1).
Regarding claim 12, Farabet as combined teaches the system of claim 10, wherein said improvement (said fig. 5:back-arrow, pointing to fig. 5:5000) and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination) using at least the additional appearances (said in a “highly iterative” way) as input, in which said training (via fig. 4:5000: “Train”) and/or re-training is associated with machine learning techniques (via fig. 4: “DNNs”);
the certain geographical area comprises at least a city (or “a city”);
said different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) comprise at least one million (corresponding to “ten million images”)  different persons; and
therefore:


said training (said via fig. 4:5000: “Train”) and/or re-training of the provisional model (said fig. 5:8000: “Model Validation” as modified via the combination) into the improved model (said fig. 5:8000: “Model Validation” as modified via the combination) constitutes an iterative model building approach (or “an iterative process for training, verifying and deploying DNNs”) operative to facilitate said differentiation (said fig. 5:8000: “Model Validation” as modified via the combination) of the specific single person (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) from the rest of the different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) when confronted with said geographical area comprising at least a city (said or “a city”) and said different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) comprising at least one million (corresponding to “ten million images”) different persons (via:
“[0028] A human driver is required to be in the control loop for automation levels 0-2 but is not required for automation levels 3-5. The ADAS system must provide for a human driver to take control within about one second for levels 1 and 2, within several seconds for level 3, and within a couple of minutes for levels 4 and 5. A human driver must stay attentive and not perform other activities while driving during level 0-2, while the driver may perform other, limited activities for automation level 3, and even sleep for automation levels 4 and 5. Level 4 functionality allows the driver to go to sleep, and if any condition such that the car can no longer drive automatically, and the driver does not take over, the car will pull off safely. Level 5 functionality includes robot-taxis, where driverless taxis operate within a city or campus that has been previously mapped.”;





“[0042] After DAVE, two developments spurred further research in neural networks. First, large, labeled data sets such as the ImageNet Large Scale Visual Recognition Challenge ("ILSVRC") became widely available for training and validation. The ILSRVC data-set contains over ten million images in over 1000 categories.”; and

“[0054] The present invention includes an iterative process for training, verifying and deploying DNNs to perform autonomous driving functions, comprising a plurality of workflows. The invention provides advanced systems and methods that are especially useful to facilitate autonomous driving functionality, including a platform for autonomous driving Levels 3, 4, and/or 5.”).

	Thus, the combination does not teach the claimed “at least one million different persons”.
	Accordingly, Zou teaches the claimed:
	at least one million different persons (or “3.31 million images of…people” or “facial” “3.31 million images” or “3.31 million training images for the face recognition” via:
“[0043] Datasets are used for training and evaluating the face recognition convolutional neural network and/or the gender-age classifier.  In certain example embodiments, different datasets may be used for these and/or other purposes.  For instance, the VGGFace2 Dataset was used to train a convolutional neural network for generating large margin facial feature vectors for face recognition task.  The VGGFace2 Dataset is a large-scale face dataset that contains 3.31 million images of 9,131 people, with an average of 362.6 images for each person.  Images are downloadable from Google Image Search and have many variations in terms of pose, age, illumination, ethnicity, and profession (e.g., actors, athletes, politicians).  In this dataset, 2,000 images have both gender labels (female/male) and age labels (young/old).  In an example implementation, 90% of these images was used to train a gender-age classifier and evaluate its accuracy with the remaining 10% images.  To evaluate the accuracy of the face recognition network, the Labeled Faces in the Wild (LFW) dataset was used.  The LFW dataset includes face photographs designed for studying the problem of unconstrained face recognition.  The data set contains more than 13,000 images of faces collected from the web.”







“[0046] These components may be trained separately in certain example embodiments.  This approach is advantageous because not all face images in the VGGFace2 dataset have the necessary labels for training the second network.  After the first network is trained (e.g., with all 3.31 million images), it is used to generate the facial feature vectors of the images with labels that the second network needs (there are only 2,000 such images in VGGFace2 dataset).  These vectors are then used to train the second network.  To train multiple classifiers directly in accordance with conventional approaches, more than 2,000 images are needed.  However, it is possible to use the approach set forth above to successfully train both parts with high accuracies, even though only 2,000 images are used.  It will be appreciated that the inputs are facial feature vectors instead of images, in at least certain example embodiments.  This has been found to work, as these vector representations are retrieved from a high-level layer of the face recognition convolutional neural network.  Besides the discriminative large-margin property that typically is important for face recognition, they are also good "abstract" representations of facial images, so the network uses shallow layers for each classifier because the feature vector representations are already quite abstract, which accordingly do not need too much training data.”; and

“[0089] Because the filtered-out space is not checked during feature vector matching, the accuracies of the facial feature classifiers are important.  In comparison with 3.31 million training images for the face recognition convolutional neural network, the gender-age network was trained with only 1,800 images as explained above and was found to have an average accuracy of about 94%.  The labeled images were sufficient for this example, but in different contexts more training data may be needed to improve the accuracy yet further.”).

Thus, one of ordinary skill in the art of collections can modify Farabet’s teaching of said “ten million images” with Zou’s teaching of “3.31 million images of…people” by storing the images in memory for a computer to use and recognize that the modification is predictable or looked forward to because the modification is used to “improve the accuracy…further”, Zou, cited above, of the computer.





Claims 13, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1), with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection), as applied above, further in view of ZOU et al. (US Patent App. Pub. No.: US 2020/0065563 A1), as applied above, further in view of Dumov (US Patent App. Pub. No.: US 2020/0010051 A1), with reference to provisional application No. 62/694,413, filed on Jul. 5, 2018.

Regarding claim 13, Farabet teaches the system of claim 12, wherein: 
at least some of the additional appearances (said in a “highly iterative” way) are captured (as indicated in fig. 6: “Selected Data/Frames”) while the respective person (as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring”) was less than 10 (ten) meters from the respective on-road vehicle (via fig. 6:50(1)-(N)) capturing the respective imagery data (via said fig. 28:72: “Surround Camera(s)”), and so as to allow a clear appearance (as indicated in fig. 12:8001: “(AI)”) of the person's face (as indicated in fig. 12:8001: “(AI)” via :
“[0057] The present invention includes a method suitable for training, updating, and deploying one or more neural networks used to recognize certain objects and features, including, without limitation, (1) LaneNet (for detecting lanes), (2) PoleNet (for detecting traffic poles), (3) WaitNet (for detecting wait conditions and intersections), (4) SignNet (for detecting traffic signs), (5) LightNet (for detecting traffic lights), (6) DriveNet (for detecting cars, pedestrians, cyclists and potentially other objects). In additional embodiments, the present invention includes a method for training and updating one or more neural networks used to perform in-cabin driver and passenger monitoring, including, without limitation, neural networks used to monitor state of driver, including gaze tracking, head pose tracking, drowsiness detection, sleepiness, eye openness, emotion detection, heart rate monitor, liveliness of driver, and driver impairment.”); and 

said clear appearance of the person's face (as indicated in fig. 12:8001: “(AI)”) is used as an input to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination).
Thus, Farabet does not teach, as indicated in bold above:
“less than 10 (ten) meters from the respective on-road vehicle”; and
“said clear appearance of the person's face is used as an input to said training and/or re-training the model”.


Accordingly, Dumov teaches via said provisional application 62/694,413:
less than 10 (ten) meters (or inclusively, 0 meters to 100/3.281=30.48 meters or 0 feet to 100 feet, via “within…100 feet”) from the respective on-road vehicle (or “stopping location” via page 13:
“[0062] In some implementations, the authentication is programmed to happen within a time window and inside a geo-fence. For example, the time window can be set to be 5 minutes and the geo-fence can be a circle centered at the stopping location with a radius of 100 feet. The vehicle will leave the stopping location if the passenger fails to authenticate within 5 minutes or if the passenger is trying to authenticate more than 100 feet away. If the passenger has trouble authenticating, he/she can request help by initiating a live communication session with a remote human tele-operator using the vehicle’s exterior input/output devices.”); and

said clear appearance (to fig. 3:304: “INDOOR CAMERA”) of the person's face (via fig. 3:116a,b: “PASSENGER”) is used as an input to said training and/or re-training the model (via “machine learning methods…to identify passengers or objects” via:
“[0037] In some implementations, the captured image data is sent to vehicle computer 302 for analysis. For example, vehicle computer 302 can use machine learning methods to analyze the captured image data to identify passengers or objects. Vehicle computer 302 can calculate the number of passengers and identify passengers inside the vehicle using facial recognition technology. In another example, vehicle computer 302 can alert passengers and cause autonomous vehicle 112a to stop if one or more prohibited objects such as weapons are detected inside the vehicle.”).
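
As a rough illustration of the quoted paragraph [0062], the following Python sketch checks an authentication attempt against a time window and a circular geo-fence, and reproduces the feet-to-meters conversion used in the mapping above (100/3.281 ≈ 30.48 meters). The function names, the parameter names, and the choice to measure distance in meters are assumptions made for illustration only; they do not appear in Dumov.

```python
# Illustrative sketch only; names and structure are hypothetical, not from Dumov.
FEET_PER_METER = 3.281  # rounded conversion factor used in the mapping above


def feet_to_meters(feet: float) -> float:
    """Convert feet to meters using the rounded factor 3.281 ft/m."""
    return feet / FEET_PER_METER


def may_authenticate(distance_m: float, elapsed_s: float,
                     radius_ft: float = 100.0, window_s: float = 300.0) -> bool:
    """Return True when the attempt is inside both the geo-fence and the time window.

    Defaults follow the quoted example: a 100-foot radius and a 5-minute window.
    """
    inside_fence = distance_m <= feet_to_meters(radius_ft)
    inside_window = elapsed_s <= window_s
    return inside_fence and inside_window


if __name__ == "__main__":
    # Radius of the geo-fence expressed in meters, rounded to two decimals.
    print(round(feet_to_meters(100.0), 2))
    # A passenger 9 meters away, one minute into the window.
    print(may_authenticate(distance_m=9.0, elapsed_s=60.0))
```

Under these assumptions, any distance below 10 meters falls inside the 100-foot geo-fence, which is the overlap in ranges that the mapping above relies on.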

Thus, one of ordinary skill in the art of machine learning can modify Farabet’s teaching of said fig. 12:8001: “(AI)” and “pedestrians…and passenger monitoring” with Dumov’s teaching of said “machine learning methods…to identify passengers or objects” and recognize that the modification is predictable or looked forward to because Dumov’s teaching allows people to “request help” when “authenticating” “autonomous shared vehicles” that “is widely anticipated” via Dumov:
“[0002] Autonomous shared vehicle services are set to transform the landscape of mass transit systems. The autonomous shared vehicle services could potentially offer fares at a fraction of the cost of those offered by public transport networks today, and also change the city landscape by dramatically reducing the number of cars on the roads and the number of parking lots and structures. The world’s first self-driving taxis were debuted in Singapore in 2016, and it is widely anticipated that autonomous shared vehicle services will be fully operational in many cities throughout the world in the next few years.”
“[0003] The disclosed embodiments are directed to identifying and authenticating autonomous shared vehicles.”

Regarding claim 14, Farabet as combined teaches the system of claim 13, wherein: 
at least some of the additional appearances (said in a “highly iterative” way) are captured (as indicated in fig. 6: “Selected Data/Frames”) in conjunction with the respective person (as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring” as modified via the combination) walking (via said “pedestrians”) and/or moving, and so as to allow a clear appearance (said as indicated in fig. 6: “Selected Data/Frames”) of the (said “pedestrians”) person's walking and/or moving patterns of motion; and 
said clear appearance (said as indicated in fig. 6: “Selected Data/Frames”) of the person (said as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring” as modified via the combination) walking (via said “pedestrians”) and/or moving is used as an input (via said fig. 5: back-arrow, pointing to fig. 5:5000) to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination), 

thereby resulting in said improved (via said fig. 5: back-arrow, pointing to fig. 5:5000) 
“[0047] The early BB8 trials showed great promise, but achieving the functional safety necessary for Level 3-5 vehicles posed a unique problem. Deep neural networks are largely "black boxes," comprised of millions of nodes and tuned over time. A DNN's decisions can be difficult if not impossible to interpret, making troubleshooting and refinement challenging. With deep learning, a neural network learns many levels of abstraction. They range from simple concepts to complex ones. Each layer categorizes information. It then refines it and passes it along to the next. Deep learning stacks the layers, allowing the machine to learn a "hierarchical representation." For example, a first layer might look for edges. The next layer may look for collections of edges that form angles. The next might look for patterns of edges. After many layers, the neural network learns the concept of, say, a pedestrian crossing the street.

[0048] For example, Figure 3 illustrates the training of a neural network to recognize traffic signs. The neural network is comprised of an input layer (6010), a plurality of hidden layers (6020), and an output layer (6030). Training image information (6000) is input into nodes (300) and propagates forward through the network. The correct result (6040) is used to adjust the weights of the nodes (6011, 6021, 6031), and the process is used for thousands of images, each resulting in revised weights. After sufficient training, the neural network can accurately identify images, with even greater precision than humans.”

“[0086] According to embodiments of the invention, the system creates a model of stationary and moving obstacles, using an AI agent for each obstacle. The obstacles may include pedestrians, bicyclists, motorcycles, and cars. Where the obstacles are not near the vehicle of interest, the system models obstacles as represented in a simple form such as, for example, a radial distance function, or list of points at known positions in the plane, as well as their instantaneous motion vectors. The obstacles are thus modeled much as AI agents are modeled in a videogame engine. AI pedestrians, bicyclists, motorcycles, and cars are trained to behave much as they would in the real world. For example, pedestrians might suddenly jay-walk (especially at night or in rainy conditions), bicyclists may fail to heed stop signs and traffic lights, motorcycles may weave between traffic, and cars may swerve, change lanes suddenly, or brake unexpectedly.”).
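
The weight-adjustment loop described in the quoted paragraph [0048] (forward propagation, comparison with the correct result, revised weights) can be sketched in miniature. The example below swaps in a single-neuron perceptron trained on a logical AND table, which is far simpler than the multi-layer DNN that Farabet describes; the function names and the toy data are assumptions and are not part of the reference.

```python
# Minimal sketch: a single-neuron perceptron, NOT Farabet's multi-layer DNN.
# All names and the toy AND data set are hypothetical.

def predict(weights, bias, inputs):
    """Forward pass: weighted sum followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, learning_rate=1):
    """Adjust the weights whenever the output differs from the correct result."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, correct in samples:
            error = correct - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Each sample pairs an input tuple with its correct result (logical AND).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train(AND_SAMPLES)
    print([predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES])
```

The sketch only illustrates the idea that repeated exposure to labeled examples revises the weights; it makes no claim about the architecture or training procedure actually disclosed in Farabet.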

Regarding claim 15, Farabet as combined teaches the system of claim 13, wherein said using of the clear appearance (said as indicated in fig. 12:8001: “(AI)” as modified via the combination) of the person's face (as indicated in fig. 12:8001: “(AI)” as modified via the combination) as an input to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination), results in said improved specific single person; and 
the system is further configured to use the improved (via said fig. 5: back-arrow, pointing to fig. 5:5000) identify (via “identify the objects…including…pedestrians”) said  specific single person (said as indicated in fig. 12:8001: “(AI)” as modified via the combination) in an external visual database (via fig. 5:5001: “New driving data”), thereby determining an identity of said single specific person (via:
page 16:
“[0064] Deep-learning infrastructure runs its own neural network to identify the objects and compare them with the objects identified by the neural networks deployed in Vehicle (50). For inferencing, the infrastructure preferably includes servers powered by GPUs and NVIDIA's TensorRT 3 programmable inference accelerator. The combination of GPU-powered servers and TensorRT inference acceleration makes real-time responsiveness possible. Alternatively, when performance is less critical, servers powered by CPUs, FPGAs, and other processors may be used for inferencing, though they are disfavored as their performance falls short of that provided by the GPU/TensorRT 3 solution.”

pages 69, 70:
“[00249] For example, the systems and methods described herein may be used in simulating, training, and deploying DNNs for use in a wide variety of applications. Figure 43 provides an overview of a system for use in a robotics application. Application (600) is used to define a virtual robot, including the robot's chassis, effectors, actuators, computing platform, and sensors. The robot's intelligence comprises one or more DNNs trained to accomplish a number of functions. DNNs may include one or more neural networks used to recognize certain objects and features, including, without limitation, automotive parts and components, pallets, products and product inventory (with or without bar codes), tools, devices, plants, passageways, sidewalks, conveyor belts, signs, cars, pedestrians, cyclists and potentially other objects. DNNs may further include networks trained to control the robot's actuators, effectors, and other components in response to the identification of one or more objects.”).
Thus, the combination does not teach, as indicated in bold, “identify said specific single person in an external visual database”. 
Accordingly, Dumov teaches:
identify said specific single person (via an “identification…match”) in an external visual database (or “data storage 104”, as shown in fig. 1, that is external to fig. 1:112a,b: “AUTONOMOUS VEHICLE” via page 6:
“[0033] Once autonomous vehicle 112a arrives, it sends an alert to mobile device 114a. For example, the alert can include one or both of the autonomous vehicle 112a’s location and the maximum waiting time. In some examples, in order to board autonomous vehicle 112a, passenger 116a identifies him/herself and authenticates with the vehicle. For example, passenger 116a can use mobile device 114a to directly communicate with autonomous vehicle 112a using first communication channel 118a, e.g., a relatively short-range protocol such as Bluetooth or Near-Field Connection. Passenger 116a can send certain identification information such as the passenger’s ID to autonomous vehicle 112a using first communication 118a. Autonomous vehicle 112a will then compare this information with that stored in data storage 104, and if a match is found, will allow passenger 116a to board the vehicle.”).
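
The identification match described in the quoted paragraph [0033] amounts to comparing identification information sent by the passenger with records held in storage outside the vehicle. A minimal Python sketch follows; the class name, the set-based storage, and the sample IDs are hypothetical stand-ins and are not drawn from Dumov. The sketch compares a text ID only and does not model image or facial data.

```python
# Hypothetical sketch of an ID match against storage external to the vehicle.

class DataStorage:
    """Stand-in for a remote store of registered passenger IDs."""

    def __init__(self, registered_ids):
        self._ids = set(registered_ids)

    def contains(self, passenger_id: str) -> bool:
        """Report whether the submitted ID matches a stored record."""
        return passenger_id in self._ids


def allow_boarding(passenger_id: str, storage: DataStorage) -> bool:
    """Allow boarding only when the submitted ID matches a stored record."""
    return storage.contains(passenger_id)


if __name__ == "__main__":
    storage = DataStorage(["passenger-116a"])
    print(allow_boarding("passenger-116a", storage))
    print(allow_boarding("unknown", storage))
```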

Thus, one of ordinary skill in the art of data collection can modify the combination’s fig. 12:8001: “(AI)” as modified via the combination and Farabet’s teaching of said fig. 5:5001: “New driving data” with Dumov’s teaching of the ID match via Dumov’s fig. 1:104: “DATA STORAGE” and recognize that the modification is predictable or looked forward to for the same reasons as presented in the rejection of claim 13.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1) with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection), as applied above, further in view of Guo et al. (MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition).
Regarding claim 12, Farabet as combined teaches the system of claim 10, wherein said improvement (said fig. 5:back-arrow, pointing to fig. 5:5000) and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination) using at least the additional appearances (said in a “highly iterative” way) as input, in which said training (via fig. 4:5000: “Train”) and/or re-training is associated with machine learning techniques (via fig. 4: “DNNs”);
the certain geographical area comprises at least a city (or “a city”);
said different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) comprise at least one million (corresponding to “ten million images”)  different persons; and
therefore:

said training (said via fig. 4:5000: “Train”) and/or re-training of the provisional model (said fig. 5:8000: “Model Validation” as modified via the combination) into the improved model (said fig. 5:8000: “Model Validation” as modified via the combination) constitutes an iterative model building approach (or “an iterative process for training, verifying and deploying DNNs”) operative to facilitate said differentiation (said fig. 5:8000: “Model Validation” as modified via the combination) of the specific single person (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) from the rest of the different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) when confronted with said geographical area comprising at least a city (said or “a city”) and said different persons (said as indicated in fig. 12:8001: “(AI)” comprising said “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) comprising at least one million (corresponding to “ten million images”) different persons (via:
“[0028] A human driver is required to be in the control loop for automation levels 0-2 but is not required for automation levels 3-5. The ADAS system must provide for a human driver to take control within about one second for levels 1 and 2, within several seconds for level 3, and within a couple of minutes for levels 4 and 5. A human driver must stay attentive and not perform other activities while driving during level 0-2, while the driver may perform other, limited activities for automation level 3, and even sleep for automation levels 4 and 5. Level 4 functionality allows the driver to go to sleep, and if any condition such that the car can no longer drive automatically, and the driver does not take over, the car will pull off safely. Level 5 functionality includes robot-taxis, where driverless taxis operate within a city or campus that has been previously mapped.”;

“[0042] After DAVE, two developments spurred further research in neural networks. First, large, labeled data sets such as the ImageNet Large Scale Visual Recognition Challenge ("ILSVRC") became widely available for training and validation. The ILSRVC data-set contains over ten million images in over 1000 categories.”; and

“[0054] The present invention includes an iterative process for training, verifying and deploying DNNs to perform autonomous driving functions, comprising a plurality of workflows. The invention provides advanced systems and methods that are especially useful to facilitate autonomous driving functionality, including a platform for autonomous driving Levels 3, 4, and/or 5.”).

	Thus, the combination does not teach the claimed “at least one million different persons”.
	Accordingly, Guo teaches:
	at least one million different persons (or “one million…face images” or “variance is introduced by popular celebrities with millions of images” via
pages 87,88:
“In this paper, we design a benchmark task as to recognize one million celebrities from their face images and identify them by linking to the unique entity keys in a knowledge base. We also construct associated datasets to train and test for this benchmark task. Our paper is mainly to close the following two gaps in current face recognition, as reported in [1]. First, there has not been enough effort in determining the identity of a person from a face image with disambiguation, especially at the web scale. The current face identification task mainly focuses on finding similar images (in terms of certain types of distance metric) for the input image, rather than answering questions such as “who is in the image?” and “if it is Anne in the image, which Anne?”. This lacks an important step of “recognizing”. The second gap is about the scale. The publicly available datasets are much smaller than that being used privately in industry, such as Facebook [2,3] and Google [4], as summarized in Table 1. Though the research in face recognition highly desires large datasets consisting of many distinct people, such large dataset is not easily or publicly accessible to most researchers. This greatly limits the contributions from research groups, especially in academia.”; and

 page 89, 2nd full paragraph:
“The large scale of our problem naturally introduces the following attractive challenges. With the increased number of classes, the inter-class variance tends to decrease. There are celebrities look very similar to each other (or even twins) in our one-million list. Moreover, large intra-class variance is introduced by popular celebrities with millions of images available, as well as celebrities with very large appearance variation (e.g., due to age, makeups, or even sex reassignment surgery).”).

Thus one of ordinary skill in the art of image collections can modify Farabet’s teaching of said “ten million images” with Guo’s teaching of “variance is introduced by popular celebrities with millions of images” by including Guo’s teaching of “variance is introduced by popular celebrities with millions of images” with Farabet’s teaching of said “ten million images” and recognize that the modification is predictable or looked forward to because the modification is used “to close the…gap…in…recognition…in determining the identity of a person…with disambiguation”, thus providing “an important step of ‘recognizing’”, and Guo’s “one million…face images” are “easily…accessible” thus addressing the accessibility “gap” to people that desire such a “large dataset”, Guo: cited above.

Claims 13, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1), with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection), as applied above, further in view of Guo et al. (MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition), as applied above, further in view of Dumov (US Patent App. Pub. No.: US 2020/0010051 A1), with reference to provisional application No. 62/694,413, filed on Jul. 5, 2018.

Regarding claim 13, Farabet teaches the system of claim 12, wherein: 
at least some of the additional appearances (said in a “highly iterative” way) are captured (as indicated in fig. 6: “Selected Data/Frames”) while the respective person (as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring”) was less than 10 (ten) meters from the respective on-road vehicle (via fig. 6:50(1)-(N)) capturing the respective imagery data (via said fig. 28:72: “Surround Camera(s)”), and so as to allow a clear appearance (as indicated in fig. 12:8001: “(AI)”) of the person's face (as indicated in fig. 12:8001: “(AI)” via:
“[0057] The present invention includes a method suitable for training, updating, and deploying one or more neural networks used to recognize certain objects and features, including, without limitation, (1) LaneNet (for detecting lanes), (2) PoleNet (for detecting traffic poles), (3) WaitNet (for detecting wait conditions and intersections), (4) SignNet (for detecting traffic signs), (5) LightNet (for detecting traffic lights), (6) DriveNet (for detecting cars, pedestrians, cyclists and potentially other objects). In additional embodiments, the present invention includes a method for training and updating one or more neural networks used to perform in-cabin driver and passenger monitoring, including, without limitation, neural networks used to monitor state of driver, including gaze tracking, head pose tracking, drowsiness detection, sleepiness, eye openness, emotion detection, heart rate monitor, liveliness of driver, and driver impairment.”); and 

said clear appearance of the person's face (as indicated in fig. 12:8001: “(AI)”) is used as an input to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination).
Thus, Farabet does not teach, as indicated in bold above:
“less than 10 (ten) meters from the respective on-road vehicle”; and
“said clear appearance of the person's face is used as an input to said training and/or re-training the model”.


Accordingly, Dumov teaches via said provisional application 62/694,413:
less than 10 (ten) meters (or inclusively, 0 meters to 100/3.281=30.48 meters or 0 feet to 100 feet, via “within…100 feet”) from the respective on-road vehicle (or “stopping location” via page 13:
“[0062] In some implementations, the authentication is programmed to happen within a time window and inside a geo-fence. For example, the time window can be set to be 5 minutes and the geo-fence can be a circle centered at the stopping location with a radius of 100 feet. The vehicle will leave the stopping location if the passenger fails to authenticate within 5 minutes or if the passenger is trying to authenticate more than 100 feet away. If the passenger has trouble authenticating, he/she can request help by initiating a live communication session with a remote human tele-operator using the vehicle’s exterior input/output devices.”); and

said clear appearance (to fig. 3:304: “INDOOR CAMERA”) of the person's face (via fig. 3:116a,b: “PASSENGER”) is used as an input to said training and/or re-training the model (via “machine learning methods…to identify passengers or objects” via:
“[0037] In some implementations, the captured image data is sent to vehicle computer 302 for analysis. For example, vehicle computer 302 can use machine learning methods to analyze the captured image data to identify passengers or objects. Vehicle computer 302 can calculate the number of passengers and identify passengers inside the vehicle using facial recognition technology. In another example, vehicle computer 302 can alert passengers and cause autonomous vehicle 112a to stop if one or more prohibited objects such as weapons are detected inside the vehicle.”).

Thus, one of ordinary skill in the art of machine learning can modify Farabet’s teaching of said fig. 12:8001: “(AI)” and “pedestrians…and passenger monitoring” with Dumov’s teaching of said “machine learning methods…to identify passengers or objects” and recognize that the modification is predictable or looked forward to because Dumov’s teaching allows people to “request help” when “authenticating” “autonomous shared vehicles” that “is widely anticipated” via Dumov:
“[0002] Autonomous shared vehicle services are set to transform the landscape of mass transit systems. The autonomous shared vehicle services could potentially offer fares at a fraction of the cost of those offered by public transport networks today, and also change the city landscape by dramatically reducing the number of cars on the roads and the number of parking lots and structures. The world’s first self-driving taxis were debuted in Singapore in 2016, and it is widely anticipated that autonomous shared vehicle services will be fully operational in many cities throughout the world in the next few years.”
“[0003] The disclosed embodiments are directed to identifying and authenticating autonomous shared vehicles.”

Regarding claim 14, Farabet as combined teaches the system of claim 13, wherein: 
at least some of the additional appearances (said in a “highly iterative” way) are captured (as indicated in fig. 6: “Selected Data/Frames”) in conjunction with the respective person (as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring” as modified via the combination) walking (via said “pedestrians”) and/or moving, and so as to allow a clear appearance (said as indicated in fig. 6: “Selected Data/Frames”) of the (said “pedestrians”) person's walking and/or moving patterns of motion; and 
said clear appearance (said as indicated in fig. 6: “Selected Data/Frames”) of the person (said as indicated in fig. 12:8001: “(AI)” corresponding to “pedestrians…and passenger monitoring” as modified via the combination) walking (via said “pedestrians”) and/or moving is used as an input (via said fig. 5: back-arrow, pointing to fig. 5:5000) to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination), 

thereby resulting in said improved (via said fig. 5: back-arrow, pointing to fig. 5:5000) 
“[0047] The early BB8 trials showed great promise, but achieving the functional safety necessary for Level 3-5 vehicles posed a unique problem. Deep neural networks are largely "black boxes," comprised of millions of nodes and tuned over time. A DNN's decisions can be difficult if not impossible to interpret, making troubleshooting and refinement challenging. With deep learning, a neural network learns many levels of abstraction. They range from simple concepts to complex ones. Each layer categorizes information. It then refines it and passes it along to the next. Deep learning stacks the layers, allowing the machine to learn a "hierarchical representation." For example, a first layer might look for edges. The next layer may look for collections of edges that form angles. The next might look for patterns of edges. After many layers, the neural network learns the concept of, say, a pedestrian crossing the street.

[0048] For example, Figure 3 illustrates the training of a neural network to recognize traffic signs. The neural network is comprised of an input layer (6010), a plurality of hidden layers (6020), and an output layer (6030). Training image information (6000) is input into nodes (300) and propagates forward through the network. The correct result (6040) is used to adjust the weights of the nodes (6011, 6021, 6031), and the process is used for thousands of images, each resulting in revised weights. After sufficient training, the neural network can accurately identify images, with even greater precision than humans.”

“[0086] According to embodiments of the invention, the system creates a model of stationary and moving obstacles, using an AI agent for each obstacle. The obstacles may include pedestrians, bicyclists, motorcycles, and cars. Where the obstacles are not near the vehicle of interest, the system models obstacles as represented in a simple form such as, for example, a radial distance function, or list of points at known positions in the plane, as well as their instantaneous motion vectors. The obstacles are thus modeled much as AI agents are modeled in a videogame engine. AI pedestrians, bicyclists, motorcycles, and cars are trained to behave much as they would in the real world. For example, pedestrians might suddenly jay-walk (especially at night or in rainy conditions), bicyclists may fail to heed stop signs and traffic lights, motorcycles may weave between traffic, and cars may swerve, change lanes suddenly, or brake unexpectedly.”).

Regarding claim 15, Farabet as combined teaches the system of claim 13, wherein said using of the clear appearance (said as indicated in fig. 12:8001: “(AI)” as modified via the combination) of the person's face (as indicated in fig. 12:8001: “(AI)” as modified via the combination) as an input to said training and/or re-training the model (said fig. 5:8000: “Model Validation” as modified via the combination), results in said improved specific single person; and 
the system is further configured to use the improved (via said fig. 5: back-arrow, pointing to fig. 5:5000) identify (via “identify the objects…including…pedestrians”) said  specific single person (said as indicated in fig. 12:8001: “(AI)” as modified via the combination) in an external visual database (via fig. 5:5001: “New driving data”), thereby determining an identity of said single specific person (via:
page 16:
“[0064] Deep-learning infrastructure runs its own neural network to identify the objects and compare them with the objects identified by the neural networks deployed in Vehicle (50). For inferencing, the infrastructure preferably includes servers powered by GPUs and NVIDIA's TensorRT 3 programmable inference accelerator. The combination of GPU-powered servers and TensorRT inference acceleration makes real-time responsiveness possible. Alternatively, when performance is less critical, servers powered by CPUs, FPGAs, and other processors may be used for inferencing, though they are disfavored as their performance falls short of that provided by the GPU/TensorRT 3 solution.”

pages 69, 70:
“[00249] For example, the systems and methods described herein may be used in simulating, training, and deploying DNNs for use in a wide variety of applications. Figure 43 provides an overview of a system for use in a robotics application. Application (600) is used to define a virtual robot, including the robot's chassis, effectors, actuators, computing platform, and sensors. The robot's intelligence comprises one or more DNNs trained to accomplish a number of functions. DNNs may include one or more neural networks used to recognize certain objects and features, including, without limitation, automotive parts and components, pallets, products and product inventory (with or without bar codes), tools, devices, plants, passageways, sidewalks, conveyor belts, signs, cars, pedestrians, cyclists and potentially other objects. DNNs may further include networks trained to control the robot's actuators, effectors, and other components in response to the identification of one or more objects.”).
Thus, the combination does not teach, as indicated in bold, “identify said specific single person in an external visual database”. 
Accordingly, Dumov teaches:
identify said specific single person (via an “identification…match”) in an external visual database (or “data storage 104”, as shown in fig. 1, that is external to fig. 1:112a,b: “AUTONOMOUS VEHICLE” via page 6:
“[0033] Once autonomous vehicle 112a arrives, it sends an alert to mobile device 114a. For example, the alert can include one or both of the autonomous vehicle 112a’s location and the maximum waiting time. In some examples, in order to board autonomous vehicle 112a, passenger 116a identifies him/herself and authenticates with the vehicle. For example, passenger 116a can use mobile device 114a to directly communicate with autonomous vehicle 112a using first communication channel 118a, e.g., a relatively short-range protocol such as Bluetooth or Near-Field Connection. Passenger 116a can send certain identification information such as the passenger’s ID to autonomous vehicle 112a using first communication 118a. Autonomous vehicle 112a will then compare this information with that stored in data storage 104, and if a match is found, will allow passenger 116a to board the vehicle.”).

Thus, one of ordinary skill in the art of data collection can modify the combination’s fig. 12:8001: “(AI)” as modified via the combination and Farabet’s teaching of said fig. 5:5001: “New driving data” with Dumov’s teaching of the ID match via Dumov’s fig. 1:104: “DATA STORAGE” and recognize that the modification is predictable or looked forward to for the same reasons as presented in the rejection of claim 13.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1), with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection), as applied above, further in view of Boghossian et al. (US Patent 9,336,451) and Matsushita et al. (US Patent 10,755,080).
Regarding claim 16, Farabet teaches the system of claim 10, wherein the system is further configured to: 
generate representations (said via fig. 3:6000) for at least some appearances of persons (corresponding to fig. 12:8001:(AI)) in the corpus of imagery data (said via fig. 3:6000), in which each of the representations (said via fig. 3:6000) is generated from a specific one appearance, or from a specific one sequence of related appearances, of one of the persons (corresponding to fig. 12:8001:(AI)), in imagery data captured by one of the on-road vehicles (fig. 4:50(1)-(N)); 

estimate (or “determine”), per each of at least some of the representations (said via fig. 3:6000), a location-at-the-time-of-being-captured (via a “camera position”) of the respective person (corresponding to fig. 12:8001:(AI)), based at least in part on the location of the respective on-road vehicle (via said fig. 4:50(1)-(N)) during the respective capture, thereby associating the representations (said via fig. 3:6000) with static locations (via said “camera position”) respectively, and regardless of a dynamic nature of the on-road vehicles (via said fig. 4:50(1)-(N)) that are on the move (via:
“[0077] In another embodiment, KPIs or metrics in general are computed on one or more of the current best models in the model store in order to determine conditions or combination of conditions in which said current best models are not considered to perform sufficiently well. For example, one condition dimension may include properties of the whole image or frame, such but not limited to lighting or illumination (e.g. day, night, cloudy, twilight, backlit), weather (e.g. clear, rain, snow, fog), setting (e.g.
rural, urban, sub-urban, highway), topography (e.g. flat, curve, hill), region (Europe, North America, China, etc.), camera position and/or lens type, or any combination of the aforementioned. Additionally, or optionally, one may combine both with object dimensions such as for example properties of individual detected objects (e.g. object class, occlusion level, object size, etc.). Conditions or combination of conditions in which said current best models are not considered to perform sufficiently well are used to direct mining and labeling of additional data which fulfill said conditions or combination of conditions. Mining of such additional data may be facilitated for instance by the use tags that may have been added upon capture or curation of said data.”); and 

associate (via “any combination of the aforementioned” cited above, resulting in “given places and times”, cited below) each of the representations (said via fig. 3:6000) with a time (or “day, night…twilight”, cited above or “night…times”, cited below) at which the respective person (corresponding to fig. 12:8001:(AI)) was captured, thereby possessing, per each of the representations (said via fig. 3:6000), a geo-temporal (via said “camera position” and “day, night…twilight” or “night…times”) tag (or “labeling…tags”, cited above) comprising (said via “any combination of the aforementioned” cited above, resulting in “given places and times”, cited below) both the time (said “day, night…twilight” or “night…times”) at which the respective person (corresponding to fig. 12:8001:(AI)) was captured and estimated location (via said determined “camera position”) of the respective person (corresponding to fig. 12:8001:(AI)) at the time (said “day, night…twilight” or “night…times”) of being captured (via:
“[0081] A fifth and final workflow in the training and refinement process is illustrated in Figure 10. This workflow is designed to be used after the fourth workflow is used. The ultimate goal is to use prediction scores and aggregate them on a GPS map, to illustrate where the DNNs perform well, and where they perform less well. Using that heat map, the system decides where to send the car. This closes the data collection loop: the system collects data, sees how well the DNNs perform, and returns to areas that would benefit from further attention. In a further embodiment, KPIs or metrics for previously mentioned conditions or set of conditions may be used to direct the planning of routes in addition to GPS and or the coverage map. For instance, in case KPIs or metrics show that model performance is not sufficient under certain types of bridges at night, one may plan specific routes to collect additional data at given places and times when/where such conditions are met.”); 

wherein: 
said at least one of the appearances of the specific single one person (corresponding to fig. 12:8001:(AI)), which is used to generate the initial 
page 14:
“[0055] The system and methods described herein include a platform to ingest raw native sensor data (from cameras, lidar, radar etc.) and produce highly-optimized DNNs to be embedded directly into the DriveAV stack (running on the car). The system provides for incremental deep learning and optimization. In embodiments, the system provides for training a first model, measuring performance, iterating, collecting targeted data, and producing new and refined models. The system preferably includes a platform with model stores, dataset stores, progress trackers. The system preferably processes massive datasets necessary for AV training, curates, stores, labels, searches, and prioritize data. In addition, the system preferably produces key performance indicators (KPIs) against specific constraints (e.g. in specific GPS radius, or in specific weather conditions).”

pages 44,45:
“[00161] Controller (100) provides autonomous driving outputs in response to an array of sensor inputs including, for example: one or more ultrasonic sensors (66), one or more RADAR sensors (68), one or more Light Detection and Ranging ("LIDAR") sensors (70), one or more surround cameras (72) (typically such cameras are located at various places on vehicle body (52) to image areas all around the vehicle body), one or more stereo cameras (74) (in preferred embodiments, at least one such stereo camera faces forward to provide depth-perception for object detection and object recognition in the vehicle path), one or more infrared cameras (75), GPS unit (76) that provides location coordinates, a steering sensor (78) that detects the steering angle, speed sensors (80) (one for each of the wheels (54)), an inertial sensor or inertial measurement unit ("IMU") (82) that monitors movement of vehicle body (52) (this sensor can be for example an accelerometer(s) and/or a gyrosensor(s) and/or a magnetic compass(es)), tire vibration sensors (85), and microphones (102) placed around and inside the vehicle. Other sensors may be used, as is known to persons of ordinary skill in the art.”): 


pointing-out (via said “searches”), using the geo-temporal (via said “camera position” and “day, night…twilight” or “night…times”) tags (or said “labeling…tags”), at least two of the representations (said via fig. 3:6000) as representations (said via fig. 3:6000) having a similar (via said “stereo cameras”), though not necessarily identical, geo-temporal (via said “camera position” and “day, night…twilight” or “night…times”) tags (or said “labeling…tags”), which indicates geo-temporal proximity, in which the representations (said via fig. 3:6000) that are currently pointed-out (via said “searches”) were generated from imagery data (via fig. 6: “Selected Data/Frames”) captured previously by at least two different ones of the on-road vehicles (via said fig. 4:50(1)-(N)) respectively; and 
analyzing (via fig. 3) the representations (said via fig. 3:6000), which were pointed-out (via said “searches”), to identify which of the representations (said via fig. 3:6000) belong to the specific single person (corresponding to fig. 12:8001:(AI)), in which the representations (said via fig. 3:6000) identified (via fig. 3:6040: “Stop Sign”) constitute said at least two appearances (said via “stereo cameras” corresponding to fig. 12:8001:(AI)) found (via said “searches”) in the system.
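For illustration only, the geo-temporal tagging and pointing-out steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Farabet or from the application: the names `Representation` and `point_out_proximate`, and the thresholds `max_meters` and `max_seconds`, are hypothetical, and "similar, though not necessarily identical" tags are assumed to mean tags that fall within those thresholds.

```python
from dataclasses import dataclass
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Representation:
    """One appearance of a person, carrying a geo-temporal tag (hypothetical type)."""
    rep_id: str
    vehicle_id: str   # on-road vehicle that captured the imagery
    lat: float        # estimated latitude of the person at capture time
    lon: float        # estimated longitude of the person at capture time
    timestamp: float  # capture time in seconds


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def point_out_proximate(reps, max_meters=50.0, max_seconds=60.0):
    """Return pairs of representation ids whose tags are close in space and time.

    Only pairs captured by two different vehicles are kept. Both thresholds are
    assumptions chosen for illustration.
    """
    pairs = []
    for a, b in combinations(reps, 2):
        if a.vehicle_id == b.vehicle_id:
            continue  # the limitation calls for at least two different vehicles
        close_in_time = abs(a.timestamp - b.timestamp) <= max_seconds
        close_in_space = haversine_m(a.lat, a.lon, b.lat, b.lon) <= max_meters
        if close_in_time and close_in_space:
            pairs.append((a.rep_id, b.rep_id))
    return pairs
```

The pairs returned here would then be passed to whatever analysis decides which of the pointed-out representations belong to the same person; that analysis is outside the scope of this sketch.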
Thus, Farabet does not teach, as indicated in bold above:
A.	“geo-temporal tags”; and
B.	“identify which of the representations belong to a single person”.




Accordingly, Boghossian teaches:
A.	geo-temporal tag (or a tag of “tagged object…satisfying a temporal and spatial relationship” via c.3,ll. 56-68:
“A fourth aspect of the present invention provides a method of operating data processing apparatus comprising: displaying a network map of camera locations and a scene for a first camera view field; responding to a user tagging an object in said scene to: determine other view fields of cameras in said network in which the tagged object may possibly appear based on a possible object in said other view fields satisfying a temporal and spatial relationship between exit and/or entry points in said first camera view field and an entry and/or exit point for said other view fields; and display possible routes in said network between camera locations for which a said temporal and spatial 
relationship is satisfied.”).

Thus, one of ordinary skill in the art of tracking and tag data of people can modify Farabet’s teaching of “basic object tracking” and said “cameras may be used to perform…pedestrian detection” and data tagging upon capture in the context of the combined day, night, twilight and camera position being search conditions with Boghossian’s teaching of the spatial-temporal tagged object with a “tracklets” table and recognize that the modification is predictable or looked forward to because the tracklets are “reducing the amount of data” and hence the amount of computer processing (analyzing) resulting in faster or reduced processing (analyzing) relative to processing or analyzing “all the metadata attributes…and… the video image data each time the behavior of an object is to be analyzed” via Boghossian, c.10,ll. 20-41:
“Referring now to FIG. 9, there is illustrated a process flow control diagram 160 for the single camera tracker module 30.  The single camera tracker module 30 operates on data in metadata database 28, namely observations table 84, and populates tables within that database with results of its operations. In general outline, the function of the single camera tracker module 30 is to define the track taken by an object in a view field in terms of a "tracklet". A tracklet has an identity corresponding to the object ID to which the tracklet relates.  The tracklet is defined by the key parameters of the path taken by an object in a view field, namely where and when the object entered and exited that view field.  A "tracklet" defines the behavior of an object within a view field.  Data defining the tracklet is stored in the "tracklets" table 90.  In this way, the behavior of an object in a view field may be characterized by way of a single attribute, namely the tracklet, thereby reducing the amount of data characterizing the behavior of an object in view field.  That is to say, it is not necessary to analyze all the metadata attributes for an object when wishing to determine its behavior in a view field and moreover not necessary to analyze the video image data each time the behavior of an object is to be analyzed.”.
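For illustration only, the data reduction described in the quoted passage can be sketched as follows. This is a minimal sketch and not Boghossian's implementation: the types `Observation` and `Tracklet` and the function `build_tracklet` are hypothetical names, and the sketch assumes that each observation records an object ID, a timestamp, and a position within one view field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    """One per-frame detection of an object in a single view field (hypothetical)."""
    object_id: int
    timestamp: float
    position: Tuple[float, float]


@dataclass(frozen=True)
class Tracklet:
    """Where and when an object entered and exited a view field (hypothetical)."""
    object_id: int
    entry_time: float
    entry_point: Tuple[float, float]
    exit_time: float
    exit_point: Tuple[float, float]


def build_tracklet(observations: List[Observation]) -> Tracklet:
    """Collapse all observations of one object into a single tracklet record."""
    if not observations:
        raise ValueError("at least one observation is required")
    if len({o.object_id for o in observations}) != 1:
        raise ValueError("all observations must share one object ID")
    # Order by time so the first and last entries are the entry and exit.
    ordered = sorted(observations, key=lambda o: o.timestamp)
    first, last = ordered[0], ordered[-1]
    return Tracklet(
        object_id=first.object_id,
        entry_time=first.timestamp,
        entry_point=first.position,
        exit_time=last.timestamp,
        exit_point=last.position,
    )
```

Under these assumptions, later analysis reads one `Tracklet` per object and view field rather than every stored observation, which is the reduction in data that the passage describes.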

Thus, the combination does not teach the remaining limitation, B.




Accordingly, Matsushita teaches:
B.	identify (via fig. 5: “SEARCH RESULT”: 502,503 via an absence therein of an “X” mark of fig. 5:505: “X”) which of the representations (fig. 5:502,503,505) belong to a single person (as shown by the “single human” of fig. 5:501-504 resulting in a “search…associated with a single human” via c.15,l. 42 to c.16,l.4:
“A query selection unit 815 determines, based on the association information accumulated in the external storage device 104 by the association information accumulation unit 211, whether there is a face image feature that is associated with the human body image feature for the human body image selected by the search result selection unit 814.  In a case where the associated face image feature is stored in the external storage device 104, the face image feature is acquired from the external storage device 104.  A plurality of human body images may be sorted out by the search result selection unit 814, and a plurality of face image features may be associated with a single human body image feature.  As a result, a large number of face image features may be acquired.  Accordingly, in such a case, one representative face image feature is sorted out from the face image features. In the case of sorting out the face image feature, clustering is performed on the face image feature and only the face image feature close to the center of gravity of each cluster is sorted out from each cluster.  A face image search unit 813 designates the face image feature sorted out by the query selection unit 815 as a query and the face image search is performed using the face image feature stored in the external storage device 104 by the face image feature accumulation unit 207.  Further, a face image with a similarity to a face image feature being higher than the predetermined threshold is identified as the search result.  In the case of calculating the similarity, the sum total of distances of SIFT features at each facial feature point is obtained and the reciprocal number of the sum total of the distances is normalized to obtain the similarity.”).
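For illustration only, the similarity calculation described at the end of the quoted passage can be sketched as follows. This is a minimal sketch and not Matsushita's implementation: the function names `face_similarity` and `search` are hypothetical, each facial feature point is assumed to carry one fixed-length descriptor, and because the passage does not state how the reciprocal is normalized, the sketch substitutes 1 / (1 + sum of distances), which maps the result into the range (0, 1].

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def face_similarity(query: Sequence[Descriptor], candidate: Sequence[Descriptor]) -> float:
    """Similarity from the summed per-point descriptor distances.

    The normalization 1 / (1 + total) is an assumption made for this sketch.
    """
    if len(query) != len(candidate):
        raise ValueError("both faces need the same number of feature points")
    # Sum the Euclidean distance between descriptors at each facial feature point.
    total = sum(dist(q, c) for q, c in zip(query, candidate))
    return 1.0 / (1.0 + total)


def search(
    query: Sequence[Descriptor],
    gallery: Dict[str, Sequence[Descriptor]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return gallery entries whose similarity exceeds the threshold, best first."""
    scored = [(name, face_similarity(query, feats)) for name, feats in gallery.items()]
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The threshold plays the role of the "predetermined threshold" in the passage; its value here is left to the caller.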

Thus, one of ordinary skill in tracking people and search thereof can modify Farabet’s teaching of the “searches” using queries such as said night time or day time at a camera position with Matsushita’s teaching of fig. 5: “SEARCH RESULT” using five cameras and recognize that the modification is predictable or looked forward to because Matsushita’s teaching provides a “sorted out” feature or a clarifying feature while performing a search based on the five cameras.

Allowable Subject Matter
Claims 1-9 and 19,20 are allowed.
The following is an examiner’s statement of reasons A. and B. for allowance:
A.	The claims are allowed for the same reasons as discussed above and as set out in applicant’s remarks of 12/21/20, pages 12,13, reproduced below:
“Claim 19 has been amended so as to now include the limitations: 

- ‘...said model is capable of differentiating the specific single person from the rest of the different persons’ 
and 
- ‘...improving said capability of differentiating the specific single person from the rest of the different persons’ 

It is noted that differentiating the specific single person from the rest of the different persons and then improving said capability of differentiating is a description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet is silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet. 

Therefore, the Applicant respectfully submits that independent claim 19 and depending claim 20 are in condition for allowance…

Claim 1 has been amended so as to now include the limitations: 

‘...said provisional model is at least sub-optimal for directly differentiating the specific single person from the rest of the many different persons, but is capable of accurately differentiating the specific single person from a sub-group of the many different persons’ 

It is noted that differentiating the specific single person from a sub-group of the many different persons is a description of a solution to the [0002] "...real challenge when trying to associate together multiple images or other representations of one specific person" as described by the Applicant, in which Farabet and Boghossian are silent regarding the challenge, as correctly noted by the Examiner, and therefore the described solution is novel in view of Farabet and Boghossian. 

Therefore, the Applicant respectfully submits that independent claim 1 and all depending claims are in condition for allowance.” ; and


B.	Regarding claim 19, claim 19 is allowed because the prior art does not anticipate or render obvious, as detailed below regarding claim 19, lines 10-12, when considered as a whole the claimed:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”.

Thus, claim 20 is allowed for depending on claim 19.
Regarding claim 1, claim 1, lines 25-30 has similar language:
“use said representation selected to generate a provisional model of the respective specific single person, in which said provisional model is at least sub-optimal for directly differentiating the specific single person from the rest of the many different persons” wherein “the representation selected belongs to a specific single one of the different persons”

to claim 19’s:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”.

Thus, claim 1, lines 25-30’s:

“use said representation selected to generate a provisional model of the respective specific single person, in which said provisional model is at least sub-optimal for directly differentiating the specific single person from the rest of the many different persons” wherein “the representation selected belongs to a specific single one of the different persons”

is not anticipated or rendered obvious for the same reasons as discussed for claim 19.

Thus, claims 2-9 are allowed for depending on claim 1.




Regarding claim 10, lines 16-18’s:
“a provisional model operative to capable of at least provisionally differentiating the specific single person from the rest of the different persons” 

that appears to closely resemble the above allowed limitations of claims 19 and 1:
claim 19, lines 10-12:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”

claim 1, lines 25-30:
“use said representation selected to generate a provisional model of the respective specific single person, in which said provisional model is at least sub-optimal for directly differentiating the specific single person from the rest of the many different persons” wherein “the representation selected belongs to a specific single one of the different persons”

; however, a selection, as claimed in claims 19 and 1, does not fall under the broadest reasonable interpretation of “differentiating” as described in the above Claim Interpretation of claim 10. 
	Thus, claims 1-9 and 19,20 are allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Claim 19 is/are reviewed under 35 U.S.C. 103 via Farabet et al. (US Patent App. Pub. No.: US 2019/0303759 A1) with reference to provisional application 62/648,399, filed on Mar. 27, 2018, in view of Vishnukumar et al. (Machine Learning and Deep Neural Network – Artificial Intelligence Core for Lab and Real-World Test and Validation for ADAS and Autonomous Vehicles) and Kang et al. (Pedestrian Detection Based on Adaptive Selection of Visible Light or Far-Infrared Light Camera Image by Fuzzy Inference System and Convolutional Neural Network-Based Verification) and Li et al. (A Unified Framework for Concurrent Pedestrian and Cyclist Detection):
Regarding claim 19, Farabet, via provisional application 62/648,399, teaches a method for tracking persons by utilizing models generated using imagery data captured by a plurality of on-road vehicles, comprising: 

collecting, by a server (fig. 4:5000: “GPU Servers”), from on-road vehicles (via fig. 5:50(1)-(N): “DRIVE Pegasus”), a plurality of representations (via fig. 3:6000:a group of rectangles) of various different persons (as indicated in fig. 12:8001:“(AI)”), and selecting (corresponding to “frames selected”) one of the representations (via fig. 3:6040: “Stop Sign”) of a specific (via “certain…pedestrians, cyclists” comprising “a single or specific person” or “exceptionally selective”) single one of the persons (as indicated in fig. 12:8001:“(AI)”) out of the plurality of representations (said via fig. 3:6000:a group of rectangles) of various different persons (or two people as indicated in fig. 12:8001:“(AI)”), in which each of the representations (said via fig. 3:6000:a group of rectangles) was derived from a respective imagery data (as indicated in fig. 6: “Selected Data/Frames”) captured by the respective on-road vehicle (said via fig. 5:50(1)-(N): “DRIVE Pegasus”) while moving (via fig. 1: “Driver”) in a certain geographical area (as indicated in fig. 6: “Selected Data/Frames” via:

“[0057] The present invention includes a method suitable for training, updating, and deploying one or more neural networks used to recognize certain objects and features, including, without limitation, (1) LaneNet (for detecting lanes), (2) PoleNet (for detecting traffic poles), (3) WaitNet (for detecting wait conditions and intersections), (4) SignNet (for detecting traffic signs), (5) LightNet (for detecting traffic lights), (6) DriveNet (for detecting cars, pedestrians, cyclists and potentially other objects). In additional
embodiments, the present invention includes a method for training and updating one or more neural networks used to perform in-cabin driver and passenger monitoring, including, without limitation, neural networks used to monitor state of driver, including gaze tracking, head pose tracking, drowsiness detection, sleepiness, eye openness, emotion detection, heart rate monitor, liveliness of driver, and driver impairment.”

wherein “certain” is defined via Dictionary.com:
certain
adjective
1	free from doubt or reservation; confident; sure:
I am certain he will come.
2	destined; sure to happen (usually followed by an infinitive):
He is certain to be there.
3	inevitable; bound to come:
They realized then that war was certain.
4	established as true or sure; unquestionable; indisputable:
It is certain that he tried.
5	fixed; agreed upon; settled:
on a certain day; for a certain amount.
6	definite or particular, but not named or specified:
A certain person phoned. He had a certain charm.
7	that may be depended on; trustworthy; unfailing; reliable:
His aim was certain.
8	some though not much:
a certain reluctance.
9	Obsolete. steadfast.

wherein “particular” is defined:
particular
adjective
1	of or relating to a single or specific person, thing, group, class, occasion, etc., rather than to others or all; special rather than general:
one's particular interests in books.
2	immediately present or under consideration; in this specific instance or place:
Look at this particular clause in the contract.
3	distinguished or different from others or from the ordinary; noteworthy; marked; unusual:
She sang with particular warmth at last evening's concert.
4	exceptional or especial:
Take particular pains with this job.
5	being such in an exceptional degree:
a particular friend of mine.
6	dealing with or giving details, as an account or description, of a person; detailed; minute.
7	exceptionally selective, attentive, or exacting; fastidious; fussy:
to be particular about one's food.); 

generating (via the arrows in fig. 5) a model (via fig. 5:8000: “Model Validation”) using at least the representation selected (said resulting in “frames selected”  via fig. 3:6040: “Stop Sign”) as an input (as shown by the loop-back-arrow in fig. 5), in which said model (said via fig. 5:8000: “Model Validation”) is capable of differentiating the specific single person (said as indicated in fig. 12:8001:“(AI)”)  from the rest of the different persons (said or two people as indicated in fig. 12:8001:“(AI)”); 






detecting, using the model generated (said via fig. 5:8000: “Model Validation”), out of at least some of the plurality of representations (said via fig. 3:6000:a group of rectangles), at least one additional representation of said specific single person (via “cameras may be used to perform…pedestrian detection”), thereby tracking (via “basic object tracking”) said specific single person (said as indicated in fig. 12:8001:“(AI)” via pages 46,47:
“[00167] Figure 29 illustrates one example of camera types and locations, with 11 cameras (501)-(508). Front-facing cameras (501)-(505) help identify forward facing paths and obstacles, and provide information critical to making an occupancy grid and determining the preferred vehicle paths. Front facing cameras may be used to perform many of the same functions as LIDAR, including emergency braking, pedestrian detection, and collision avoidance. Front-facing cameras may also be used for ADAS functions and systems including Lane Departure Warnings ("LDW"), and Autonomous Cruise Control ("ACC"), and other functions such as traffic sign recognition.”

“[00170] In preferred embodiments, a long-view stereo camera pair (501) can be used for depth-based object detection, especially for objects for which a neural network has not yet been trained. Long-view stereo cameras (501) may also be used for object detection and classification, as well as basic object tracking. In the embodiment shown in Figure 29, front-facing long-view cameras (501) has a 30 degree field of view. Stereo cameras for automotive applications may be obtained from Continental, LG, Bosch, DENSO, Hitachi and Fujitsu Ten. For example, a suitable stereo camera includes the Conti Multi-Function Stereo Camera MFS430 or the Bosch Stereo Video Camera, with two CMOS color imagers with a resolution of 1280 x 960 pixels. The Bosch Stereo Video Camera is designed to record a horizontal range of 50 degrees and offer a 3-D measurement range of more than 50 meters; it is designed for ASIL-B. The Bosch unit includes an integrated control unit comprising one scalable processing unit, which provides a programmable logic ("FPGA") and a dual core micro-processor with an integrated CAN or Ethernet interface on a single chip. The unit generates a precise 3-D map of the vehicle's environment, including a distance estimate for all the points in the image.”); and 




improving (via said feed-back arrow in fig. 5) said model (said via fig. 5:8000: “Model Validation”) by generating (via said arrows in fig. 5) a new and better model (said via fig. 5:8000: “Model Validation”), in which said generation (via said arrows in fig. 5) of the new and better model (said via fig. 5:8000: “Model Validation”) uses, as an input (via an input-arrow into fig. 5:8000: “Model Validation”), the at least one additional representation detected (via “cameras may be used to perform…pedestrian detection”), together with at least one of (i) the representation identified, and (ii) the model (said via fig. 5:8000: “Model Validation” before using said feed-back arrow of fig. 5), thereby improving said capability of differentiating (via “not able to differentiate”) the specific single person (said as indicated in fig. 12:8001:“(AI)”) from the rest of the different persons (said or two people as indicated in fig. 12:8001:“(AI)” via:
“[0051] A principle challenge in developing a reliable training, verification, and simulation environment is creating a realistic environment, including lighting, reflections, shadows, and third-party drivers and pedestrians. Realistic lighting is critical for verifying DNNs for autonomous vehicles. For example, in one accident involving a semi-autonomous Tesla vehicle, the cameras were not able to differentiate the white side of a tractor trailer against a brightly lit sky. In that case, the radar should not have had any problems detecting the trailer, but the radar was trained to tune out what looks like an overhead road sign to avoid false braking events. Testing a vast number of environments with realistic reflections and lighting conditions is critical to automotive safety.”

“[0074] The process of training the DNNs is highly iterative, comprising a plurality of workflows. A first workflow, illustrated in Figure 6, is the most basic, and illustrates the first few months of a DL project. This workflow essentially focuses on collecting large amounts of initial data from vehicles (50(1)-(N)), providing that data to a dataset store (5003) (a service that handles immutable datasets for further processing), labeling that data (5004), and training initial models (5005). The frames selected for labeling may be randomly selected in this early workflow or selected at regular time intervals. In a preferred embodiment, this workflow typically labels 300,000 to 600,000 frames. In preferred embodiments, the models are tested and verified by simulation or re-simulation (8000). The models are preferably pruned, optimized, and deployed in vehicles (50(1)-(N)). Model refinement and pruning (5006) may be accomplished as described in U.S. Application No. 62/630,445, incorporated by reference. After labeling
hundreds of thousands of frames, the process leads diminishing returns. For example, if a DNN reaches 97% accuracy on a random data distribution, then 97% of all new data randomly collected is already well predicted, so labeling it adds no information to the existing training set.”).

	Thus, Farabet does not teach, as indicated in bold above, the claimed:

A.	“selecting one”; 
B.	“which said model is capable of differentiating the specific single person from the rest of the different persons”; and
C.	“thereby improving said capability of differentiating the specific single person from the rest of the different persons”.
	Accordingly, Vishnukumar teaches:
B.	which said model is capable of differentiating (via “control systems capable…to differentiate different objects”) the specific single person (comprised by “pedestrians”) from the rest of the different persons (said comprised by “pedestrians”); and
C.	thereby improving (via “efficient feedback” as shown in figures 1,2 and 3) said capability of differentiating (said via “control systems capable…to differentiate different objects”) the specific single person (said comprised by “pedestrians”) from the rest of the different persons (said comprised by “pedestrians” via:

page 714:
“I. INTRODUCTION

An autonomous car is a vehicle that is capable of sensing its environment and navigating without human input [1].

A. Autonomous Vehicles and its History

Autonomous vehicles or driverless cars specially can detect surrounding environment using a variety of techniques such as LiDAR, RADAR, odometry, GPS, and last but not the least -computer vision. According to SAE’s automated vehicle classification [2], starting from level zero in which the vehicle has no control over the automobile (but it may provide warning to the driver), and all the way up to level five, which is complete autonomous driving which means that other than starting the autonomous system and setting up the required destination and related settings for navigation, and no human intervention or human input is required. The autonomous vehicle or driverless vehicle can drive to any location where it is legal and possible to drive. Advanced and sophisticated control systems, software and algorithms interpret all the sensory data and information to identify and detect appropriate and right navigation paths, as well as obstacles, other subsystems and relevant signage information [3]. Autonomous or driver less vehicles have sophisticated control systems capable of taking in sensor data and analyzing the data to differentiate different objects in the surrounding environment and recognize vehicles, pedestrians and other obstacles in the surrounding environment which will be very helpful for later path planning to desired destination [4].”; and

page 718:
“In order to have efficient feedback and information flow between different stages of the V-model, and from real-world to laboratory test and validation, it is needed to find an efficient data mining/data sorting technique and a database system. The data here refers to the data logged from all the sensors of the vehicle under test (VUT) in real-world, including processed data such as sensor fusion in real-time, bus-data and calculations in real-time. The logged data also includes reaction of the VUT for the corresponding surrounding situation/environment. The advantage is that the data can be reused in laboratory tests and validation so that one can recreate real-world scenario in a simulation environment. Discussing about autonomous vehicles, we have an incredible number of eventualities and unpredictable situations in the real-world. Localizing the vehicle (localize is to bring in all the information of the surrounding to knowledge of the vehicle) in such real-world environment is not an easy task, and requires lot of sensors to achieve real-time and efficient localization, resulting in large amounts of data, but such huge amount of data [19] from different sensors is to be managed at a time (simultaneously) to facilitate localization of the vehicle (For instance, sensor fusion of Stereo-Camera [5], short-, medium- and long-range radar system, needs data from all the mentioned sensors at a time/ simultaneously). Such huge amount of data can also be termed as big-data [20]. Hence data mining is efficient to sort such huge amount of data, since we humans cannot cope up with it efficiently without data mining techniques anymore [19].”).

	Thus, one of ordinary skill in the art of machine learning and validation thereof can modify Farabet’s teaching of said fig. 5:8000: “Model Validation” with Vishnukumar’s teaching of the control systems of figures 1-3 comprising the efficient feedback and recognize that the modification is predictable or looked forward to because the modification provides “efficient feedback…of…validation”, Vishnukumar, cited above.
	Thus the combination does not teach, as indicated in bold above, the claimed: 
A.	“selecting one”; 
B.	“the specific single person from the rest of the different persons”; and
C.	“differentiating the specific single person from the rest of the different persons”.

Accordingly, Kang teaches:
A.	selecting one (or “selects one” via page 2, 1st full paragraph:
“However, these methods may increase the processing time and computational complexity as they have to take into account both visible light and FIR camera images, and process the convolutional neural network (CNN) twice [13]. In order to overcome these limitations, our research suggests a method that is able to detect the pedestrians under varying conditions. The proposed method is more reliable than a single camera-based method, reduces the complexity of the algorithm, and requires less processing time compared to the methods using both visible light and FIR camera images. This is because our method adaptively selects one candidate between two pedestrian candidates derived from visible light and FIR camera images based on a fuzzy inference system (FIS). To enhance the detection accuracy and processing speed, only the selected one candidate is verified by the CNN.”).

	Thus one of ordinary skill in the art of computer models with vehicles can modify Farabet’s teaching of “frames selected” and the exceptionally selective pedestrians with Kang’s teaching of “selects one” by making a corresponding computer model select one of the exceptionally selective pedestrians in a frame and recognize that the modification is predictable or looked forward to because the modification is used “To enhance the detection accuracy and processing speed…by the CNN”, Kang, cited above.
	The combination does not teach, as indicated in bold above, the remaining claimed:
B.	“the specific single person from the rest of the different persons”; and
C.	“differentiating the specific single person from the rest of the different persons”.
	
Accordingly, Li teaches:	 
B.	the specific single person (comprised by “pedestrians and cyclists” one each is shown in the images of fig. 2) from the rest (or the remainder to be differentiated as well as indicated in the images of fig. 2) of the different persons (said or “pedestrians and cyclists” as shown in fig. 2); and
C.	differentiating (or “differentiating”) the specific single person (said comprised by “pedestrians and cyclists” one each is shown in the images of fig. 2) from the rest (said or the remainder to be differentiated as well as indicated in the images of fig. 2) of the different persons (said or “pedestrians and cyclists” as shown in fig. 2 via page 269, right column, 2nd full paragraph:
“It's noted that traditional pedestrian or cyclist detection methods always consider pedestrians and cyclists separately [3], [4], although pedestrians and cyclists often appear in one picture. This often leads to scanning the input image several times and causing confused detection results, such as classifying cyclists as pedestrians, and vice versa, due to their similar appearance. In general, cyclists move faster than pedestrians, different attentions with pedestrians should be paid from ADAS or autonomous vehicles. Therefore, detecting pedestrians and cyclists concurrently and differentiating them clearly are urgently needed for the adaptive decision of ADAS and autonomous vehicles.”).

	Thus, one of ordinary skill in the art of machine learning can modify Farabet’s teaching of said fig. 5:8000: “Model Validation”, as already modified via the combination’s efficient feedback, with Li’s teaching of “differentiating” “pedestrians and cyclists” by applying Li’s teaching of “Algorithm 1: Training the UB-MPR detection proposal method” in page 274, and recognize that the modification is predictable or looked forward to because “detecting…and differentiating” “pedestrians and cyclists” “are urgently needed for…autonomous vehicles”, Li, cited above.
	The combination does not result in the claimed invention when claim 19’s:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”

is considered as a whole since the third combination of Li’s urgent need to distinguish between pedestrians and cyclists would require re-modifying the first combination of differences B. and C. of Vishnukumar’s efficient differentiation feedback in view of the second combination of Kang’s teaching of difference A., the claimed “selecting one” that improves the detection of the CNN, to arrive at a fourth combination to meet the claimed:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”

Thus modifying the third combination of Li in view of the previous two combinations of Vishnukumar and Kang resulting in a fourth combination to meet the claimed:
“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”

appears to be improper hindsight reconstruction using applicant’s disclosure.
Thus, the claimed:

“generating a model using at least the representation selected as an input, in which said model is capable of differentiating the specific single person from the rest of the different persons”

is not anticipated or rendered obvious when considered as a whole to one of ordinary skill in the art.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397.  The examiner can normally be reached on Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS ROSARIO/Examiner, Art Unit 2667 

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667