DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submissions filed on 8/8/2022 and 7/12/2022 have been entered. Claims 1-20 are pending.















Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
Regarding claims 1-20, 35 USC 112(f) is not invoked in claims 1-20.







Accordingly the following meanings are “taken” via MPEP 2111.01 III. "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”















The claimed “engine” (as in “generating, using a trained facial classification neural engine, one or more first labels” and “generating, using a supporting engine, a second label” in claim 1) is interpreted in light of applicant’s disclosure: 
A.	“[0041] The system includes various engines, each of which is constructed, programmed, configured, or otherwise adapted, to carry out a function or set of functions. The term engine as used herein means a tangible device, component, or arrangement of components implemented using hardware, such as by an application specific integrated circuit (ASIC) or field-programmable gate array (FPGA), for example, or as a combination of hardware and software, such as by a processor-based computing platform and a set of program instructions that transform the computing platform into a special-purpose device to implement the particular functionality. An engine may also be implemented as a combination of the two, with certain functions facilitated by hardware alone, and other functions facilitated by a combination of hardware and software.”; 

B.	[0067]: penultimate S: “ The layers following the input layer may be convolution layers that produce feature maps that are filtering results of the inputs and are used by the next convolution layer.”; and  
C.	fig. 7:[0098]: “FIG. 7 is a flow chart illustrating an example method 700 for fine tuning the facial classification neural engine 612, in accordance with some embodiments.”; 
and definition thereof via Dictionary.com:
A.	“a piece or collection of software that drives a later process” or 
B.	“a means by which something is achieved, accomplished, or furthered” 
is “taken” as the meaning of the claimed “engine” via MPEP 2111.01 III:
engine, noun
4	Computers. a piece or collection of software that drives a later process (used in combination, as in game engine; software engine). See also search engine.
6	a means by which something is achieved, accomplished, or furthered:
Trade is an engine of growth that creates jobs, reduces poverty, and increases economic opportunity; 

and use in the prior art:

A.	O’Toole et al. (Face Space Representations in Deep Convolutional Neural Networks), September 2018, uses “engine” as provided by a “model” (four models are shown in fig.1, below): 
“High-end computational graphical processing units, the preferred computational engine of choice for DCNNs, are 30–50 thousand times faster than computers in the 80s.” (page 795, Box 1. Deep Convolutional Neural Networks for Face Recognition, 2nd para, 3rd S); 

“In computer vision, despite the strong limitation of image-based PCA to operate only within a single (frontal) viewpoint, this model provided the computational engine for the first generation of commercially viable face recognition systems [22].” (pages 795,796); 

and

See Key Figure (Figure 1), evolution in computational models, pages 797,798:










[Images: media_image1.png, media_image2.png]

, wherein “graphical processing units” or “GPU” is used in Modasshir et al. as “an NVIDIA GTX1080” (a graphics card) (Deep Neural Networks: a Comparison on Different Computing Platforms):
	“As DNNs have intense computational requirements in the majority of applications, they utilize a cluster of computers or a cutting edge Graphical Processing Unit (GPU), often having excessive power consumption and generating a lot of heat.” (Abstract, 3rd S); 

and

“A Dell Alienware gaming3 laptop with Intel Core i7-7820HK as CPU, 32 GB DDR4 as RAM, and an NVIDIA GTX1080 as GPU was the most powerful machine tested.” (pages 385,386);

B.	Araujo et al. (US Patent App. Pub. No.: US 2020/0019699A1), filed July 10, 2018, uses “engine”:

“[0039] Moreover, it should be appreciated that the use of the term “engine,” if used herein with regard to describing embodiments and features of the invention, is not intended to be limiting of any particular implementation for accomplishing and/or performing the actions, steps, processes, etc., attributable to and/or performed by the engine. An engine may be, but is not limited to, software, hardware and/or firmware or any combination thereof that performs the specified functions including, but not limited to, any use of a general and/or specialized processor in combination with appropriate software loaded or stored in a machine-readable memory and executed by the processor. Further, any name associated with a particular engine is, unless otherwise specified, for purposes of convenience of reference and not intended to be limiting to a specific implementation. Additionally, any functionality attributed to an engine may be equally performed by multiple engines, incorporated into and/or combined with the functionality of another engine of the same or different type, or distributed across one or more engines of various configurations.”;









The claimed “identification” (in “verifying, via at least one client computing device, a correct identification” in claim 5) is interpreted in light of applicant’s disclosure via applicant’s fig. 3: 312,314: “RECOGNIZE IMAGE”: “IMAGE RECOGNIZED” and definition thereof via Dictionary.com wherein “an act or instance of identifying; the state of being identified” is “taken” as the meaning of the claimed “identification” in “verifying, via at least one client computing device, a correct identification” in claim 5 via MPEP 2111.01 III:
identification
noun
1	an act or instance of identifying; the state of being identified.















The claimed “identification” (in “upon receiving, from at least M employee client computing devices, a consistent identification of the person: verifying that the consistent identification is the correct identification” in claim 7) is interpreted in light of applicant’s disclosure (“The authentication engine receives, from at least one of the client computing device(s) 640, a selection of one of the possible identifications as the correct identification.” [00108], last S) and definition thereof via Dictionary.com wherein “something that verifies the identity of a person, animal, or thing” is “taken” as the meaning of the claimed “identification” in 
“upon receiving, from at least M employee client computing devices, a consistent identification (“something that verifies the identity of a person, animal, or thing”) of the person: verifying that the consistent identification (“something that verifies the identity of a person, animal, or thing”) is the correct identification (the meaning of this “identification” has already been “taken” above regarding claim 5 as “the state of being identified”)” 

in claim 7 via MPEP 2111.01 III:
identification
noun
2	something that identifies a person, animal, or thing:
He carries identification with him at all times.

wherein “identifies” is defined:
identify
verb (used with object), i·den·ti·fied, i·den·ti·fy·ing.
1	to recognize or establish as being a particular person or thing; verify the identity of:
to identify handwriting; to identify the bearer of a check.
2	to serve as a means of identification for:
His gruff voice quickly identified him.
3	to make, represent to be, or regard or treat as the same or identical:
They identified Jones with the progress of the company.



Response to Arguments
Applicant's arguments filed 7/12/2022 have been fully considered but they are not persuasive.
CLAIM REJECTIONS – 35 USC 102
	Claims 1,14, and 15
	Applicant states on pages 10-11:
First, Claim 1 is amended to clarify that the "first label act[s] as an identifier of the person depicted in the probe image." Luo does not disclose generating a label for a person depicted in a probe image, wherein the label acts as an identifier of the person depicted in the probe image. As explained in Applicant's Amendment & Response Under 37 C.F.R. § 1.111 submitted February 15, 2022 ("First Response"), Luo instead appears merely to describe taking two colored images as input and outputting an "image label" of "1" if the two images are of the same person or "0" otherwise.' Luo's "image label" of "1" or "0" plainly does not "act[s] as an identifier of the person depicted in" either of the two colored input images, but merely as an indicator that the two colored input images are of the same person. Thus, Claim 1 is patentable over Luo for at least this reason. 
	
	









The examiner respectfully disagrees. Luo teaches generating a label1 “yjk” via “generate…labels”, pg. 24045, 3.1.1, penult S and page 24048:

[Image: media_image3.png]

wherein the label (yjk) acts as an identifier (via a “unique identification” of “data interpreted by a computer” such as a “pedestrian target dataset”, pg. 24045, 3.1.1: Training set, 1st S) of the person depicted in the probe image.


 
 
 



	
	The examiner respectfully disagrees since the image-data identifier label yjk  (i.e., a class: “label is the…class”, pg. 24047, last S) is the goal via “From this trend the output of the entire network would gradually approach to the value of the class” i.e., label  yjk, pg. 24048, 2. ContrastiveLoss, last para, penult S.
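For orientation only, the pair label that applicant describes ("1" if the two images are of the same person, "0" otherwise) is the kind of label that a contrastive loss commonly consumes. The following is a minimal sketch of a widely used contrastive-loss formulation and is not taken from Luo; the function names, the margin value, and the example feature vectors are assumptions made for illustration.

```python
import math


def euclidean_distance(feat_a, feat_b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)))


def contrastive_loss(feat_a, feat_b, pair_label, margin=1.0):
    """Loss for one image pair (assumed, generic formulation).

    pair_label is 1 when both images show the same person, 0 otherwise.
    Same-person pairs are penalized for being far apart; different-person
    pairs are penalized only when closer than the (hypothetical) margin.
    """
    d = euclidean_distance(feat_a, feat_b)
    if pair_label == 1:
        return d ** 2
    return max(margin - d, 0.0) ** 2


# Example: identical features with a "different person" label incur the full margin penalty.
print(contrastive_loss([0.0, 0.0], [0.0, 0.0], pair_label=0))
```

In this sketch the pair label only selects which penalty applies to the pair; whether such a label also acts as an identifier of a person is the question addressed in the remarks above and is not resolved by the sketch.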












Applicant’s arguments, see remarks, page 11, filed 7/12/2022:
Second, Claim 1 is also amended to clarify that the "second label act[s] as an identifier of the person depicted in the probe image." For reasons explained above, none of the alleged "image label[s]" described in Luo appear to act as an identifier of a person depicted in an image (i.e., a person depicted in either of the two colored input images of Luo). Thus, Claim 1 is additionally patentable over Luo for at least this reason. 
Third, Claim 1 is also amended to clarify that the generation of the second label is performed "upon determining that the probability is within a predefined low accuracy range" (emphasis added). Applicant cannot find anywhere Luo describes a "predefined low accuracy range." And, the Office's citation to § 3.1.3 of Luo, which merely defines a probability range of 0 to 1, does not demonstrate how Luo describes a "predefined low accuracy range," as claimed by Applicant. Indeed, a probability of "1" in Luo, which falls in the supposed "predefined low accuracy range" (i.e., the range of 0 to 1) alleged by the Office to be disclosed in Luo, appears to correspond to a high likelihood (or high accuracy) of a match between Luo's two colored input images. So, it is not clear to Applicant how the probability range of 0 to 1 referenced by the Office could correspond to a "predefined low accuracy range," as claimed by Applicant. Regardless, and in addition to the foregoing, Applicant cannot find anywhere Luo describes generating a second label "upon determining that the probability [generated for a first label] is within a predefined low accuracy range" (emphasis added), as recited in amended Claim 1. Thus, Claim 1 is additionally patentable over Luo for at least this reason. 

, with respect to the rejection(s) of claim(s) 1,14, and 15 under 35 USC 102 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1), which teaches labels over time in fig. 2: “Classifier Labels 158”: “Observer Labels 166” based on cause-and-effect arrows in fig. 1 corresponding to claim 1’s cause and effect: “upon [the cause]…., generating [the effect]”. In contrast, Luo generates label yjk as input: fig. 3: “Input 1:Image1” such that label yjk appears as the cause instead of the effect.



Applicant's arguments filed 7/12/2022 have been fully considered but they are not persuasive.
Applicant states on pages 11-12:
Fourth, Claim 1 recites "generating, using a supporting engine, a second label for the person depicted in the probe image, . . . wherein the supporting engine operates independently of the trained facial classification neural engine." In the first Office Action dated December 17, 2021 ("First OA"), the Office appeared to equate the "loss functions" described in § 3.1.3 of Luo with Applicant's claimed "supporting engine" that operates independently of the claimed "trained facial classification neural engine."2 However, as explained in Applicant's Amendment & Response Under 37 C.F.R. § 1.111 submitted February 15, 2022 ("First Response"), the "loss functions" described in § 3.1.3 of Luo do not operate independently of any other part of Luo's network.3 Rather, in the last two paragraphs of § 3.1.2 (see also Fig. 3), Luo describes that the input to the loss functions comes explicitly from the previous layers of its network. Accordingly, the loss functions of Luo necessarily depend on the previous layers of Luo's network. As such, Applicant explained that the loss functions in Luo do not, and cannot, operate independently from any other part of Luo's network. 

	In response, the current Office action of 5/16/2022, page 25, last paragraph maps the claimed “supporting engine”2 to driving or guiding comprised by “parameter refined…training”3 (pg. 24048: 2 ContrastiveLoss, last para, last S) comprising “to guide”. 







In response the parameter refined training (i.e., the claimed engine) as represented by the training of Luo’s fig. 3 (Conv and MaxPool) comprises:
a)	“independent neurons”, pg. 24043, 2.1 CNN, 1st para, 3rd S, which is understood to mean that a neuron in a square-plane is independent from another neuron in another square-plane:

[Image: media_image4.png]



; and 
b)	“the input images are being independently processed with no interactions”, pg. 24044, 3 Our method, 1st para, 7th S, which is understood to mean that the processing (in either fig. 2: traditional Siamese network or fig. 3: Matching-Siamese network (MSN)) of one image is independent from the processing of another image such as “the training process” (pg. 24047, 3.1.3 The loss function: 1. SoftmaxWithLoss, 2nd para, last S) of one image is independent of the training process of another image. 
















	In response the independent aspects (independent neurons and independent training) of Luo still hold.



	In response said each independent “training process” comprises a “loss function”4. Thus it is understood that the independent training process with a loss function of one image (fig. 6(a): top: a person) is independent of another independent training process with a loss function of another image (fig. 6(b):bottom: another person). Thus the part of learning to recognize or detect one person is independent from the part to recognize or detect another person.







Applicant states on page 12:
In response to Applicant's remarks, in the Final Office Action, the Office appeared to allege that the convolution layers "Conv1" (also referred to in Luo as "C1") and "Conv5" (also referred to in Luo as "C5") of Luo are "engines" that operate independently.4 Applicant respectfully asserts that the Office's allegation is improper for at least a couple reasons. One, it is plainly apparent from the language of Applicant's Claim 1 that both the "trained facial classification neural engine" and the "supporting engine" operate on the same "probe image." However, as explained above, Luo describes taking two colored images as input.5 Notably, these two colored images "are input to C1 and C5 respectively."6 That is, convolution layers C1 and C5 operate on separate images, not the same "probe image," as called for in Applicant's Claim 1. Two, it is also plainly apparent from Luo that the convolution layers "Conv1" and "Conv5" do not operate independently. Indeed, Luo criticizes the "traditional Siamese network" for processing the input images independently, and indicates it's proposed network (i.e., the MSN) addresses this deficiency.7 Thus, Claim 1 is additionally patentable over Luo for at least this reason.

	The examiner respectfully disagrees since Luo teaches independent image processing operations of the MSN. Thus, each MSN operation in fig. 3 (“Conv” and “MaxPool”) is an independent image processing operation with no interactions until concatenation (fig. 3: “Concat”) is encountered, which joins the independent image operations:










[Image: media_image5.png]
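The reading stated above (separate per-image operations that do not interact until a concatenation step) can be illustrated with a toy sketch. This is not Luo's network: the `branch` function is a hypothetical stand-in for a Conv/MaxPool path, and the scale factors and one-dimensional "images" are assumptions chosen only to keep the example small.

```python
def branch(image, scale):
    """Stand-in for one Conv/MaxPool path: scale each pixel, then max-pool pairs."""
    scaled = [pixel * scale for pixel in image]
    return [max(scaled[i:i + 2]) for i in range(0, len(scaled), 2)]


def two_branch_forward(image_1, image_2):
    """Process each input on its own path, then join the results."""
    feat_1 = branch(image_1, scale=2)  # depends only on image_1
    feat_2 = branch(image_2, scale=3)  # depends only on image_2
    return feat_1 + feat_2             # "Concat": first point where the paths meet


print(two_branch_forward([1, 2, 3, 4], [1, 1, 1, 1]))
```

In the sketch, changing `image_2` leaves the features computed from `image_1` unchanged, which is the sense of "independent" used in the paragraph above; anything computed after the concatenation depends on both inputs.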








In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., 
a)	“the ‘second label’…is used to train the ‘facial classification neural engine’”; and
b)	“the ‘label’ generated by one ‘engine’ (i.e., the ‘supporting engine’) is used to train another, independent ‘engine’ (i.e., the ‘facial classification neural engine’)”
 via applicant’s remarks, page 13:
Specifically, as can be plainly understood from the explicit language of Applicant's Claim 1, the "second label" is generated by the "supporting engine," but is used to train the "facial classification neural engine." Moreover, Claim 1 explicitly recites that the "supporting engine" operates independently of the "facial classification neural engine." That is, the "label" generated by one "engine" (i.e., the "supporting engine") is used to train another, independent "engine" (i.e., the 'facial classification neural engine"). In contrast, the last paragraph of § 3.1.3 of Luo describes refining network parameters through backpropagation. The use of backpropagation to refine network parameters clearly demonstrates that no portion of Luo's network, illustrated for example in Figure 3, operates independently. Thus, Claim 1 is additionally patentable over Luo for at least this reason.

) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
	In contrast, claim 1 says:
	“further training the facial classification neural engine based on the second label”.






In response, the independent aspects (i.e., independent neurons and independent processing, such as the image training process of fig. 3 or any other independent image processing such as “MSN” “processed” “pedestrian images”5 of fig. 4 ) of Luo still hold.
Thus, the backpropagation to refine network parameters is comprised by each independent image training process such that each independent MSN image-training process or operation of one image is independent of another independent MSN image-training process or operation of another image.






In view of the above remarks, a new ground of rejection is presented via said Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) that teaches:
a)	labels over a time range in fig. 2: “Classifier Labels 158”: “Observer Labels 166” based on cause-and-effect arrows in fig. 1 corresponding to claim 1’s cause and effect: “upon [the cause]…., generating [the effect]”; and
b)	a “probability” limit range from 0-100 (or 0 to 100) in terms of percent6 of a “threshold” with respect to said labels 158,166 with “accuracy” ([0021], last S):

[Image: media_image6.png]

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Regarding inquiry 4, see Suggestions.
Claim(s) 1,14,15 and 16,19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1).
Regarding claim 1, Luo teaches a system comprising: 
processing circuitry (or “machine”, pg. 24049, 1st full para, last S); and 
a memory (“8G of memory”, id.) storing instructions which, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising: 
receiving, from a vision device comprising one or more cameras (via “different cameras”, section 3.1.1 Training set, 2nd S), a probe image via (fig. 4: “Probe pedestrian image”);
generating, using a trained (via “training…completed”, 3.1.3 The loss function, 1. SoftmaxWithLoss: 2nd para, last S) facial (via “half face”, 4.2.2 Pedestrian tracing, last para, last S) classification (via “classify”, 3.1.3 The loss function, 1st para, last S) neural (via “The multi-loss regularized deep neural network”, id., 1st S) engine (via driving or guiding, as shown by the parts in fig. 3, resulting in said “training…completed”), a first (via “k-th”, section: 3.1.3 The loss function: 1. SoftmaxWithLoss, last S) label[[s]] (via “generate…labels”, 3.1.1 Training set, penultimate S, at fig. 3:“Loss1[SoftmaxWithLoss] ”:label for classification: “Loss2[ContrastiveLoss]”:label for clustering or tracking) for a person (as shown in fig. 6(a)) depicted in the probe image, the first label7 acting as an identifier (via a “unique identification”) of the person depicted in the probe image, and generating a probability (or “output…a probability”, 3.1.3 The loss function:1. SoftmaxWithLoss: 1st para, 1st & 2nd Ss) for , the probability8 corresponding to a confidence (or “degree of confidence”) that the first label accurately identifies the person depicted in the probe image; 

upon determining that the probability is within a predefined low accuracy (via “accurately”, 3.1.2 Network structure design, 1st para, last S) range (via a “correlation… range of 0 to 1”, 3.1.3 The loss function: 1. SoftmaxWithLoss:1st para,2nd S) [[;]] , generating, using a supporting (or resulting in “refined…training”, 3.1.3 The loss function: 2. ContrastiveLoss: last para, last S) engine (via driving or guiding, as shown by other parts of said parts in fig. 3, resulting in said “training…completed”), a second (via said k-th) label (via said “generate…labels”) for the person depicted in the probe image, the second label acting as an identifier of the person depicted in the probe image, wherein the supporting engine operates independently (or “independently processed”, 3 Our method, 7th S, represented by the extraction processes in fig. 3) of the trained facial classification neural engine; and
further training (via “numerous training iterations”, 3.1.3 The loss function: 2. ContrastiveLoss: last para, last S) the facial classification neural engine (or “the entire network”, id., penultimate S, as shown in figure 3 comprising said other parts of the parts) based on (via fig. 3: “Loss2[ContrastiveLoss]” being the basis for said “numerous training iterations”) the second label.  

Luo does not teach: 
upon determining that the probability is within a predefined low accuracy range  [[;]] , generating, using a supporting engine, a second label.



Demiralp teaches: 
upon (via input arrows in figs. 1 and 9 causing a resulting arrow) determining (expressing the action or result or determine by setting a threshold, fig. 2:204:202: “Threshold”) that the probability (resulting in a thresholded “defined…probability”, [0028], last two Ss, mapped to figs.1,2,9:156: “Output Data”:902: “Receive output data from a classification model”) is within a predefined (via “set” “threshold” “of 85% to 100%” or “50% to 100%, [0033] penult & last Ss) low (via “decreasing the performance”, [0064] 2nd S, in terms of “accuracy” “metrics”, [0021], last S) accuracy (via said decreasing accuracy metrics, represented as predefined-low-accuracy-white rectangle track in fig. 5 wherein the track is pre-“defined”, [0055] 2nd S, relative to the display of fig. 9:922: “Output the performance track on the user interface”) range (said threshold of 85%-100% or 50%-100% resulting in “threshold 204” “misclassifications”, [0035] penultimate S, represented as said predefined-low-accuracy-white rectangle track in fig. 5) [[;]] , generating (via other resulting arrows in figures 1 and 9), using a supporting (via generating “new”,“training” “labels”, [0021] 2nd S: to provide for the support of) engine (or “supported”, [0101] 1st S, “services”, [0109] 5th S, as “software modules”, [0022] last S, providing the support of), a second (via fig. 2: “Time” axis: “tb” is second after “ta”) label (via fig. 9:904: “Generate a set of classifier labels based on the output data” mapped to fig. 1:158: “Classifier Labels” so as to be generated on a display, fig. 1:102: “User Device”, via fig. 9:922: “Output the performance track on the user interface” wherein the track is “representing”, [0029] 1st S, or is equivalent to the generated labels).

	Thus one of ordinary skill in the art of probabilities and “motion detection algorithms on a video” (Demiralp [0027] 3rd S) and “moving object detection” (Luo, Abstract, 1st S) can modify Luo’s teaching of said “output…a probability” with Demiralp’s teaching of “defined…probability”: 
a)	install Luo’s fig. 4: “MSN” (“Matching-Siamese network”) into Demiralp’s fig. 1 (see fig. 1 below: diagonal arrow to MSN);
b)	“retrain” (Demiralp [0061] last S) the installed MSN using Demiralp’s fig. 1:120: “Processor”; and
c)	 recognize that the modification is predictable or looked forward to because the modification “enhances development and debugging experience by data scientists utilizing the computer systems to develop and evaluate the predictive classification models.” (Demiralp [0020] last S):



[Figure media_image7.png (greyscale): Demiralp’s fig. 1 with a diagonal arrow to Luo’s MSN]

Regarding claim 14, Luo as combined teaches the system of claim 1, wherein the probe image is one of a plurality of images that track the person, the plurality of images being received from the vision device, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
determining (via fig. 4: “Result=1?”), using the trained facial classification neural engine (represented in fig. 4: “MSN”), that at least a threshold (regarding what is considered the truth via fig. 4: “Result=1?” wherein “1” represents being true) number (or frame number via fig. 4: “Pedestrian image of current frame is numbered sequentially” being true) of the plurality of images have a specified identification (via “identity”, Abstract, 6th S, represented: fig. 4: “Output match similarity”) with a probability (via said “a probability”) within a predefined high accuracy range (via said “correlation… range of 0 to 1”); and 
determining (said via fig. 4: “Result=1?”=a red bounding box) that the probe image (represented in Table 4 as the red bounding box around pedestrian 1) has the specified identification based on the at least the threshold number (represented in Table 4: frames 1-3) of the plurality of images having the specified identification (given that each frame 1-3 has the red bounding box around pedestrian 1).  
Regarding claim 15, Luo as combined teaches the system of claim 14, the operations further comprising: identifying (via said fig. 4: “Output match similarity”) the plurality of images that track the person based on timestamps (via Table 4: “pedestrian tracing…Time”) associated with the plurality of images and a physical position of the person within a space depicted in the plurality of images.  


Regarding claim 16, claim 16 is rejected the same as claim 1. Thus, argument presented in claim 1 is equally applicable to claim 16. Accordingly, Luo discloses claim 16 of a non-transitory machine-readable medium storing instructions which, when executed by processing circuitry of one or more computing machines, cause the processing circuity to perform operations comprising: 
receiving, from a vision device comprising one or more cameras, a probe image;  
generating, using a trained facial classification neural engine, a first label[[s]] for a person depicted in the probe image, the first label acting as an identifier of the person depicted in the probe image, and generating a probability for , the probability corresponding to a confidence that the first label accurately identifies the person depicted in the probe image; 
upon determining that the probability is within a predefined low accuracy range [[;]] , generating, using a supporting engine, a second label for the person depicted in the probe image, the second label acting as an identifier of the person depicted in the probe image, wherein the supporting engine operates independently of the trained facial classification neural engine; and
further training the facial classification neural engine based on the second label.  

Regarding claim 19, claim 19 is rejected the same as claim 14. Thus, argument presented in claim 14 is equally applicable to claim 19. Accordingly, Luo discloses claim 19 of the machine-readable medium of claim 16, wherein the probe image is one of a plurality of images that track the person, the plurality of images being received from the vision device, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
determining, using the trained facial classification neural engine, that at least a threshold number of the plurality of images have a specified identification with a probability within a predefined high accuracy range; and 
determining that the probe image has the specified identification based on the at least the threshold number of the plurality of images having the specified identification.  
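For illustration only, the two determining steps recited in claims 14 and 19 can be sketched as follows. This is a minimal sketch and not an implementation taken from the application or from any cited reference. The function name label_from_track, the bounds of the high accuracy range, and the threshold count are hypothetical values chosen for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def label_from_track(
    track_predictions: List[Tuple[str, float]],
    high_range: Tuple[float, float] = (0.9, 1.0),  # assumed bounds, not from the claims
    threshold_count: int = 3,                      # assumed threshold number
) -> Optional[str]:
    """Return the identification shared by at least threshold_count tracked
    images whose probability falls within the high accuracy range, else None."""
    low, high = high_range
    # Count only predictions whose probability lies inside the high accuracy range.
    confident = Counter(
        label for label, prob in track_predictions if low <= prob <= high
    )
    if not confident:
        return None
    label, count = confident.most_common(1)[0]
    # The probe image inherits the identification only if enough images agree.
    return label if count >= threshold_count else None
```

Under these assumptions, a probe image that the primary engine labels with low confidence receives the identification carried by the other images in the same track.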

Regarding claim 20, claim 20 is rejected the same as claim 1. Thus, argument presented in claim 1 is equally applicable to claim 20. Accordingly, Luo discloses claim 20 of a method comprising: 
receiving, from a vision device comprising one or more cameras, a probe image;
generating, using a trained facial classification neural engine, a first label[[s]] for a person depicted in the probe image, the first label acting as an identifier of the person depicted in the probe image, and generating a probability for , the probability corresponding to a confidence that the first label accurately identifies the person depicted in the probe image ; 
upon determining that the probability is within a predefined low accuracy range [[;]] , generating, using a supporting engine, a second label for the person depicted in the probe image, the second label acting as an identifier of the person depicted in the probe image, wherein the supporting engine operates independently of the trained facial classification neural engine; and
further training the facial classification neural engine based on the second label.  
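For illustration only, the control flow of the method recited in claim 20 can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the implementation of the application, Luo, or Demiralp. The class name Pipeline, the callables primary and supporting, the retrain_queue list, and the bounds in LOW_ACCURACY_RANGE are all hypothetical. Queuing the second label is one assumed way of making it available for further training.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed bounds of the predefined low accuracy range; the claim does not fix them.
LOW_ACCURACY_RANGE = (0.0, 0.6)


@dataclass
class Pipeline:
    # Stand-in for the trained engine: returns (first label, probability).
    primary: Callable[[str], Tuple[str, float]]
    # Stand-in for the supporting engine; it never calls the primary engine.
    supporting: Callable[[str], str]
    # (probe image, second label) pairs held for further training.
    retrain_queue: List[Tuple[str, str]] = field(default_factory=list)

    def process(self, probe_image: str) -> str:
        first_label, probability = self.primary(probe_image)
        low, high = LOW_ACCURACY_RANGE
        if low <= probability < high:
            # Low confidence: obtain a second label and keep it for retraining.
            second_label = self.supporting(probe_image)
            self.retrain_queue.append((probe_image, second_label))
            return second_label
        return first_label
```

In this sketch the supporting engine is consulted only when the probability falls within the low accuracy range, and only those cases contribute further training data.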

Claims 2 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Raut et al. (Result Oriented Based Face Recognition using Neural Network with Erosion and Dilation Technique).
Regarding claim 2, Luo teaches the system of claim 1, the operations further comprising: 
using the further trained facial classification neural engine to identify (in the context of being the same during pedestrian tracking or tracing) one or more persons in visual data from the vision device; and 
based on the identified one or more persons in the visual data, controlling access to a physical location or an electronic resource.  


Thus, Luo does not teach:
based on the identified one or more persons in the visual data, controlling access to a physical location or an electronic resource.






Thus, Raut teaches:
  based on the identified (via fig. 1: “Face recognition”) one or more persons in the visual data, controlling access to a physical location or an electronic resource (and thus “granting…access to physical and virtual domains”, 1st pg., left col, 3rd para, 1st S).
Thus one of ordinary skill in tracking as taught by both references (“tracking of known individuals” Raut, page 1822, l. col, 2nd full para, 1st S) can modify Luo’s tracking with Raut by:
a)	tracking people in video by “extracting…face…gait”, Raut, id., 2nd S;
b)	recognizing extracted faces in video; and
c)	recognizing that the modification is predictable or looked forward to because “Real time systems for identifying humans in a scene has a lot of importance in security and surveillance applications”, Raut, id., 1st S. 
Regarding claim 17, claim 17 is rejected the same as claim 2. Thus, argument presented in claim 2 is equally applicable to claim 17. Accordingly, Luo as combined teaches the machine-readable medium of claim 16, the operations further comprising: 
using the further trained facial classification neural engine to identify one or more persons in visual data from the vision device; and 
based on the identified one or more persons in the visual data, controlling access to a physical location or an electronic resource.




Claim 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Buibas et al. (US Patent 10,282,852).
Regarding claim 3, Luo teaches the system of claim 1, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
generating the second label based on an identity card or token provided by the person or based on a user identifier and password entered by the person.  

Thus Luo does not teach:
generating the second label based on an identity card or token provided by the person or based on a user identifier and password entered by the person. 

Accordingly, Buibas teaches:
generating (via fig. 4:arrows pointing to “take” “put” “move”) the second label (via fig. 4:“output labels 412 and 413”:“take pizza”: “move soup can”: c.16,ll. 60-63:) based on (A) an identity card (via fig. 19:1904: “Insert Card”) or (B) token provided by the person or (C) based on a user identifier and password entered by the person.




Thus, one of ordinary skill in the art of tracking can modify Luo’s label with Buibas’ by:
a)	making Luo’s tracking L2 loss function label be as Buibas’s fig. 4: “take pizza”:“move soup can”:“take”:“put”:“move”;
b)	putting the surveillance/tracking cameras in a food store; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in generating money from pizza and soup.

Claim 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Nambiar et al. (Gait-based Person Re-identification: A Survey).
Regarding claim 4, Luo teaches the system of claim 1, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
generating the second label based on a combination (via eqn. (8), pg. 24048, represented in fig. 4: “MSN”) of weak authentication factors, the weak authentication factors comprising one or more of: a height, a weight and a gait.  

Thus, Luo does not teach:
generating the second label based on a combination of weak authentication factors, the weak authentication factors comprising one or more of: a height, a weight and a gait.

Accordingly, Nambiar teaches:
generating the second label (or “a unique label”, pg. 33:3, 1st para, last S, represented in fig. 1: “Descriptor Generation”) based on a combination (via “Fusion”, page 33:14, 2.3.4 Multi-modal Fusion) of weak authentication factors (or fig. 2:“Soft biometrics”, represented in fig. 1: “Feature extraction”, pg. 33:3), the weak authentication factors comprising one or more of: a height, a weight and a gait (via said fig. 2: “Body measurements…Gait”).
Thus, one of ordinary skill in the art of image extraction and tracking can modify Luo’s said eqn. (8) with Nambiar’s “unique label” by:
a)	making Luo’s fig. 4 be as Nambiar’s fig. 1 by inserting Nambiar’s fig. 1: “Descriptor Generation” right after Luo’s feature extraction of fig. 4: “MSN”;
b)	making Luo’s feature extraction extract gait;
c)	combining the extracted gait with another extracted feature;
d)	assigning Nambiar’s unique label at Luo’s fig. 4: “Result=1?”: “yes”; and
e)	recognizing that the modification is predictable or looked forward to because the combination “has important applications in tracking…when…discontinuities exist” and “to improve Re-ID results” or improve the results of identifying a person and then identifying the same person again, Nambiar, pg. 33:3, 2nd para, 5th S & pg. 33:14, 2.3.4 Multi-modal Fusion, 1st S.

Claim 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Madhuri et al. (Pose-Robust Recognition of Low-Resolution Face Images).
Regarding claim 5, Luo teaches the system of claim 1, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
verifying, via at least one client computing device, a correct identification for the person depicted in the probe image.  

Thus, Luo does not teach:
verifying, via at least one client computing device, a correct identification for the person depicted in the probe image.

Accordingly, Madhuri teaches:
verifying (via “server…verification”, abstract,11th S), via at least one client computing device (via “client server architecture”, id., 9th S, as shown in fig. 1: “FACE architecture”), a correct (via “ ‘correction’ ”, 2nd pg., l.col, 3rd S) identification (or matched or recognized, the past-tense, via fig. 1: “Matching” or recognition) for the person depicted in the probe image (or “Face images in the probe”, 1st pg., l. col, 1st bullet: Pose Normalization).

Thus, one of ordinary skill in the art of video surveillance, as taught by both references, can modify Luo’s tracking label with Madhuri’s “server…verification” by:
a)	making Luo’s tracking L2 label be as shown in Madhuri’s server of fig. 1: “Reliability Evaluation”, comprising a “Face exemplars…label” (Madhuri, penultimate pg., l. col, last para, 2nd S);
b)	making Luo’s surveillance/tracking cameras be as clients via the upper half of Madhuri’s fig. 1; and
c)	recognizing that the modification is predictable or looked forward to because the modification provides said “ ‘correction’ ” during video surveillance.

Claim 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Madhuri et al. (Pose-Robust Recognition of Low-Resolution Face Images) as applied in claim 5 above further in view of Howard et al. (US Patent App. Pub. No.: US 2004/0213437 A1).
Regarding claim 6, Luo as combined teaches the system of claim 5, wherein verifying the correct identification (or said matched, the past-tense, via said L2 tracking label as modified via the combination with a Face exemplars label) comprises: 
providing, for display at the at least one client computing device, the probe image and a plurality of possible (via said “a probability”) identifications for the person; and 
receiving, from the at least one client device, a selection (via “the same targeted pedestrians selected”, pg. 24045, 2nd S) of one of the possible identifications as the correct identification.
	
Luo does not teach: 
providing, for display at the at least one client computing device, the probe image and a plurality of possible identifications for the person; and 
receiving, from the at least one client device, a selection of one of the possible identifications as the correct identification.


Thus, Howard teaches:
providing, for display (via fig. 1:20: “DISPLAY”) at the at least one client computing device (via fig. 2:10: “WORKSTATION”), the probe image (via fig. 13:100: “Probe Image”) and a plurality of possible identifications for the person (as shown in fig. 13:700: a display of faces); and 
receiving (via fig. 2: arrows), from the at least one client device (via fig. 2:10: “WORKSTATION”), a selection (via “selecting at least one of the images”, [0147]) of one of the possible identifications as the correct (via “color correction”, [0144], represented in fig. 2:15: “IMAGE/DATA CAPTURE”) identification (as shown in fig. 13).  
Thus, one of ordinary skill in the art of surveillance can modify the combination’s client (a camera with normalization correction) with Howard’s said fig. 1:20: “DISPLAY” by:
a)	making the combination’s client be as shown in Howard’s fig. 2:15: “IMAGE/DATA CAPTURE” to be displayed via Howard’s figs.1,2:10,20: “DISPLAY”: “WORKSTATION”;
b)	making the server tracking reliability confidence L2 label be determined at Howard’s fig. 2:25: “FACIAL RECOGNITION SEARCH SYSTEM”; and
c)	recognizing that the modification is predictable or looked forward to because the modification provides another correction in the combination: color correction in addition to normalization.



Claim 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Madhuri et al. (Pose-Robust Recognition of Low-Resolution Face Images) as applied in claim 5 above further in view of Messer et al. (US Patent App. Pub. No.: US 2020/0105111 A1).
Regarding claim 7, Luo as combined teaches the system of claim 5, wherein the at least one client computing device (via said L2 label as modified via the combination) comprises an administrator client computing device (via said “client server architecture”) and N (or 4 boxes in the upper half of Madhuri’s fig. 1: (1)“SPSI” ; (2)“Sample Selection”;(3) “Pose Normalization”; (4) “Illumination Normalization”) employee (via the server of said “client server architecture” comprising administration comprising an office comprising a staff comprising a group as employees) client computing devices (or any one box of said “client server architecture” as shown in Madhuri’s fig. 1), wherein N (said 4) is a positive integer greater than or equal to two, wherein verifying the correct identification (being the past-tense via being said matched or recognized) comprises: 
providing the probe image to at least a portion (or any one box of said “client server architecture”) of the N employee client computing devices; 
upon receiving (via Madhuri: fig. 1:arrows), from at least M (via said 4) employee client computing devices (said or any one box), a consistent identification (via said “server… verification”) of the person: verifying (via said “server…verification”) that the consistent identification is the correct identification (being the past-tense or matched via Madhuri’s fig. 1: “Matching” that is server-verified being consistent with the match), wherein M is a positive integer between half of N and N; and 
upon failing to receive, from the at least M employee client computing devices, the consistent identification of the person: providing the probe image to the administrator client computing device for verifying the correct identification via the administrator client computing device (wherein server is defined via Dictionary.com:
CULTURAL DEFINITIONS FOR SERVER
server
Computer or software that performs administration or coordination functions within a network.

wherein “administration” is defined:
administration
noun
1	the management of any office, business, or organization; direction.

wherein “office” is defined:
office
noun
4	the staff or designated part of a staff at a commercial or industrial organization:
The whole office was at his wedding.

wherein “staff” is defined:
staff
noun, plural staffs for 1-5, 9; staves  [steyvz] or staffs for 6-8, 10, 11.
1	a group of persons, as employees, charged with carrying out the work of an establishment or executing some undertaking.).  


Thus, Luo as combined does not teach:
upon failing to receive, from the at least M employee client computing devices, the consistent identification of the person: providing the probe image to the administrator client computing device for verifying the correct identification via the administrator client computing device.  

Accordingly, Messer teaches:
	upon failing (resulting in fig. 6:602: “Facial Recognition event with poor confidence”) to receive (and thus resulting in fig. 6:603: “Prompt issued to user”), from the at least M employee client computing devices (via fig. 1:1A,1B), the consistent  identification (and thus instead resulting in fig. 6:607: “Confirmation received from operator”) of the person (represented as a face via fig. 6:602: “Facial Recognition event with poor confidence”): providing the probe image to the administrator (via fig. 1:21,22: “Server A”: “Server B”) client (via fig. 1:4: “Customer Viewing Equipment”) computing device (as shown in fig. 1:all via said upon failing, “This prompt can include the… probe image”, [0301]) for verifying (via said fig. 6:607: “Confirmation received from operator”) the correct identification (in the past-tense via said fig. 6:602: “Facial Recognition event with poor confidence” “appears to have correctly identified the subject”, [0302], penultimate S) via the administrator client computing device.  



Thus, one of ordinary skill in matching and surveillance can modify the combination’s said M=N=4 boxes with Messer’s said fig. 6:603: “Prompt issued to user” by:
a)	making said four boxes be as Messer’s fig. 1:1A,1B;
b)	installing the program of Messer’s fig. 6:601-607 into the server half of Madhuri’s fig. 1: “Matching”; and
c)	recognizing that the modification is predictable or looked forward to because the modification is used such that the “surveillance system may serve to issue a prompt to user to attempt to acquire a better image”, Messer [0297].
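
For illustration only, the verification flow recited in claim 7 (a consistent identification from at least M of the N employee client computing devices, otherwise referral to the administrator client computing device) can be sketched as follows. This is a minimal sketch and is not drawn from Madhuri or Messer. The function name verify_identification, the callable ask_administrator, and the inclusive reading of "between half of N and N" are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Optional


def verify_identification(
    employee_answers: List[Optional[str]],   # one entry per employee device; None = no reply
    m: int,                                  # required number of consistent identifications
    ask_administrator: Callable[[], str],    # hypothetical fallback to the administrator device
) -> str:
    n = len(employee_answers)
    # Claim 7 requires N >= 2 and M between half of N and N (read here as inclusive).
    if n < 2 or not (n / 2 <= m <= n):
        raise ValueError("need N >= 2 and N/2 <= M <= N")
    votes = Counter(answer for answer in employee_answers if answer is not None)
    if votes:
        label, count = votes.most_common(1)[0]
        if count >= m:
            # At least M devices agree: treat that identification as correct.
            return label
    # Otherwise the probe image is referred to the administrator device.
    return ask_administrator()
```

Under these assumptions the administrator device is reached only when no single identification is returned by at least M employee devices.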

Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Madhuri et al. (Pose-Robust Recognition of Low-Resolution Face Images) as applied in claim 5 above further in view of Messer et al. (US Patent App. Pub. No.: US 2020/0105111 A1) as applied in claim 7 above further in view of NIKNAM et al. (EP 2 978 249 A1).
Regarding claim 8, Luo as combined teaches the system of claim 7, wherein the N employee client computing devices are selected based on a corporate department or an office geographic location of at least one of the plurality of possible identifications. 

Thus, Luo as combined does not teach claim 8 as a whole. 

Accordingly, Niknam teaches claim 8 of:
The N employee client computing devices are selected (resulting in “selecting one or more client devices”, abstract, ll. 9,10) based on a corporate department or an office (via a “server system”, id., l.8, comprising said office staff employees) geographic location (or “predetermined” “geographical location”, id, ll. 10,11) of (Markush limitation follows: at least one) at least one of the plurality of possible identifications (via “possibly… recognizing each other’s face”, [0009], penultimate S and [0010], 4th S, as shown in fig. 5(b):510: happy face).


Thus, one of ordinary skill in the art of clients and servers can modify the combination’s memory machine as modified via the combination of Madhuri as modified via Messer with Niknam’s said “selecting one or more client devices” by:
a)	making the combination’s memory machine be as Niknam’s fig. 1:110: a server and 101-105:clients;
b)	performing face recognition or identification or authentication over the internet; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in a “minimum of overhead”, Niknam [0010] 1st S.

Claims 9,10,11 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above further in view of Bhatt et al. (Improving Cross-Resolution Face Matching Using Ensemble-Based Co-Transfer Learning).
Regarding claim 9, Luo teaches the system of claim 1, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
providing the probe image to a training dataset for a semi-supervised learning facial classification engine; 
training the semi-supervised learning facial classification engine using the training dataset; 
generating, using the semi-supervised learning facial classification engine, the second label for the person depicted in the probe image and a probability value for the second label; and 
adjusting the trained facial classification neural engine based on the trained semi-supervised learning facial classification engine.  


Thus, Luo does not teach claim 9 as a whole.



Accordingly, Bhatt teaches:
providing (via figs. 4,5:arrows) the probe image (via figs. 4,5: “Unlabeled probe instances from TD”: “Training data in target domain”) to a training dataset (or fig. 4: “Knowledge learnt from SD”) for a semi-supervised learning facial classification engine (so as to use “transfer learning for face recognition as a semi-supervised approach”, pg. 5658, left col, 1st full para, last S); 
training (via figs. 4,5: “Co-training”) the semi-supervised learning facial classification engine using the training dataset; 
generating, using the semi-supervised learning facial classification engine, the second label (via fig. 5: “Pseudo labels provided by E”) for the person (as shown in figs. 1,2) depicted in the probe image and a probability value (or a “confidently predicting…distance”, pg. 5659, l. col, penultimate S) for the second label; and 
adjusting (via “Updating…to…adjust”, pg. 5662, r. col, 1st bullet, represented in fig. 5 as weight “w”) the trained facial classification neural engine (as shown in fig. 5) based on the trained semi-supervised learning facial classification engine (as shown in fig. 5).







Thus, one of ordinary skill in the art of classifiers and tracking, as taught by both references: “track the activities”, Bhatt, pg. 5654, r. col, 5th full S, can modify Luo’s generation of the L2 loss function tracking label with Bhatt’s teaching of said figs. 4,5:arrows by:
a)	making Luo’s MSN be as shown in Bhatt’s fig. 5: “Classifier” or fig. 6: “Feature extraction”: “SVM classifiers” or figs. 6(a)(b): ensemble “E”;
b)	matching faces by generating the matching label via Luo’s L1 softmax loss classification function;
c)	tracking activities by generating Luo’s contrastive L2 loss function clustering and tracking label; and
d)	recognizing that the modification is predictable or looked forward to because the modification is “efficiently matching low resolution probes with high resolution gallery”, Bhatt, right column 1st bullet.

Regarding claim 10, Luo as combined teaches the system of claim 9, wherein providing the probe image to the training dataset for the semi-supervised learning facial classification engine is in response to determining (via said MSN as modified via the combination of said “confidently predicting…distance”, pg. 5659, left col, penultimate S) that a quality (or fig. 5: “View” being a feature, represented in Bhatt’s fig. 6: “Feature extraction”, wherein “feature” comprises a distinguishing quality) of the probe image exceeds a quality threshold (or “the genuine threshold”, id., wherein “feature” is defined via Dictionary.com:
feature
noun
1	a prominent or conspicuous part or characteristic:
Tall buildings were a new feature on the skyline.

wherein “characteristic” is defined:
characteristic
noun
a distinguishing feature or quality:
Generosity is his chief characteristic.).  

Regarding claim 11, Luo as combined teaches the system of claim 10, wherein the quality of the probe image is computed using a quality measuring neural engine (via Bhatt’s Algorithm 1: “Process: Train classifiers” in page 5660 and represented in figs. 5,6(b): “Co-training”, comprising said MSN).  






Regarding claim 18, claim 18 is rejected the same as claim 9. Thus, argument presented in claim 9 is equally applicable to claim 18. Accordingly, Luo as combined teaches claim 18 of the machine-readable medium of claim 16, wherein generating, using the supporting engine, the second label for the person depicted in the probe image comprises: 
providing the probe image to a training dataset for a semi-supervised learning facial classification engine; 
training the semi-supervised learning facial classification engine using the training dataset; 
generating, using the semi-supervised learning facial classification engine, the second label for the person depicted in the probe image and a probability value for the second label; and
adjusting the trained facial classification neural engine based on the trained semi-supervised learning facial classification engine.  









Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above, further in view of Bhatt et al. (Improving Cross-Resolution Face Matching Using Ensemble-Based Co-Transfer Learning) as applied in claim 10 above, further in view of Ahonen et al. (Recognition of Blurred Faces Using Local Phase Quantization).
Regarding claim 12, Luo as combined teaches the system of claim 10. Claim 12 further recites that the quality of the probe image comprises a blurriness of the probe image.
However, Luo as combined does not teach “the quality of the probe image comprises a blurriness”.
Ahonen, as cited by Bhatt, teaches “Blur…quality of…imaging”, 1st pg., l.col, last S.
Thus, one of ordinary skill in the art of features can modify the combination’s feature extraction with Ahonen’s “Blur…quality of…imaging” by:
a)	making the feature extraction of the combination also include Bhatt’s LPQ as shown in Bhatt’s fig. 6: “LPQ/SIFT”; and
b)	recognizing that the modification is predictable or looked forward to because LPQ “is very robust not only to blur but also to other challenges such as lighting and facial expression variations present in real-world images”, Ahonen, last pg., r.col, 1st full para, 4th S.



Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Pedestrian tracking in surveillance video based on modified CNN) in view of Demiralp et al. (US Patent App. Pub. No.: US 2020/0104753 A1) as applied above, further in view of Bhatt et al. (Improving Cross-Resolution Face Matching Using Ensemble-Based Co-Transfer Learning) as applied in claim 9 above, further in view of Alba-Castro et al. (Audiovisual biometric verification).
Regarding claim 13, Luo as combined teaches the system of claim 9, wherein generating, using the supporting engine, the second label for the person depicted in the probe image further comprises: 
determining that the probability value (via said “confidently predicting…distance”, pg. 5659, left col, penultimate S) for the second label is below a probability threshold (via said “the genuine threshold”, id., in the rejection of claim 10); and 
in response to the probability value for the second label being below the probability threshold: verifying, via at least one client computing device, a correct identification for the person depicted in the probe image.

However, Luo as combined does not teach:
in response to the probability value for the second label being below the probability threshold: verifying, via at least one client computing device, a correct identification for the person depicted in the probe image.
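
For illustration only, the limitation quoted above amounts to a threshold test followed by a hand-off to a client device. The following is a minimal sketch under stated assumptions: route_second_label and verify_with_client are hypothetical names, and the callback merely stands in for whatever verification a client computing device would perform.

```python
def route_second_label(label, probability, probability_threshold, verify_with_client):
    """Return the identification to use for the person in the probe image.

    If the probability value for the second label is below the threshold, the
    identification is verified via the client callback; otherwise the second
    label is used as generated.
    """
    if probability < probability_threshold:
        # Hypothetical client-side verification; it returns the correct identification.
        return verify_with_client(label)
    return label


def always_bob(proposed_label):
    """Stand-in for a client computing device that corrects the label (assumption)."""
    return "bob"


low_confidence = route_second_label("alice", 0.40, 0.50, always_bob)
high_confidence = route_second_label("alice", 0.90, 0.50, always_bob)
```

In this sketch a probability exactly equal to the threshold is not treated as "below" it, so no client verification is requested in that case.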
	


Accordingly, Alba-Castro teaches:
in response to the probability value for the second label being below the probability threshold (via pg. 190: fig. 7: “Threshold” and “Thresholding”): verifying (expressing the act or result of verify via “face…verification”, pg. 180: abstract, penultimate S), via at least one client computing device (via “client-server architectures”, pg. 197, 3rd full para, 1st S), a correct identification (or “true identity”, pg. 190, last S) for the person (as shown in pg. 182: fig. 1) depicted in the probe image (via fig. 6: “Probe image”).
Thus, one of ordinary skill in the art of face matching can modify the combination’s MSN as modified via Bhatt with Alba-Castro’s teaching of said fig. 7: “Threshold” and “Thresholding” by:
a)	making said combination’s “the genuine threshold” be as shown in Alba-Castro’s said fig. 7: “Threshold” and “Thresholding”;
b)	making Luo’s 8G memory machine be as Alba-Castro’s said “client-server architecture” by installing “The algorithms…in a web” (Alba-Castro, pg. 181, 1st full para, 2nd S); and
c)	recognizing that the modification is predictable or looked forward to because “using a global threshold show very promising results if compared to the best results of all the tests, obtained using accuracy-based node selection with user-specific thresholds. In general, a global threshold is preferred to user specific thresholds because the system will be less database-dependent and performance should not decrease too much on actual running-time.”, Alba-Castro, pg. 199, 2nd para, 3rd S.
Suggestions
Applicant’s disclosure states [0035]:
“The authentication engine generates the second label by determining, using the trained facial classification neural engine, that at least a threshold number of the plurality of images (e.g., at least five images or at least 60% of the images) have a specified identification with a probability within a predefined high accuracy range (e.g., at least 90%).”

This feature is not claimed. Thus, its absence from the claims is an indication of obviousness.
Note that these suggestions are not provided with respect to overcoming 35 USC 101, 112, 102 and/or 103. These suggestions are mainly provided to seek out advantages in the disclosure regardless of 35 USC 101, 112, 102 and/or 103.
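
For illustration only, the passage quoted from [0035] can be read as a counting rule over per-image predictions. The following is a minimal sketch under stated assumptions: second_label_by_vote, min_count, and min_probability are hypothetical names, the defaults mirror the examples in the quotation (at least five images, at least 90%), and the alternative "at least 60% of the images" form of the threshold is omitted.

```python
from collections import Counter


def second_label_by_vote(predictions, min_count=5, min_probability=0.90):
    """Pick a second label from per-image (identification, probability) pairs.

    An identification is returned only if at least `min_count` images carry it
    with a probability of at least `min_probability`; otherwise None is returned.
    """
    confident = Counter(
        identification
        for identification, probability in predictions
        if probability >= min_probability
    )
    if not confident:
        return None
    identification, count = confident.most_common(1)[0]
    return identification if count >= min_count else None


enough = [("alice", 0.95)] * 5 + [("bob", 0.99)] * 2 + [("alice", 0.50)]
too_few = [("alice", 0.95)] * 4 + [("alice", 0.50)] * 3
label_enough = second_label_by_vote(enough)
label_too_few = second_label_by_vote(too_few)
```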
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENNIS ROSARIO/Examiner, Art Unit 2667  

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667

        1 label: computing a group of characters, such as a number or a word, appended to a particular statement in a program to allow its unique identification, wherein “program” is defined:
        An organized system of instructions and data interpreted by a computer. Programming instructions are often referred to as code. See more at source code. See also programming language (Dictionary.com).
        
        2 engine: Computers. a piece or collection of software that drives a later process (used in combination, as in game engine; software engine) wherein “drives” is defined: to cause and guide the movement of (a vehicle, an animal, etc.) (Dictionary.com)
        3 train: to guide or teach (to do something), as by subjecting to various exercises or experiences (Dictionary.com)
        4 “The SoftmaxWithLoss loss function would gradually converge to zero after several backpropagations, which suggests that the training process has been completed.” pg. 24047, 3.1.3 The loss function: 1. SoftmaxWithLoss, 2nd para, last S. 
        5 “If not or the corresponding information has already been stored, the system will verify if all numbered pedestrian images of the current frame have been processed through MSN.”, Luo, pg. 24048, 3.2 The flow chart of pedestrian tracking in surveillance video, bullet iv), last S.
        6 “For example, if the threshold is 50%, then the processor 120 may sample the probability scores from the output data 156 with scores that are 50% or above, at time interval T, to produce the classifier labels 158.” (Demiralp [0028], last S).
        7 label: computing a group of characters, such as a number or a word, appended to a particular statement in a program to allow its unique identification (Dictionary.com)
        8 probability: statistics a measure or estimate of the degree of confidence one may have in the occurrence of an event, measured on a scale from zero (impossibility) to one (certainty). It may be defined as the proportion of favourable outcomes to the total number of possibilities if these are indifferent (mathematical probability), or the proportion observed in a sample (empirical probability), or the limit of this as the sample size tends to infinity (relative frequency), or by more subjective criteria (subjective probability)