DETAILED ACTION
This office action is in response to the Application No. 16244240 filed on 09/21/2022. Claims 2, 9 and 16 have been cancelled; claims 1, 3-8, 10-15 and 17-20 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.

Response to Arguments
2.	Applicant’s arguments are moot in view of the new grounds of rejection.  The examiner is withdrawing the rejections in the previous office action of 06/22/2022 because the applicant’s amendments necessitated the new grounds of rejection presented in this office action. Accordingly, this action is made final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



3.	Claims 1, 4, 6, 8, 13, 15, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Madden (US20190228397 filed 01/25/2018) in view of Huang et al (US20180365564)

	Regarding claim 1, Madden teaches a computer-implemented method comprising: obtaining, with at least one processor, (In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein. [0076]; checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]) 
	first training data associated with a first set of features ( a textual input (e.g., a product UPC) V1 may be passed to input layer 304 [0054], Fig. 3A; The order inquiry may include an indication of the item(s) or product(s) the consumer intends to purchase. Herein, the product(s) may be known as “products of interest”. For example, a list of universal product codes (UPCs) [0044]. Examiner notes that V1 is associated with a user’s product of interest to purchase) and 
	second training data associated with a second set of features different than the first set of features, (V2 may correspond to the current balance of a user [0054], Fig. 3A; current balance 620 [0061], Fig. 6A) 
	wherein each of the first set of features and the second set of features is associated with a same plurality of transactions (same payment transactions such as initiating credit payments or payment transactions such as pay with Economizer, Fig. 6A) and 
	a same plurality of labels for the same plurality of transactions; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	 training, with at least one processor, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]) 
	a first model (Multiple layers of neural network 300 may correspond to respective models. For example, hidden layers 306-1 through 306-(J-10) may correspond to spending patterns module 152 [0053], Fig. 3A; Spending patterns module 152 may be a machine learning or other model configured to quantify past spending, such as over a time interval (e.g., three months) with respect to one or more consumers [0031])
	based on the first training data, the second training data, (first training data V1, second training data V2, Fig. 3A) and 
	the same plurality of labels; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031]) and 
	training, with at least one processor, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]) 
	a second model, (while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits. For example, future profits module 154 may be an artificial neural network [0032]) 
	using a loss function (loss function [0057]) 
	that depends on an output of an intermediate layer of the first model and an output of the second model, (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) 
	based on the second training data, (second training data V2, Fig. 3A)
	wherein the second model includes at least one first layer and at least one second layer, (Each model comprising multiple layers may be weighted [0054]; while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A;)
	 wherein the output of the second model includes an output of the at least one first layer, (so that the value(s) output by output layer 308 represent a consensus among the multiple models with respect to the accuracy of the prediction of future consumer profits to the seller [0054]; The output of the function f may be provided to any number of subsequent layers [0055]) and 
	wherein training the second model (For example, future profits module 154 may be an artificial neural network trained using a single consumer's purchases as labeled data [0032]) further comprises: 
	modifying, using the loss function that depends on the output of the intermediate layer of the first model and the output of the second model including the output of the at least one first layer, one or more parameters of the at least one first layer of the second model to learn to match the output of the intermediate layer of the first model; (The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values. In an embodiment, a regression neural network may be selected to predict pricing values which has no activation function. Therein, input data may be normalized by mean centering, and a mean squared error loss function may be used [0057]; The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and 
	training the at least one second layer based on the output of the at least one first layer (The output of the function f may be provided to any number of subsequent layers …  For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and 
	the same plurality of labels. (Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	Madden teaches a loss function (loss function … The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values [0057]) but does not explicitly teach using a loss function that depends on an output of an intermediate layer of the first model and an output of the second model.
	Huang teaches a first model (selecting, by a training device, a teacher network performing the same functions of a student network [0038]; a device for training a neural network, the structure of the device is illustrated in FIG. 5, which including: a processor 51 and at least one memory 52, the at least one memory storing at least one machine executable instruction, which is executed by the processor to:[0082])
	a second model, (the processor 51 executes the at least one machine executable instruction to iteratively train the student network [0085])
	using a loss function that depends on an output of an intermediate layer of the first model (In equation (5), ℋ(ytrue, pS) refers to the cross-entropy loss function, ℒ_MMD²(FT, FS) refers to the distance loss function, λ refers to the weight of distance loss function, FT refers to the feature map (i.e. the features of the first middle layer) output from the first specific network layer of the teacher network given the training sample data, [0057] and [0050]-[0054]) and 
	an output of the second model, (FS refers to the feature map (i.e. the features of the second middle layer) output from the second specific network layer of the student network given the training sample data, [0057])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Madden to incorporate the teachings of Huang for the benefit of transferring knowledge of features of a middle layer of the teacher network to the student network. (Huang, abstract)
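For illustration only, the form of loss cited from Huang (a cross-entropy term plus a weighted feature-distance term between teacher and student middle-layer features) can be sketched as follows. This is a minimal sketch under stated assumptions, not Huang's implementation: a linear kernel is assumed for the MMD term, and the function names and the parameter lam (standing in for λ) are hypothetical.

```python
import math


def cross_entropy(y_true, p_student):
    """Cross-entropy between a one-hot label vector and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, p_student) if t > 0)


def linear_mmd2(feats_teacher, feats_student):
    """Squared MMD under an ASSUMED linear kernel.

    With a linear kernel, squared MMD reduces to the squared Euclidean
    distance between the two sample means. Each argument is a list of
    equal-length feature vectors (one vector per sample).
    """
    dim = len(feats_teacher[0])
    mean_t = [sum(f[d] for f in feats_teacher) / len(feats_teacher) for d in range(dim)]
    mean_s = [sum(f[d] for f in feats_student) / len(feats_student) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_t, mean_s))


def student_loss(y_true, p_student, feats_teacher, feats_student, lam=1.0):
    """Label term plus lam times the feature-distance term (hypothetical helper)."""
    return cross_entropy(y_true, p_student) + lam * linear_mmd2(feats_teacher, feats_student)
```

In this sketch the second term is zero when the student's middle-layer features have the same mean as the teacher's, so minimizing it pushes the student layer toward the teacher layer while the first term continues to use the labels.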

	Regarding claim 4, Modified Madden teaches the computer-implemented method of claim 1, Madden teaches further comprising: determining, with at least one processor, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044])
	Huang teaches a plurality of information values of a plurality of intermediate layers of the first model; (the features of the first middle layer refer to feature maps output from a first specific network layer of the teacher network after the training sample data are provided to the teacher network [0041]. Examiner notes: the first middle layers are the intermediate layer. The teacher model is the first model) and
	selecting, with at least one processor, (The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models [0039]; a device for training a neural network, the structure of the device is illustrated in FIG. 5, which including: a processor 51 and at least one memory 52, the at least one memory storing at least one machine executable instruction, which is executed by the processor to: [0082])
	the intermediate layer from the plurality of intermediate layers based on the plurality of information values. (As for the neural network training scheme provided by the embodiments of the present application, on one aspect, it can train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks [0036]. Examiner notes: the middle layers of the teacher network (first model) are the intermediate layers)
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 6, Modified Madden teaches the computer-implemented method of claim 1, Huang teaches wherein the first model includes a greater number of parameters than the second model (The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights, and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure [0066]; to train student networks (featuring a small amount of network parameters, poor performance and high-speed computation) [0005]. Examiner notes: the first model is the teacher network and the student network is the second model)

	Regarding claim 8, Madden teaches a computing system comprising: at least one processor programmed and/or configured to: obtain (In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein. [0076]; checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]) 
	 first training data associated with a first set of features (a textual input (e.g., a product UPC) V1 may be passed to input layer 304 [0054], Fig. 3A; The order inquiry may include an indication of the item(s) or product(s) the consumer intends to purchase. Herein, the product(s) may be known as “products of interest”. For example, a list of universal product codes (UPCs) [0044]. Examiner notes that V1 is associated with a user’s product of interest to purchase) and 
	second training data associated with a second set of features different than the first set of features, (V2 may correspond to the current balance of a user [0054], Fig. 3A; current balance 620 [0061], Fig. 6A) 
	wherein each of the first set of features and the second set of features is associated with a same plurality of transactions (same payment transactions such as initiating credit payments or payment transactions such as pay with Economizer, Fig. 6A) and 
	a same plurality of labels for the same plurality of transactions; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	train a first model (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]; Multiple layers of neural network 300 may correspond to respective models. For example, hidden layers 306-1 through 306-(J-10) may correspond to spending patterns module 152 [0053], Fig. 3A; Spending patterns module 152 may be a machine learning or other model configured to quantify past spending, such as over a time interval (e.g., three months) with respect to one or more consumers [0031])
	based on the first training data, the second training data, (first training data V1, second training data V2, Fig. 3A) and 
	the same plurality of labels; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031]) and 
	train a second model, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]; while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits. For example, future profits module 154 may be an artificial neural network [0032])
	using a loss function (loss function [0057]) 
	that depends on an output of an intermediate layer of the first model and an output of the second model, (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) 
	based on the second training data, (second training data V2, Fig. 3A)
	wherein the second model includes at least one first layer and at least one second layer, (Each model comprising multiple layers may be weighted [0054]; while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A;)
	wherein the output of the second model includes an output of the at least one first layer, (so that the value(s) output by output layer 308 represent a consensus among the multiple models with respect to the accuracy of the prediction of future consumer profits to the seller [0054]; The output of the function f may be provided to any number of subsequent layers [0055]) and 
	wherein the at least one processor further trains the second model by: (For example, future profits module 154 may be an artificial neural network trained using a single consumer's purchases as labeled data [0032])
	 modifying, using the loss function that depends on the output of the intermediate layer of the first model and the output of the second model including the output of the at least one first layer, one or more parameters of the at least one first layer of the second model to learn to match the output of the intermediate layer of the first model; (The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values. In an embodiment, a regression neural network may be selected to predict pricing values which has no activation function. Therein, input data may be normalized by mean centering, and a mean squared error loss function may be used [0057]; The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and 
	training the at least one second layer based on the output of the at least one first layer (The output of the function f may be provided to any number of subsequent layers …  For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and 
	the same plurality of labels. (Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	Madden teaches a loss function (loss function … The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values [0057]) but does not explicitly teach using a loss function that depends on an output of an intermediate layer of the first model and an output of the second model.
	Huang teaches a first model (selecting, by a training device, a teacher network performing the same functions of a student network [0038]; a device for training a neural network, the structure of the device is illustrated in FIG. 5, which including: a processor 51 and at least one memory 52, the at least one memory storing at least one machine executable instruction, which is executed by the processor to:[0082])
	a second model, (the processor 51 executes the at least one machine executable instruction to iteratively train the student network [0085])
	using a loss function that depends on an output of an intermediate layer of the first model (In equation (5), ℋ(ytrue, pS) refers to the cross-entropy loss function, ℒ_MMD²(FT, FS) refers to the distance loss function, λ refers to the weight of distance loss function, FT refers to the feature map (i.e. the features of the first middle layer) output from the first specific network layer of the teacher network given the training sample data, [0057] and [0050]-[0054]) and 
	an output of the second model, (FS refers to the feature map (i.e. the features of the second middle layer) output from the second specific network layer of the student network given the training sample data, [0057])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Madden to incorporate the teachings of Huang for the benefit of transferring knowledge of features of a middle layer of the teacher network to the student network. (Huang, abstract)

	Regarding claim 11, Modified Madden teaches the computing system of claim 8, Madden teaches wherein the at least one processor is further programmed and/or configured to: (In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein. [0076]; checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044])
	Huang teaches determine a plurality of information values of a plurality of intermediate layers of the first model; (the features of the first middle layer refer to feature maps output from a first specific network layer of the teacher network after the training sample data are provided to the teacher network [0041]. Examiner notes: the first middle layers are the intermediate layer. The teacher model is the first model) and
	select the intermediate layer from the plurality of intermediate layers based on the plurality of information values. (The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models [0039]; As for the neural network training scheme provided by the embodiments of the present application, on one aspect, it can train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks [0036]. Examiner notes: the middle layers of the teacher network (first model) are the intermediate layers)
	The same motivation to combine independent claim 8 applies here.

	Regarding claim 13, Modified Madden teaches the computing system of claim 8, Huang teaches wherein the first model includes a greater number of parameters than the second model (The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights, and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure [0066]; to train student networks (featuring a small amount of network parameters, poor performance and high-speed computation) [0005]. Examiner notes: the first model is the teacher network and the student network is the second model)
	The same motivation to combine independent claim 8 applies here.

	Regarding claim 15, Madden teaches a computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: (certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (code embodied on a non-transitory, tangible machine-readable medium) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein [0076])
	obtain first training data associated with a first set of features (a textual input (e.g., a product UPC) V1 may be passed to input layer 304 [0054], Fig. 3A; The order inquiry may include an indication of the item(s) or product(s) the consumer intends to purchase. Herein, the product(s) may be known as “products of interest”. For example, a list of universal product codes (UPCs) [0044]. Examiner notes that V1 is associated with a user’s product of interest to purchase) and 
	second training data associated with a second set of features different than the first set of features, (V2 may correspond to the current balance of a user [0054], Fig. 3A; current balance 620 [0061], Fig. 6A) 
	wherein each of the first set of features and the second set of features is associated with a same plurality of transactions (same payment transactions such as initiating credit payments or payment transactions such as pay with Economizer, Fig. 6A) and 
	a same plurality of labels for the same plurality of transactions; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	train a first model (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]; Multiple layers of neural network 300 may correspond to respective models. For example, hidden layers 306-1 through 306-(J-10) may correspond to spending patterns module 152 [0053], Fig. 3A; Spending patterns module 152 may be a machine learning or other model configured to quantify past spending, such as over a time interval (e.g., three months) with respect to one or more consumers [0031]) 
	 based on the first training data, the second training data, (first training data V1, second training data V2, Fig. 3A) and 
	the same plurality of labels; (… a single consumer's purchases as labeled data [0032]; Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031]) and
	train a second model, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044]; while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits. For example, future profits module 154 may be an artificial neural network [0032]) 
	using a loss function (loss function [0057]) 
	 that depends on an output of an intermediate layer of the first model and an output of the second model, (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) 
	 based on the second training data, (second training data V2, Fig. 3A)

	wherein the second model includes at least one first layer and at least one second layer, (Each model comprising multiple layers may be weighted [0054]; while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A;)
	wherein the output of the second model includes an output of the at least one first layer, (so that the value(s) output by output layer 308 represent a consensus among the multiple models with respect to the accuracy of the prediction of future consumer profits to the seller [0054]; The output of the function f may be provided to any number of subsequent layers [0055]) and 
	wherein the program instructions, when executed by the at least one processor, further cause the at least one processor (Program storage 134 may include instructions which, when executed by CPU 130, cause model training module 150 to create and train new models or to perform other operations [0029]) to train the second model by: (For example, future profits module 154 may be an artificial neural network trained using a single consumer's purchases as labeled data [0032])
	modifying, using the loss function that depends on the output of the intermediate layer of the first model and the output of the second model including the output of the at least one first layer, one or more parameters of the at least one first layer of the second model to learn to match the output of the intermediate layer of the first model; (The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values. In an embodiment, a regression neural network may be selected to predict pricing values which has no activation function. Therein, input data may be normalized by mean centering, and a mean squared error loss function may be used [0057]; The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and
	training the at least one second layer based on the output of the at least one first layer (The output of the function f may be provided to any number of subsequent layers …  For example, the output of the function f may indicate a purchase category, a future monthly average purchase [0055]) and 
	the same plurality of labels. (Such a neural network may be trained using labeled data; i.e., data in which information pertaining to a group of users is explicitly linked to a respective set of purchases made by the group of users [0031])
	Madden teaches a loss function (loss function … The weights may be adjusted as the network is successively trained, by using one of several gradient descent algorithms, to reduce loss and to cause the values output by the network to converge to expected, or “learned”, values [0057]) but does not explicitly teach using a loss function that depends on an output of an intermediate layer of the first model and an output of the second model.
	Huang teaches a first model (selecting, by a training device, a teacher network performing the same functions of a student network [0038]; a device for training a neural network, the structure of the device is illustrated in FIG. 5, which including: a processor 51 and at least one memory 52, the at least one memory storing at least one machine executable instruction, which is executed by the processor to:[0082])
	a second model, (the processor 51 executes the at least one machine executable instruction to iteratively train the student network [0085])
	using a loss function that depends on an output of an intermediate layer of the first model (In equation (5), ℋ(ytrue, pS) refers to the cross-entropy loss function, ℒ_MMD²(FT, FS) refers to the distance loss function, λ refers to the weight of distance loss function, FT refers to the feature map (i.e. the features of the first middle layer) output from the first specific network layer of the teacher network given the training sample data, [0057] and [0050]-[0054]) and 
	an output of the second model, (FS refers to the feature map (i.e. the features of the second middle layer) output from the second specific network layer of the student network given the training sample data, [0057])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Madden to incorporate the teachings of Huang for the benefit of transferring knowledge of features of a middle layer of the teacher network to the student network (Huang, abstract)
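	For illustration only, the combined loss that the cited passage of Huang describes can be sketched as follows. The sketch assumes that equation (5) sums the cross-entropy term and the λ-weighted squared MMD distance term, and it assumes a linear kernel for the MMD estimate; the function names (`cross_entropy`, `mmd_squared`, `distillation_loss`) and the parameter `lam` are hypothetical and do not appear in either reference.

```python
import math


def cross_entropy(y_true, p_student):
    """Cross-entropy between a one-hot label and the student's class probabilities."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, p_student))


def linear_kernel(x, y):
    """Dot product of two feature vectors (assumed kernel choice)."""
    return sum(a * b for a, b in zip(x, y))


def mmd_squared(f_teacher, f_student, kernel=linear_kernel):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

    return (mean_kernel(f_teacher, f_teacher)
            - 2.0 * mean_kernel(f_teacher, f_student)
            + mean_kernel(f_student, f_student))


def distillation_loss(y_true, p_student, f_teacher, f_student, lam=1.0):
    """Task loss on the student output plus lam times the feature distance."""
    return cross_entropy(y_true, p_student) + lam * mmd_squared(f_teacher, f_student)
```

	In this sketch the first term depends only on the student (second model) output, while the second term depends on the teacher's (first model's) middle-layer features, which is the relationship the mapping above relies on.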

	Regarding claim 17, Modified Madden teaches the computer program product of claim 15, Madden teaches wherein the instructions further cause the at least one processor to: (certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (code embodied on a non-transitory, tangible machine-readable medium) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein [0076])
	Huang teaches determine a plurality of information values of a plurality of intermediate layers of the first model; (the features of the first middle layer refer to feature maps output from a first specific network layer of the teacher network after the training sample data are provided to the teacher network [0041]. Examiner notes: the first middle layer is the intermediate layer. The teacher model is the first model) and
	select the intermediate layer from the plurality of intermediate layers based on the plurality of information values. (The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models [0039]; As for the neural network training scheme provided by the embodiments of the present application, on one aspect, it can train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks [0036]. Examiner notes: the middle layers of the teacher network (first model) are the intermediate layers)
	The same motivation to combine independent claim 15 applies here.

	Regarding claim 19, Modified Madden teaches the computer program product of claim 15, Huang teaches wherein the first model includes a greater number of parameters than the second model (The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights, and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure [0066]; to train student networks (featuring a small amount of network parameters, poor performance and high-speed computation) [0005]. Examiner notes: the first model is the teacher network and the student network is the second model)
	The same motivation to combine independent claim 15 applies here.

4.	Claims 5, 7, 12, 14, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Madden (US20190228397 filed 01/25/2018) in view of Huang et al (US20180365564) and further in view of Ben Kimon et al. (US20200210849 filed 12/31/18)

	Regarding claim 5, Modified Madden teaches the computer-implemented method of claim 1, Modified Madden does not explicitly teach wherein the first set of features includes complex features, wherein the second set of features includes interpretable features
	Ben Kimon teaches wherein the first set of features includes complex features, (The combination process 110 may extract features (e.g., correlation features associated with correlations of the sub-transactions) associated with the autoencoder for anomaly detection [0024]) and 
	wherein the second set of features includes interpretable features (the input transaction 310 may have N attributes (e.g., transaction time, transaction type, payor, payee, transaction history, adjuster, age, refund amount, refund frequency, etc.) [0030])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Madden to incorporate the teachings of Ben Kimon for the benefit of extracting features (e.g., correlation features associated with correlations of the sub-transactions) associated with the autoencoder for anomaly detection (Ben Kimon, [0024])

	Regarding claim 7, Modified Madden teaches the computer-implemented method of claim 1, Madden teaches further comprising: providing, with at least one processor, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044])
	obtaining, with at least one processor, (checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044])
	the trained second model; (while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits)
	processing, with at least one processor and using the trained second model, the input data to generate output data, (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A, according to an embodiment. Neuron 320 may accept inputs X1 through Xn, which may correspond to input neurons I1-In of FIG. 3A, and may include weights W1 through Wn. Weights may be determined during the training process, and may be initialized to random values at the outset of training, and appropriate weights for determining accurate predictions discovered via the training process … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase, a likely future upper limit, or an indication of an exceeded balance, [0055])
	Modified Madden does not explicitly teach input data associated with at least one transaction; wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction.
	Ben Kimon teaches input data associated with at least one transaction; (The decoder 304 may uncompress that latent representation En(xi) into a reconstructed data 314 (denoted as De(En(xi))) that closely matches the input data xi 310 [0030]) 
	the input data to generate output data, wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction. (the reconstruction difference 514 may be used to determine whether the first instruction is fraudulent (e.g., with a large reconstruction difference 514) or legitimate (e.g., with a small reconstruction difference 514). In the example of FIG. 5, an anomaly detector 506 receives the reconstruction error threshold for fraud 408 (e.g., from reconstruction error threshold for fraud generator 406) and the reconstruction difference 514 (e.g., from the trained autoencoder 300), and generates a fraud prediction 516 (e.g., a binary value, a probability, etc.) indicating the likelihood that the first transaction 510 is fraudulent [0038])
 	The same motivation to combine dependent claim 5 applies here.
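	For illustration only, the threshold comparison described in the cited passage of Ben Kimon [0038] can be sketched as follows. The sketch assumes the reconstruction difference is a mean squared error and that the prediction is a binary value; the autoencoder itself is not modeled, and the names `reconstruction_difference` and `predict_fraud` are hypothetical.

```python
def reconstruction_difference(original, reconstructed):
    """Mean squared error between an input and its reconstruction (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def predict_fraud(original, reconstructed, threshold):
    """Flag the transaction as fraudulent when the difference exceeds the threshold."""
    return reconstruction_difference(original, reconstructed) > threshold
```

	A small reconstruction difference yields a legitimate prediction and a large one yields a fraud prediction, consistent with the quoted passage.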

	Regarding claim 12, Modified Madden teaches the computing system of claim 8, Modified Madden does not explicitly teach wherein the first set of features includes complex features, wherein the second set of features includes interpretable features
	Ben Kimon teaches wherein the first set of features includes complex features, (The combination process 110 may extract features (e.g., correlation features associated with correlations of the sub-transactions) associated with the autoencoder for anomaly detection [0024]) and 
	wherein the second set of features includes interpretable features (the input transaction 310 may have N attributes (e.g., transaction time, transaction type, payor, payee, transaction history, adjuster, age, refund amount, refund frequency, etc.) [0030])
	The same motivation to combine dependent claim 5 applies here.

	Regarding claim 14, Modified Madden teaches the computing system of claim 8, Madden teaches wherein the at least one processor is further programmed and/or configured to: (In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein. [0076]; checkout analyzer process node 210-2 may correspond to a single processing script in program storage 134 executed by processor 110 [0044])	
	provide the trained second model; (while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits)
	process, using the trained second model, the input data to generate output data,  (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A, according to an embodiment. Neuron 320 may accept inputs X1 through Xn, which may correspond to input neurons I1-In of FIG. 3A, and may include weights W1 through Wn. Weights may be determined during the training process, and may be initialized to random values at the outset of training, and appropriate weights for determining accurate predictions discovered via the training process … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase, a likely future upper limit, or an indication of an exceeded balance, [0055])
	Modified Madden does not explicitly teach obtain input data associated with at least one transaction; wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction.
	Ben Kimon teaches obtain input data associated with at least one transaction; (The decoder 304 may uncompress that latent representation En(xi) into a reconstructed data 314 (denoted as De(En(xi))) that closely matches the input data xi 310 [0030]) and
	wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction. (the reconstruction difference 514 may be used to determine whether the first instruction is fraudulent (e.g., with a large reconstruction difference 514) or legitimate (e.g., with a small reconstruction difference 514). In the example of FIG. 5, an anomaly detector 506 receives the reconstruction error threshold for fraud 408 (e.g., from reconstruction error threshold for fraud generator 406) and the reconstruction difference 514 (e.g., from the trained autoencoder 300), and generates a fraud prediction 516 (e.g., a binary value, a probability, etc.) indicating the likelihood that the first transaction 510 is fraudulent [0038])
	The same motivation to combine dependent claim 5 applies here.

	Regarding claim 18, Modified Madden teaches the computer program product of claim 15, Modified Madden does not explicitly teach wherein the first set of features includes complex features, wherein the second set of features includes interpretable features
	Ben Kimon teaches wherein the first set of features includes complex features, (The combination process 110 may extract features (e.g., correlation features associated with correlations of the sub-transactions) associated with the autoencoder for anomaly detection [0024]) and 
	wherein the second set of features includes interpretable features (the input transaction 310 may have N attributes (e.g., transaction time, transaction type, payor, payee, transaction history, adjuster, age, refund amount, refund frequency, etc.) [0030])
	The same motivation to combine independent claim 15 applies here.

	Regarding claim 20, Modified Madden teaches the computer program product of claim 15, Madden teaches wherein the instructions further cause the at least one processor to: (certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (code embodied on a non-transitory, tangible machine-readable medium) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a module that operates to perform certain operations as described herein [0076])
	 provide the trained second model; (while hidden layers 306-(J-9) through 306-J may correspond to future profits module 154 [0053], Fig 3A; Future profits module 154 may be a machine learning or other model configured to quantify or predict future profits)
	and process, using the trained second model, the input data to generate output data, (With respect to FIG. 3B, an example neuron 320 is depicted. Neuron 320 may correspond to neuron H1 of hidden layer 306-1 in FIG. 3A, according to an embodiment. Neuron 320 may accept inputs X1 through Xn, which may correspond to input neurons I1-In of FIG. 3A, and may include weights W1 through Wn. Weights may be determined during the training process, and may be initialized to random values at the outset of training, and appropriate weights for determining accurate predictions discovered via the training process … The output of the function f may be provided to any number of subsequent layers, as depicted. For example, the output of the function f may indicate a purchase category, a future monthly average purchase, a likely future upper limit, or an indication of an exceeded balance, [0055])
	Modified Madden does not explicitly teach obtain input data associated with at least one transaction; wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction.
	Ben Kimon teaches obtain input data associated with at least one transaction; (The decoder 304 may uncompress that latent representation En(xi) into a reconstructed data 314 (denoted as De(En(xi))) that closely matches the input data xi 310 [0030])
	 the input data to generate output data, wherein the output data includes a prediction of whether the at least one transaction is a fraudulent transaction. (the reconstruction difference 514 may be used to determine whether the first instruction is fraudulent (e.g., with a large reconstruction difference 514) or legitimate (e.g., with a small reconstruction difference 514). In the example of FIG. 5, an anomaly detector 506 receives the reconstruction error threshold for fraud 408 (e.g., from reconstruction error threshold for fraud generator 406) and the reconstruction difference 514 (e.g., from the trained autoencoder 300), and generates a fraud prediction 516 (e.g., a binary value, a probability, etc.) indicating the likelihood that the first transaction 510 is fraudulent [0038])
	The same motivation to combine independent claim 15 applies here.

5.	Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Madden (US20190228397 filed 01/25/2018) in view of Huang et al (US20180365564) and further in view of Caelen et al (US20200257964 filed 07/13/2018)

	Regarding claim 3, Modified Madden teaches the computer-implemented method of claim 1, Huang teaches wherein the first model includes at least one of the following: a deep neural network, a recurrent neural network, an ensemble of a plurality of neural networks, or any combination thereof, (As for the neural network training scheme provided by the embodiments of the present application, on one aspect, it can train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks [0036]. Examiner notes: teacher network as first model)
	wherein the first layer of the second model includes a regression neural network, (when the task of the student network is a regression task, the form of the task specific loss function is a distance loss function [0072]; the features of the second middle layer refer to feature maps output from a second specific network layer of the student network after the training sample data are provided to the student network [0068]; the second specific network layer is a middle network layer or the last network layer of the student network [0069])
	Modified Madden does not explicitly teach wherein the second layer of the second model includes a logistic regression model.
	Caelen teaches wherein the second layer of the second model includes a logistic regression model (The distribution over classes fraud and non-fraud given state st is modeled with a logistic regression output model [0041])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Madden to incorporate the teachings of Caelen for the benefit of taking into account the time elapsed between two authentications, operations or transactions (Caelen, [0017])

	Regarding claim 10, Modified Madden teaches the computing system of claim 8, Huang teaches wherein the first model includes at least one of the following: a deep neural network, a recurrent neural network, an ensemble of a plurality of neural networks, or any combination thereof, (As for the neural network training scheme provided by the embodiments of the present application, on one aspect, it can train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks [0036]. Examiner notes: teacher network as first model)
	wherein the first layer of the second model includes a regression neural network, (when the task of the student network is a regression task, the form of the task specific loss function is a distance loss function. [0072]; the features of the second middle layer refer to feature maps output from a second specific network layer of the student network after the training sample data are provided to the student network [0068]; the second specific network layer is a middle network layer or the last network layer of the student network [0069])
	Modified Madden does not explicitly teach wherein the second layer of the second model includes a logistic regression model.
	Caelen teaches wherein the second layer of the second model includes a logistic regression model (The distribution over classes fraud and non-fraud given state st is modeled with a logistic regression output model [0041])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Madden to incorporate the teachings of Caelen for the benefit of taking into account the time elapsed between two authentications, operations or transactions (Caelen, [0017])

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/M.G./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121