DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 5/23/2019 and the Remarks and Amendments filed on 8/11/2022.  

Priority

Acknowledgment is made of applicant's attempt of a claim for foreign priority based on an application (CN201810498650.1) filed in China on 5/23/2018. 

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a control unit”, “a selecting unit”, “an annotating unit”, and “an acquiring unit” in claim 9 and its dependents, “a pruning unit” in claim 11, and “a training sub-unit” and “a removing sub-unit” in claim 13 and its dependents.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 17 and 18 are rejected under 35 U.S.C. 103 as being obvious over Liu et al (US 20180144241 A1, hereinafter “Liu”) in view of Chen et al. (US 9342759 B1, hereinafter “Chen”).


Regarding claim 17, Liu discloses [a] computer server, comprising a memory and one or more processors communicatively connected to the memory, the memory storing instructions executable by the one or more processors, the instructions, when executed by the one or more processors, causing the one or more processors to perform a method comprising: ([0007-0008]; and Abstract; and [0001]; “a method for training a neural network, and more specifically to an active learning method for training artificial neural networks”)
selecting input data automatically to obtain a set of data to be annotated; ([0055]; “Further, the annotation device 613 includes a display screen, and the display screen of the annotation device 613 is configured to display the labeling interface 106 that allows the operator to perform labeling process of unlabeled images stored in the memory 640 by showing the unlabeled image in the display region 601 with the selection area 602 having predetermined annotation boxes and predetermined labeling candidates to be selected”, which discloses selecting input data; and [0057]; “When the labeling interface 106 receives an unlabeled image of the K most important unlabeled images 105 in step S 6 of FIG. 1A, the labeling interface 106 shows the unlabeled image on the display region 601 . . . The labeling interface 106 is configured to load and show unlabeled images stored the labeling storage in the memory according to the operations by the operator”, which discloses, under a broadest reasonable interpretation of the claim language, loading or selecting of input data for annotation which is performed automatically; and Figure 1A, Element 105; the data to be annotated is “obtained” by virtue of its presence in the annotation that is performed at 106 in the figure)
annotating the set of data to be annotated to obtain a new set of annotated data; ([0057]; “In this case, the annotation box of Cat is checked by the operator (annotator) in response to the cat image shown in the selection area 602.”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data; and [0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”)
acquiring a set of newly added annotated data containing the new set of annotated data, and ([0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data) 
determining a union of the set of newly added annotated data and a set of training sample data for training the neural network in a previous period as a set of training sample data for a current period; and ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401”, the new training dataset of images is the training sample data for a current period, and it is, under a broadest reasonable interpretation of the claim language, a “union” of newly added annotated data 107 and the set of training sample data in a previous period 101; and [0031]; “The selected images with newly annotated labels are added into the current (latest) labeled training set to get a new training dataset”; and [0023]; “Based on the neural network (NN) 100 with randomly initialized parameters, the trainer 102 updates network parameters by fitting the NN 100 to the initial labeled training dataset of images 101.”; and Figure 1A;  the figure discloses the training sample data for training a neural network in a previous period at element 101 in the “initial set” of labeled training images)
training the neural network iteratively based on the set of training sample data for the current period, to obtain a neural network trained in the current period: ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401. This procedure is iterative. The updated neural network parameters 401 are used to rank the importance of the rest of the unlabeled images 103, and the K most important images 105 are sent to the labeling interface 106. Usually, this procedure is repeated several times until a predetermined preferred performance is achieved or the budget for annotations is empty”; and [0033]; “This training process is iteratively performed, and the active learning system 10 carefully adds more labeled images for gradually increasing the accuracy performance of the model on the test dataset”, the current or next iteration being in the current period; and [0032]).
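For illustration only, one period of the select, annotate, union, and retrain cycle mapped to Liu above may be sketched as follows. The sketch is not taken from Liu; the names `select_top_k`, `active_learning_round`, `score`, `annotate`, and `train` are hypothetical placeholders, and the importance ranking is assumed to be supplied by the caller.

```python
from typing import Callable, Hashable, List, Set, Tuple

Sample = Hashable
Labeled = Tuple[Sample, str]  # (sample, ground-truth label)


def select_top_k(unlabeled: List[Sample],
                 score: Callable[[Sample], float],
                 k: int) -> List[Sample]:
    """Automatically pick the k highest-scoring unlabeled samples."""
    return sorted(unlabeled, key=score, reverse=True)[:k]


def active_learning_round(previous_training_set: Set[Labeled],
                          unlabeled: List[Sample],
                          score: Callable[[Sample], float],
                          annotate: Callable[[Sample], str],
                          train: Callable[[Set[Labeled]], object],
                          k: int):
    """Run one period: select, annotate, take the union, retrain."""
    # Step 1: select input data to obtain the set of data to be annotated.
    to_annotate = select_top_k(unlabeled, score, k)
    # Step 2: annotate (in Liu this is done by an operator at a labeling interface).
    newly_annotated = {(s, annotate(s)) for s in to_annotate}
    # Step 3: union of the newly annotated data and the previous training set.
    current_training_set = previous_training_set | newly_annotated
    # Step 4: retrain on the training set for the current period.
    model = train(current_training_set)
    chosen = set(to_annotate)
    remaining = [s for s in unlabeled if s not in chosen]
    return model, current_training_set, remaining
```

Calling `active_learning_round` repeatedly, each time passing the returned training set and remaining pool back in, corresponds to the iterative procedure described in Liu at [0023].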
Liu fails to explicitly disclose but Chen discloses wherein a video containing a plurality of sequences of frames, is inputted to the neural network, to obtain a target detection result for each frame of image, wherein target detection results for all frames of images in the video are inputted to a target tracking model to obtain a tracking result for each frame of image, and wherein the frame of image is determined as the input data in response to that the target detection result and the tracking result for a frame of image are inconsistent with each other (Column 1, Line 60 - Column 2, Line 4; “Object detection results and classification results for a sequence of image frames are received as input. Each object detection result is represented by a detection box, and each classification result is represented by an object label corresponding to the object detection result. A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system then determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results” (emphasis added), which discloses, under a broadest reasonable interpretation (BRI) of the claim language, a video with a sequence of frames to obtain a target or object detection result for each frame of an image, and this is inputted to a target tracking model to obtain a tracking result or pseudo-tracklet.  The frame of the image is then used when the target detection labels/results and a tracking result/pseudo-tracklet are inconsistent with each other.  Note that the pseudo-tracklet comprises the target detection result and also includes the labels.
Note further that the specification at paragraphs [0088-0090] does not appear to give a precise definition of what a tracking result is, so it is interpreted as a pseudo-tracklet as taught by Chen; and Column 9, Lines 7-10; “In the system implementation, the detector front-end (without a tracker) employs a saliency based object detection method, and the recognition engine is one based on convolutional neural networks (CNN)”, which discloses that the video is inputted into a NN; and Column 9, Line 17; “For each input image frame of a video sequence”; and Figure 5; the figure discloses the inconsistency evaluation between target detection/labeled detections and the tracking result/pseudo-tracklet; and Abstract; “A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results”, the correcting of the labels and detection results in the tracklet implies that this inconsistent information in an image frame is used as input data to correct the model; and Column 8, Lines 4-24).
Liu and Chen are analogous art because both are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the video and image processing with tracking and detection results of Chen with the computer server of Liu to yield the predictable result of wherein a video containing a plurality of sequences of frames, is inputted to the neural network, to obtain a target detection result for each frame of image, wherein target detection results for all frames of images in the video are inputted to a target tracking model to obtain a tracking result for each frame of image, and wherein the frame of image is determined as the input data in response to that the target detection result and the tracking result for a frame of image are inconsistent with each other. The motivation for doing so is to improve object detection results and the classification results by correcting any inconsistent object labels and missing object detection results (Chen; Abstract).
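For illustration only, the claimed selection of frames whose detection result and tracking result disagree may be sketched as follows. This is a simplified sketch and not Chen's pseudo-tracklet algorithm: it assumes each result is a list of boxes per frame and treats a frame as inconsistent when the box counts differ or a tracked box has no overlapping detection. The names `iou`, `frame_is_inconsistent`, `select_frames`, and the threshold `thr` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_is_inconsistent(detections: List[Box],
                          tracks: List[Box],
                          thr: float = 0.5) -> bool:
    """True when the detection and tracking results for one frame disagree."""
    # A missing or extra detection is treated as an inconsistency.
    if len(detections) != len(tracks):
        return True
    # Every tracked box must overlap some detected box by at least thr.
    return any(all(iou(t, d) < thr for d in detections) for t in tracks)


def select_frames(detections_per_frame: List[List[Box]],
                  tracks_per_frame: List[List[Box]],
                  thr: float = 0.5) -> List[int]:
    """Indices of frames to be used as input data for annotation."""
    pairs = zip(detections_per_frame, tracks_per_frame)
    return [i for i, (d, t) in enumerate(pairs)
            if frame_is_inconsistent(d, t, thr)]
```

The returned frame indices would correspond to the "input data" of the claim, that is, the frames forwarded for annotation.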

Regarding claim 18, the rejection of claim 17 is incorporated and Liu further discloses receiving the set of newly added annotated data from an annotation platform, or retrieving the set of newly added annotated data from an annotation database ([0057]; “In this case, the annotation box of Cat is checked by the operator (annotator) in response to the cat image shown in the selection area 602.”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data; and [0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”, the labeling interface being the annotation platform).




Claims 19 and 20 are rejected under 35 U.S.C. 103 as being obvious over Liu in view of Chen and further in view of Huang et al. (Huang et al., “Data-Driven Sparse Structure Selection for Deep Neural Networks”, Dec. 18, 2017, arXiv:1707.01213v2, pp. 1-9, hereinafter “Huang”).

Regarding claim 19, the rejection of claim 17 is incorporated and Liu fails to explicitly disclose but Huang discloses pruning the neural network trained in the current period (Page 1, Column 2; “We propose a unified framework for model training and pruning in CNNs”, which discloses pruning a NN trained in a current period; and Abstract; “By forcing some of the factors to zero, we can safely remove the corresponding structures, thus prune the unimportant parts of a CNN”; and Figure 1).
Liu, Chen, and Huang are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the pruning of a NN as taught by Huang with the computer server of Liu and Chen to yield the predictable result of pruning the neural network trained in the current period. The motivation for doing so is to learn and prune deep models in an end-to-end manner (Huang; Abstract).

Regarding claim 20, the rejection of claim 17 is incorporated and Liu fails to explicitly disclose but Huang discloses training iteratively and pruning the neural network based on the set of training sample data for the current period (Page 1, Column 2; “We propose a unified framework for model training and pruning in CNNs”, which discloses pruning a NN trained in a current period; and Abstract; “By forcing some of the factors to zero, we can safely remove the corresponding structures, thus prune the unimportant parts of a CNN”; and Figure 1; and Page 3, Column 2; “where η(t) is gradient step size at iteration t”, which discloses the iterative training; and Page 4, Column 1; “Both W and λ are updated in each iteration”).
The motivation to combine Liu, Chen, and Huang is the same as discussed above with respect to claim 19.
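For illustration only, the idea quoted from Huang of forcing scaling factors to zero and removing the corresponding structures may be sketched as follows. The sketch assumes an L1 penalty on the factors and a plain (non-accelerated) proximal gradient step, which may differ from the optimizer Huang actually uses; the penalty weight `gamma` and the names `soft_threshold`, `proximal_step`, and `surviving_structures` are hypothetical.

```python
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t; values within [-t, t] become exactly zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_step(factors: List[float],
                  grads: List[float],
                  step: float,
                  gamma: float) -> List[float]:
    """One iteration: gradient step on each factor, then L1 shrinkage."""
    return [soft_threshold(f - step * g, step * gamma)
            for f, g in zip(factors, grads)]


def surviving_structures(factors: List[float]) -> List[int]:
    """Indices of structures whose scaling factor is non-zero (kept after pruning)."""
    return [i for i, f in enumerate(factors) if f != 0.0]
```

Structures whose index is absent from `surviving_structures` have a zero factor and could be removed from the network, in the sense of the Abstract passage quoted above.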

Claims 1, 2, 9, and 10 are rejected under 35 U.S.C. 103 as being obvious over Liu in view of Chen and further in view of Ma et al. (US 20200320344 A1, hereinafter “Ma”).

Regarding claim 1, Liu discloses [a] method for training a neural network, comprising the following process ([0001]; “a method for training a neural network, and more specifically to an active learning method for training artificial neural networks”)
selecting input data automatically to obtain a set of data to be annotated; ([0055]; “Further, the annotation device 613 includes a display screen, and the display screen of the annotation device 613 is configured to display the labeling interface 106 that allows the operator to perform labeling process of unlabeled images stored in the memory 640 by showing the unlabeled image in the display region 601 with the selection area 602 having predetermined annotation boxes and predetermined labeling candidates to be selected”, which discloses selecting input data; and [0057]; “When the labeling interface 106 receives an unlabeled image of the K most important unlabeled images 105 in step S 6 of FIG. 1A, the labeling interface 106 shows the unlabeled image on the display region 601 . . . The labeling interface 106 is configured to load and show unlabeled images stored the labeling storage in the memory according to the operations by the operator”, which discloses, under a broadest reasonable interpretation of the claim language, loading or selecting of input data for annotation which is performed automatically; and Figure 1A, Element 105; the data to be annotated is “obtained” by virtue of its presence in the annotation that is performed at 106 in the figure)
annotating the set of data to be annotated to obtain a new set of annotated data; ([0057]; “In this case, the annotation box of Cat is checked by the operator (annotator) in response to the cat image shown in the selection area 602.”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data; and [0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”)
acquiring a set of newly added annotated data containing the new set of annotated data, and ([0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data)
determining a union of the set of newly added annotated data and a set of training sample data for training the neural network in a previous period as a set of training sample data for a current period; and ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401”, the new training dataset of images is the training sample data for a current period, and it is, under a broadest reasonable interpretation of the claim language, a “union” of newly added annotated data 107 and the set of training sample data in a previous period 101; and [0031]; “The selected images with newly annotated labels are added into the current (latest) labeled training set to get a new training dataset”; and [0023]; “Based on the neural network (NN) 100 with randomly initialized parameters, the trainer 102 updates network parameters by fitting the NN 100 to the initial labeled training dataset of images 101.”; and Figure 1A; the figure discloses the training sample data for training a neural network in a previous period at element 101 in the “initial set” of labeled training images)
training the neural network iteratively based on the set of training sample data for the current period, to obtain a neural network trained in the current period ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401. This procedure is iterative. The updated neural network parameters 401 are used to rank the importance of the rest of the unlabeled images 103, and the K most important images 105 are sent to the labeling interface 106. Usually, this procedure is repeated several times until a predetermined preferred performance is achieved or the budget for annotations is empty”; and [0033]; “This training process is iteratively performed, and the active learning system 10 carefully adds more labeled images for gradually increasing the accuracy performance of the model on the test dataset”, the current or next iteration being in the current period; and [0032]).
Liu fails to explicitly disclose but Chen discloses wherein a video containing a plurality of sequences of frames, is inputted to the neural network, to obtain a target detection result for each frame of image, wherein target detection results for all frames of images in the video are inputted to a target tracking model to obtain a tracking result for each frame of image, and wherein the frame of image is determined as the input data in response to that the target detection result and the tracking result for a frame of image are inconsistent with each other (Column 1, Line 60 - Column 2, Line 4; “Object detection results and classification results for a sequence of image frames are received as input. Each object detection result is represented by a detection box, and each classification result is represented by an object label corresponding to the object detection result. A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system then determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results” (emphasis added), which discloses, under a broadest reasonable interpretation (BRI) of the claim language, a video with a sequence of frames to obtain a target or object detection result for each frame of an image, and this is inputted to a target tracking model to obtain a tracking result or pseudo-tracklet.  The frame of the image is then used when the target detection labels/results and a tracking result/pseudo-tracklet are inconsistent with each other.  Note that the pseudo-tracklet comprises the target detection result and also includes the labels.
Note further that the specification at paragraphs [0088-0090] does not appear to give a precise definition of what a tracking result is, so it is interpreted as a pseudo-tracklet as taught by Chen; and Column 9, Lines 7-10; “In the system implementation, the detector front-end (without a tracker) employs a saliency based object detection method, and the recognition engine is one based on convolutional neural networks (CNN)”, which discloses that the video is inputted into a NN; and Column 9, Line 17; “For each input image frame of a video sequence”; and Figure 5; the figure discloses the inconsistency evaluation between target detection/labeled detections and the tracking result/pseudo-tracklet; and Abstract; “A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results”, the correcting of the labels and detection results in the tracklet implies that this inconsistent information in an image frame is used as input data to correct the model; and Column 8, Lines 4-24).
Liu and Chen are analogous art because both are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the video and image processing with tracking and detection results of Chen with the method of Liu to yield the predictable result of wherein a video containing a plurality of sequences of frames, is inputted to the neural network, to obtain a target detection result for each frame of image, wherein target detection results for all frames of images in the video are inputted to a target tracking model to obtain a tracking result for each frame of image, and wherein the frame of image is determined as the input data in response to that the target detection result and the tracking result for a frame of image are inconsistent with each other. The motivation for doing so is to improve object detection results and the classification results by correcting any inconsistent object labels and missing object detection results (Chen; Abstract).
Liu fails to explicitly disclose but Ma discloses the following process performed at a predetermined time period (Abstract; “training a second neural network model by using the annotation data that is of the service and that is generated in the specified period”, which discloses training a neural network at a predetermined time period; and Figure 3, Elements 301 and 302; the figure discloses the training of a neural network at a specified or predetermined time period).
Liu, Chen, and Ma are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the training of a neural network at a predetermined time period of Ma with the method of Liu and Chen to yield the predictable result of [a] method for training a neural network, comprising the following process performed at a predetermined time period. The motivation for doing so is that, in an updated first neural network model compared with a universal model, an inference result has a higher confidence level, and a personalized requirement of a user can be better met (Ma; [0010]).
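For illustration only, performing a training process "at a predetermined time period" may be sketched as a simple due-check. The sketch is not taken from Ma; the names `periods_elapsed` and `run_if_due` and the seven-day period in the usage note are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable


def periods_elapsed(last_run: datetime, now: datetime, period: timedelta) -> int:
    """Number of whole periods that have passed since the last run."""
    return max(0, (now - last_run) // period)


def run_if_due(last_run: datetime,
               now: datetime,
               period: timedelta,
               process: Callable[[], None]) -> datetime:
    """Run the training process when a full period has elapsed.

    Returns the timestamp to store as the new 'last run' time.
    """
    if now - last_run >= period:
        process()
        return now
    return last_run
```

For example, with `period = timedelta(days=7)`, a call made five days after the last run leaves the stored timestamp unchanged, while a call made seven days after it runs `process` once and returns the new timestamp.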

Regarding claim 9, Liu discloses [a]n apparatus for training a neural network, comprising: ([0007-0008]; and Abstract; and [0001]; “a method for training a neural network, and more specifically to an active learning method for training artificial neural networks”)
a control unit configured to trigger a selecting unit, an annotating unit, an acquiring unit and a training unit; (Figure 5; the figure discloses, under a broadest reasonable interpretation of the claim language and in view of the 112f interpretation above, the units that are generic processors 620 and an annotation unit in the form of an annotation device 613)
the selecting unit configured to select input data automatically to obtain a set of data to be annotated; ([0055]; “Further, the annotation device 613 includes a display screen, and the display screen of the annotation device 613 is configured to display the labeling interface 106 that allows the operator to perform labeling process of unlabeled images stored in the memory 640 by showing the unlabeled image in the display region 601 with the selection area 602 having predetermined annotation boxes and predetermined labeling candidates to be selected”, which discloses selecting input data; and [0057]; “When the labeling interface 106 receives an unlabeled image of the K most important unlabeled images 105 in step S 6 of FIG. 1A, the labeling interface 106 shows the unlabeled image on the display region 601 . . . The labeling interface 106 is configured to load and show unlabeled images stored the labeling storage in the memory according to the operations by the operator”, which discloses, under a broadest reasonable interpretation of the claim language, loading or selecting of input data for annotation which is performed automatically; and Figure 1A, Element 105; the data to be annotated is “obtained” by virtue of its presence in the annotation that is performed at 106 in the figure)
the annotating unit configured to annotate the set of data to be annotated to obtain a new set of annotated data; ([0057]; “In this case, the annotation box of Cat is checked by the operator (annotator) in response to the cat image shown in the selection area 602.”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data; and [0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”)
the acquiring unit configured to acquire a set of newly added annotated data containing the new set of annotated data, and ([0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data)
determine a union of the set of newly added annotated data and a set of training sample data for training the neural network in a previous period as a set of training sample data for a current period; and ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401”, the new training dataset of images is the training sample data for a current period, and it is, under a broadest reasonable interpretation of the claim language, a “union” of newly added annotated data 107 and the set of training sample data in a previous period 101; and [0031]; “The selected images with newly annotated labels are added into the current (latest) labeled training set to get a new training dataset”; and [0023]; “Based on the neural network (NN) 100 with randomly initialized parameters, the trainer 102 updates network parameters by fitting the NN 100 to the initial labeled training dataset of images 101.”; and Figure 1A;  the figure discloses the training sample data for training a neural network in a previous period at element 101 in the “initial set” of labeled training images)
the training unit configured to train the neural network iteratively based on the set of training sample data for the current period, to obtain a neural network trained in the current period ([0023]; “The trainer 102 then retrains the network 301 by fitting the new training dataset of images 108 and obtains updated neural network parameters 401. This procedure is iterative. The updated neural network parameters 401 are used to rank the importance of the rest of the unlabeled images 103, and the K most important images 105 are sent to the labeling interface 106. Usually, this procedure is repeated several times until a predetermined preferred performance is achieved or the budget for annotations is empty”; and [0033]; “This training process is iteratively performed, and the active learning system 10 carefully adds more labeled images for gradually increasing the accuracy performance of the model on the test dataset”, the current or next iteration being in the current period; and [0032]).
Liu fails to explicitly disclose but Chen discloses wherein a video containing a plurality of sequences of frames, is inputted to the neural network, to obtain a target detection result for each frame of image, wherein target detection results for all frames of images in the video are inputted to a target tracking model to obtain a tracking result for each frame of image, and wherein the frame of image is determined as the input data in response to that the target detection result and the tracking result for a frame of image are inconsistent with each other (Column 1, Line 60 - Column 2, Line 4; “Object detection results and classification results for a sequence of image frames are received as input. Each object detection result is represented by a detection box, and each classification result is represented by an object label corresponding to the object detection result. A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system then determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results” (emphasis added), which discloses, under a broadest reasonable interpretation (BRI), a video with a sequence of frames that is processed to obtain a target or object detection result for each frame of image, the detection results being inputted to a target tracking model to obtain a tracking result or pseudo-tracklet. The frame of image is then used when the target detection labels/results and the tracking result/pseudo-tracklet are inconsistent with each other. Note that the pseudo-tracklet comprises the target detection result and also includes the labels.
Note further that the specification at paragraphs [0088-0090] does not appear to give a precise definition of what a tracking result is, so it is interpreted as a pseudo-tracklet as taught by Chen; and Column 9, Lines 7-10; “In the system implementation, the detector front-end (without a tracker) employs a saliency based object detection method, and the recognition engine is one based on convolutional neural networks (CNN)”, which discloses that the video is inputted into a NN; and Column 9, Line 17; “For each input image frame of a video sequence”; and Figure 5; the figure discloses the inconsistency evaluation between the target detection/labeled detections and the tracking result/pseudo-tracklet; and Abstract; “A pseudo-tracklet is formed by linking object detection results representing the same object in consecutive image frames. The system determines whether there are any inconsistent object labels or missing object detection results in the pseudo-tracklet. Finally, the object detection results and the classification results are improved by correcting any inconsistent object labels and missing object detection results”, the correcting of the labels and detection results in the tracklet implies that this inconsistent information in an image frame is used as input data to correct the model; and Column 8, Lines 4-24).
The motivation to combine Liu and Chen is the same as discussed above with respect to claim 1.
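For illustration only, and not as a characterization of the claims or of Chen's actual implementation, the selection of frames whose detection result and tracking result disagree can be sketched as follows. The function name, the per-frame dictionaries mapping an object id to a label, and the sample data are all hypothetical assumptions introduced for this sketch.

```python
from typing import Dict, List


def select_inconsistent_frames(
    detections: List[Dict[int, str]],
    tracks: List[Dict[int, str]],
) -> List[int]:
    """Return indices of frames whose detection and tracking results differ.

    Assumption: each list entry maps a hypothetical object id to a label for
    one frame. A frame is flagged when the two mappings are not equal, which
    covers both a changed label and a missing detection.
    """
    if len(detections) != len(tracks):
        raise ValueError("need one detection result and one tracking result per frame")
    return [i for i, (det, trk) in enumerate(zip(detections, tracks)) if det != trk]


# Hypothetical four-frame video: frame 1 has a conflicting label and
# frame 2 has a missing detection, so both would be sent for annotation.
detections = [{1: "car"}, {1: "truck"}, {}, {1: "car"}]
tracks = [{1: "car"}, {1: "car"}, {1: "car"}, {1: "car"}]
selected = select_inconsistent_frames(detections, tracks)
```

Under these assumptions only the flagged frames would be forwarded as input data for annotation; consistent frames are left out.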
Liu fails to explicitly disclose but Ma discloses at a predetermined time period (Abstract; “training a second neural network model by using the annotation data that is of the service and that is generated in the specified period”, which discloses training a neural network at a predetermined time period; and Figure 3, Elements 301 and 302; the figure discloses the training of a neural network at a specified or predetermined time period).
The motivation to combine Liu, Chen, and Ma is the same as discussed above with respect to claim 1.

Regarding claims 2 and 10, the rejections of claims 1 and 9 are incorporated and Liu further discloses receiving the set of newly added annotated data from an annotation platform, or retrieving the set of newly added annotated data from an annotation database ([0057]; “In this case, the annotation box of Cat is checked by the operator (annotator) in response to the cat image shown in the selection area 602.”; and Figure 1A; Element 107; the newly labeled training images being the new set of annotated data; and [0023]; “In response to data inputs made by an operator (or annotator), the labeling interface 106 generates annotated images 107 having the ground truth labels”, the labeling interface being the annotation platform).

Claims 3-8 and 11-16 are rejected under 35 U.S.C. 103 as being obvious over Liu in view of Chen, Ma, and Huang.

Regarding claims 3 and 11, the rejections of claims 1 and 9 are incorporated and Liu fails to explicitly disclose but Huang discloses pruning the neural network trained in the current period (Page 1, Column 2; “We propose a unified framework for model training and pruning in CNNs”, which discloses pruning a NN trained in a current period; and Abstract; “By forcing some of the factors to zero, we can safely remove the corresponding structures, thus prune the unimportant parts of a CNN”; and Figure 1).
Liu, Chen, Ma, and Huang are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the pruning of a NN as taught by Huang with the method of Liu, Chen, and Ma to yield the predictable result of pruning the neural network trained in the current period. The motivation for doing so is to learn and prune deep models in an end-to-end manner (Huang; Abstract).

Regarding claims 4 and 12, the rejections of claims 1 and 9 are incorporated and Liu fails to explicitly disclose but Huang discloses training iteratively and pruning the neural network based on the set of training sample data for the current period (Page 1, Column 2; “We propose a unified framework for model training and pruning in CNNs”, which discloses pruning a NN trained in a current period; and Abstract; “By forcing some of the factors to zero, we can safely remove the corresponding structures, thus prune the unimportant parts of a CNN”; and Figure 1; and Page 3, Column 2; “where η(t) is gradient step size at iteration t”, which discloses the iterative training; and Page 4, Column 1; “Both W and λ are updated in each iteration”).
The motivation to combine Liu, Chen, Ma, and Huang is the same as discussed above with respect to claim 3.

Regarding claims 5 and 13, the rejections of claims 1, 4, 9, and 12 are incorporated and Liu fails to explicitly disclose but Huang discloses wherein the neural network has a plurality of particular structures each provided with a corresponding sparse scaling operator for scaling an output from the particular structure, and (Page 1, Column 2; “Particularly, we formulate it as a joint sparse regularized optimization problem by introducing scaling factors and corresponding sparse regularizations on certain structures of CNNs”; and Page 2, §3.1)
said training iteratively and pruning the neural network based on the set of training sample data for the current period comprises: training iteratively weights of the neural network and the respective sparse scaling operators for the particular structures based on the set of training sample data for the current period, and removing any particular structure having a sparse scaling operator of zero from the trained neural network (Page 3, Column 1; “To achieve this goal, we introduce a new type of parameter – scaling factor λ to scale the outputs of some specific structures (neurons, groups or blocks), and add sparsity constraint on λ during training. Our goal is to obtain a sparse λ. Namely, if λi = 0, then we can safely remove the corresponding structure since its outputs have no contribution to subsequent computation. Fig. 1 illustrates our framework”; and Page 2, §3.1).
The motivation to combine Liu, Chen, Ma, and Huang is the same as discussed above with respect to claim 3.
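As a minimal sketch of why a structure whose scaling operator is zero can be removed without changing the network output, consider the toy model below. It assumes, purely for illustration, that the output is a sum of scaled structure outputs; the names `forward` and `prune`, the lambda functions, and the numeric values are hypothetical and are not taken from Huang or from the application.

```python
from typing import Callable, Dict, Tuple

Structure = Callable[[float], float]


def forward(x: float, structures: Dict[str, Structure], scales: Dict[str, float]) -> float:
    """Toy network output: the sum of each structure's output times its scale."""
    return sum(scales[name] * fn(x) for name, fn in structures.items())


def prune(
    structures: Dict[str, Structure], scales: Dict[str, float]
) -> Tuple[Dict[str, Structure], Dict[str, float]]:
    """Drop every structure whose scaling operator is exactly zero."""
    kept = {name: fn for name, fn in structures.items() if scales[name] != 0.0}
    return kept, {name: scales[name] for name in kept}


# Hypothetical structures; "b" has a scaling operator of zero after training.
structures = {"a": lambda x: 2.0 * x, "b": lambda x: x + 1.0, "c": lambda x: x * x}
scales = {"a": 0.5, "b": 0.0, "c": 1.5}

pruned_structures, pruned_scales = prune(structures, scales)
before = forward(3.0, structures, scales)
after = forward(3.0, pruned_structures, pruned_scales)
```

In this sketch `before` and `after` are equal, because the removed structure contributed nothing to the sum.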

Regarding claims 6 and 14, the rejections of claims 1, 4, 5, 9, 12, and 13 are incorporated and Liu fails to explicitly disclose but Huang discloses training iteratively the neural network using the sample data in the set of training sample data for the current period; (Page 3, Column 2; “where η(t) is gradient step size at iteration t”, which discloses the iterative training; and Page 4, Column 1; “Both W and λ are updated in each iteration”)
determining to stop the training when a number of training iterations has reached a threshold or when a target function associated with the neural network satisfies a predetermined convergence condition, wherein the target function comprises a loss function or a sparse regular function (Page 4, §3.2; and Page 2, §3.1; “where L(yi , C(xi ,W)) is the loss on the sample xi”, and §3.1 in general discloses the convergence criterion to stop training).
The motivation to combine Liu, Chen, Ma, and Huang is the same as discussed above with respect to claim 3.

Regarding claims 7 and 15, the rejections of claims 1, 4-6, 9, and 12-14 are incorporated and Liu fails to explicitly disclose but Huang discloses training the neural network in a number of training iterations by: optimizing the target function using a first optimization algorithm, with sparse scaling operators obtained from a previous training iteration being constants of the target function and the weights being variables of the target function, to obtain weights of a current training iteration; (Page 2, §3.1, equation 1)
optimizing the target function using a second optimization algorithm, with the weights of the current training iteration being constants of the target function and sparse scaling operators being variables of the target function, to obtain sparse scaling operators of the current training iteration; and (Page 3, §3.1, equation 2)
performing a next training iteration based on the weights and sparse scaling operators of the current training iteration. (Page 4, §3.2; and Page 2, §3.1; “where L(yi , C(xi ,W)) is the loss on the sample xi”, and §3.1 in general discloses the convergence criterion to stop training).
The motivation to combine Liu, Chen, Ma, and Huang is the same as discussed above with respect to claim 3.
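The alternating scheme recited in claims 7 and 15, together with the stopping conditions of claims 6 and 14, can be sketched on a one-weight toy problem as follows. This is an illustrative assumption only: plain gradient descent is used for the weight step and a soft-thresholding (proximal) step for the scaling operator, the weight regular function is omitted for brevity, and the model `lam * w * x`, the function names, and all hyperparameters are hypothetical rather than the optimizers actually used by Huang or the application.

```python
from typing import List, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal step for an L1 penalty: shrink toward zero, clamping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def target(w: float, lam: float, data: List[Tuple[float, float]], gamma: float) -> float:
    """Toy target function: mean squared loss plus sparse penalty gamma * |lam|."""
    n = len(data)
    loss = sum((lam * w * x - y) ** 2 for x, y in data) / (2 * n)
    return loss + gamma * abs(lam)


def train_alternating(
    data: List[Tuple[float, float]],
    gamma: float,
    lr: float = 0.1,
    max_iters: int = 500,
    tol: float = 1e-9,
) -> Tuple[float, float, int]:
    w, lam = 1.0, 1.0
    n = len(data)
    previous = target(w, lam, data, gamma)
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Step 1: scaling operator held constant, weight is the variable.
        grad_w = sum((lam * w * x - y) * lam * x for x, y in data) / n
        w -= lr * grad_w
        # Step 2: updated weight held constant, scaling operator is the variable.
        grad_lam = sum((lam * w * x - y) * w * x for x, y in data) / n
        lam = soft_threshold(lam - lr * grad_lam, lr * gamma)
        # Stop on convergence of the target function; otherwise the loop
        # ends when the iteration threshold max_iters is reached.
        current = target(w, lam, data, gamma)
        if abs(previous - current) < tol:
            break
        previous = current
    return w, lam, iterations


# Hypothetical data whose targets are all zero: the structure is useless,
# so the sparse penalty drives its scaling operator to exactly 0.0.
w_useless, lam_useless, iters_useless = train_alternating([(1.0, 0.0), (2.0, 0.0)], gamma=0.1)

# Hypothetical data with y = 2x: the structure is needed and lam stays nonzero.
w_useful, lam_useful, iters_useful = train_alternating([(1.0, 2.0), (2.0, 4.0)], gamma=0.001)
```

The soft-thresholding step is what allows a scaling operator to reach exactly zero, at which point the corresponding structure becomes a candidate for removal.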

Regarding claims 8 and 16, the rejections of claims 1, 4-7, 9, and 12-15 are incorporated and Liu fails to explicitly disclose but Huang discloses wherein the target function is:
\[
\min_{W,\lambda} \left[ \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(\mathcal{C}(x_i, W, \lambda)\big) + \mathcal{R}(W) + \mathcal{R}_s(\lambda) \right]
\]
where \(\mathcal{R}_s(\lambda) = \sum_{j=1}^{K} \gamma_j \lVert \lambda_j \rVert_1\), and
W denotes the weights of the neural network, λ denotes a vector of sparse scaling operators of the neural network, N denotes a number of pieces of sample data in the set of training sample data for the current period, \(\mathcal{L}(\mathcal{C}(x_i, W, \lambda))\) denotes a loss of the neural network over sample data \(x_i\), \(\mathcal{R}(W)\) denotes a weight regular function, \(\mathcal{R}_s(\lambda)\) denotes a sparse regular function, K denotes a number of particular structures in the neural network, \(\lambda_j\) denotes a sparse scaling operator for the j-th particular structure, and \(\gamma_j\) denotes a sparse penalty weight corresponding to the j-th particular structure and is calculated based on a computational complexity of the j-th particular structure (Page 3, Equation (2)).
The motivation to combine Liu, Chen, Ma, and Huang is the same as discussed above with respect to claim 3.
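For illustration, the three terms of the target function recited in claims 8 and 16 can be evaluated as in the sketch below. Several choices here are assumptions, not disclosures: the claim states only that each sparse penalty weight is "calculated based on a computational complexity" of its structure, so the proportional-to-FLOPs rule in `complexity_weights` is hypothetical, as are the L2 form of the weight regular function, the scalar scaling operators, and all names and numbers.

```python
from typing import List, Sequence


def complexity_weights(flops: Sequence[float], base: float) -> List[float]:
    """Hypothetical rule: penalty weight proportional to each structure's share of FLOPs."""
    total = sum(flops)
    return [base * f / total for f in flops]


def target_function(
    losses: Sequence[float],
    weights: Sequence[float],
    lambdas: Sequence[float],
    gammas: Sequence[float],
    weight_decay: float,
) -> float:
    """Mean per-sample loss + weight regular term + sparse regular term."""
    data_term = sum(losses) / len(losses)
    # Assumed L2 weight regular function.
    weight_term = weight_decay * sum(w * w for w in weights)
    # Sparse regular function: sum over structures of gamma_j * |lambda_j|.
    sparse_term = sum(g * abs(lam) for g, lam in zip(gammas, lambdas))
    return data_term + weight_term + sparse_term


# Hypothetical two-structure network; the second structure costs three
# times as much to compute, so it receives three times the penalty weight.
gammas = complexity_weights([100.0, 300.0], base=0.04)
value = target_function(
    losses=[1.0, 3.0],
    weights=[1.0, 2.0],
    lambdas=[2.0, -1.0],
    gammas=gammas,
    weight_decay=0.1,
)
```

Under this assumed weighting, a more expensive structure must reduce the loss by more to justify a nonzero scaling operator.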




Response to Arguments

	Applicant’s arguments and amendments, filed on 8/11/2022, with respect to the objection to claims 1-20 have been fully considered and are persuasive.  The objection to claims 1-20 is withdrawn.

Applicant’s arguments and amendments, filed on 8/11/2022, with respect to the 35 USC § 112(f) interpretation of claims 9-16 have been fully considered and Examiner notes that the interpretation was acknowledged by the Applicant.  The 35 USC § 112(f) interpretation of claims 9-16 will be maintained.  

	Applicant’s arguments and amendments, filed on 8/11/2022, with respect to the 35 USC § 112(b) rejection of claims 1-20 have been fully considered and are persuasive.  The 35 USC § 112(b) rejection of claims 1-20 is withdrawn.

Applicant’s arguments and amendments, filed on 8/11/2022, with respect to the 35 USC § 102(a)(1) rejection of claims 17 and 18 and the 35 USC § 103 rejection of claims 1-16 and 19-20 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 9, and 17.  Liu and Chen are now being used to render claim 17 obvious and Liu, Ma, and Chen are now being used to render claims 1 and 9 obvious under 35 USC § 103.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127                                                                                                                                                                                                        
    
        1 Note that the Specification appears to provide sufficient structural support or an associated algorithm for “a control unit”, “a selecting unit”, “an annotating unit”, and “an acquiring unit” in claim 9 and its dependents, “a pruning unit” in claim 11, and “a training sub-unit” and “a removing sub-unit” in claim 13 and its dependents in paragraphs [0129] and [0131-0133], and all units appear to be generic computer processing elements in the form of hardware or software.