DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1, 9, and 15 are indefinite due to improper antecedent basis. The limitation “the one or more modified anchors” should recite “one or more modified anchors.” Appropriate correction is required.
Claim 5 is indefinite due to lack of clarity. The limitation “responsive to the determining that the first identity vectors is not above the distance threshold” is unclear. Examiner recommends amending the limitation to recite “responsive to the determining that the distance between the first identity vector and the second identity vector is not above the distance threshold.” Appropriate correction is required.

Claims 13 and 19 are indefinite due to lack of clarity. The limitation “responsive to the determine that the first identity vectors is not above the distance threshold” is unclear. Examiner recommends amending the limitation to recite “responsive to the program instructions to determine that the distance between the first identity vector and the second identity vector is not above the distance threshold.” Appropriate correction is required.
Dependent claims 2-4, 6-8, 10-12, 14, 16-18, and 20 are rejected under 35 U.S.C. 112(b) for inheriting and failing to cure the deficiencies of parent claims 1, 9, and 15, respectively.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 9-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 9 recites “a computer program product for optimizing a loss function associated with image analysis, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media.” However, the specification as originally filed does not explicitly define the computer readable medium. The United States Patent and Trademark Office (USPTO) is obliged to give claims their broadest reasonable interpretation consistent with the specification during proceedings before the USPTO. See In re Zletz, 893 F.2d 319 (Fed. Cir. 1989) (during patent examination the pending claims must be interpreted as broadly as their terms reasonably allow).
The Official Gazette Notice 1351 OG 212 dated February 23, 2010, states, “The broadest reasonable interpretation of a claim drawn to a computer readable medium (also called machine readable medium and other such variations) typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent.” “A claim drawn to such a computer readable medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. § 101 by adding the limitation "non-transitory" to the claim.” Claim 9 recites “computer readable storage media” which covers transitory signals, and therefore software. Examiner recommends amending this term to include the term “non-transitory computer readable storage media.”
Claims 9-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 9 defines a “computer program product.” However, the means to implement the system may be regarded as software per se. A program does not fall within one of the four categories of invention (process, machine, manufacture, or composition of matter) and, as a result, is not a statutory process. The claims are not tangibly embodied on any sort of physical medium and do not define structural and functional descriptive material used in interrelationship between the computer software and hardware such as a memory and a processor (i.e., “When functional descriptive material is recorded on some non-transitory computer-readable medium and executed by a processor, it becomes structurally and functionally interrelated to the medium and will be statutory in most cases since use of technology permits the function of the descriptive material to be realized”). Thus, the claims are directed to software per se and are non-statutory subject matter. Appropriate correction is required.
Dependent claims 10-14 are rejected under 35 U.S.C. 101 for inheriting and failing to cure the deficiencies of parent claim 9.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 6, 9, 11, 14, 15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No.: 2021/0383533 (Zhao et al.) (hereinafter Zhao) in view of U.S. Patent Application Publication No.: 2021/0012089 (Shlens et al.) (hereinafter Shlens).
Regarding claim 1, Zhao teaches a computer-implemented method for optimizing a loss function associated with image analysis, the computer-implemented method comprising: (Zhao, para. [0053]: “at least one embodiment provides a 3D multi-organ detection algorithm which is robust to training data with incomplete labels and, in some examples, aims to choose box with highest Intersection over Union (“IoU”) rather than highest classification score; at least one embodiment provides features such as: a new parameter-free and efficient anchor generator that improves training of system, a loss function and a training scheme that is tolerant to incomplete labels in training data, and an IoU prediction loss that replaces classification loss”)
identifying one or more ground truth entities based on one or more image data (Zhao, para. [0058]: “FIG. 1 illustrates an example of a medical image with a number organs located, in accordance with an embodiment; medical image is a head and neck scan, showing bounding boxes that locate various organs and structures in head; in at least one embodiment, boxes denoted by solid white lines denote ground truth labels for image, and boxes denoted with dashed lines denote organs identified using techniques described herein”);
representing, using a machine-learning model, one or more anchors associated with the ground truth entities (Zhao, para. [0061], lines 1-9, para. [0062], lines 1-9: “(1) Anchor generator: In at least one embodiment, RPN (Regional Proposal Network) takes two user-defined parameters: base anchor size S (S=32 in Faster R-CNN (Region Based Convolutional Neural Networks)) and A aspect ratios [r1, r2, . . ., rA] ([0.5,1,2] in some examples); in at least one embodiment, with 2D detectors, for aspect ratio ra, corresponding anchor shape is $2^{i} S \left( r_a^{1/2} \times r_a^{-1/2} \right)$; in at least one embodiment, there are in total (A×W×H)/(2ik) anchors generated for level i and each organ; (2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss”);
calculating a penalty value; and optimizing a loss function of the machine learning model based on the penalty value (Zhao, page 3, para. [0072]-[0073]: “in at least one embodiment, instead of doing classification in RPN and ROI head, at least one embodiment directly regresses IoU between anchors/proposals and best matched ground truth box; in at least one embodiment, IoU prediction loss LpredIoU between network output IôU and ground truth IoU is LpredIoU=SmoothedL1(IôU, IOU); in at least one embodiment, replacing classification loss with LpredIoU has several advantages: (1) for boxes with much higher IôU than threshold, their CLSs are all close to 1 and have difficulty providing guidance on box selection (FIG. 1(b)), whereas IôU provides more helpful guidance; ... in at least one embodiment, L1 loss is used for box regression; in at least one embodiment, L1 does not have a negative linearity with IoU; at least one embodiment uses loss $L_{DIoU} = 1 - IoU + \frac{\lVert b - b_{gt} \rVert^{2}}{c^{2}}$, b and bgt denotes central points of predicted box and ground truth box respectively, and c is diagonal length of smallest enclosing box covering two boxes”; the $\frac{\lVert b - b_{gt} \rVert^{2}}{c^{2}}$ term in the equation is the penalty term; see Footnote 1).
Zhao fails to teach creating, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors.
Shlens teaches creating, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors (Shlens, pages 6-7, para. [0047], lines 1-28: “in some implementations, to generate the perception output 232, the object detection neural network 270 projects each feature representation included in the feature representations 262 to generate multiple feature vectors for multiple anchor offsets, respectively, and processes the multiple feature vectors to generate an object detection output for each of the multiple anchor offsets; that is, for each proposal location, the neural network 270 generates a respective feature vector for each anchor offset and then processes the feature vector for the anchor offset to generate the object detection output for the anchor offset”; Shlens, page 6, para. [0049]: “the loss function used for the training of these neural networks can be an object detection loss that measures the quality of object detection outputs generated by the these neural networks relative to the ground truth object detection outputs, e.g., smoothed L1 losses for regressed values and cross entropy losses for classification outputs”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to add the step of creating, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors, as taught by Shlens, to the computer-implemented method for optimizing a loss function associated with image analysis, as taught by Zhao. Further, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify the step of calculating a penalty value, as taught by Zhao, by basing it on the first identity vector and the second identity vector, as taught by Shlens. The penalty value is calculated in Zhao via geometric distances between multiple bounding boxes (i.e., anchors), which is information encapsulated in the first and second vectors of Shlens.
The suggestion/motivation for doing so would have been to represent an encoded identity simply as a one-dimensional vector of scalars rather than as complex anchors, and to enable vector comparison, which is simpler than anchor comparison.
Therefore, it would have been obvious to combine Zhao with Shlens to obtain the invention as specified in claim 1.
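For illustration only, the DIoU loss quoted from Zhao in the rejection of claim 1 (one minus IoU, plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box) can be sketched for 2D axis-aligned boxes as follows. The (x1, y1, x2, y2) box format, the function name diou_loss, and the choice to return the penalty term separately are assumptions made for this sketch; they are not taken from Zhao, Shlens, or the instant application.

```python
def diou_loss(box, gt):
    """Return (loss, penalty) for two 2D boxes given as (x1, y1, x2, y2).

    Hypothetical helper: assumes both boxes have positive area.
    """
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and IoU.
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter)

    # Squared distance between the central points b and b_gt.
    bx, by = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    center_dist_sq = (bx - gx) ** 2 + (by - gy) ** 2

    # Squared diagonal c^2 of the smallest box enclosing both boxes.
    ex1, ey1 = min(box[0], gt[0]), min(box[1], gt[1])
    ex2, ey2 = max(box[2], gt[2]), max(box[3], gt[3])
    c_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    penalty = center_dist_sq / c_sq
    return 1.0 - iou + penalty, penalty
```

Under these assumptions, identical boxes give a loss and penalty of zero, and the penalty grows as the predicted box center moves away from the ground truth center, even when the IoU is unchanged.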
Regarding claim 3, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1, wherein representing, using a machine-learning model, one or more anchors associated with the ground truth entities further comprises: modifying one or more existing anchors with one or more bounding boxes (Zhao, para. [0062]; para. [0063], lines 1-3: “(2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss; (3) RPN Box selector: in at least one embodiment, box selector selects appropriate regressed anchors as output of RPN and selected anchors can be called proposals”).
Regarding claim 6, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1, wherein optimizing the loss function based on the penalty value further comprises: using the penalty value to minimize the loss function during training of a neural network image detection system (Zhao, para. [0082]: “comparison methods and evaluation; experiment compares an embodiment of Med R-CNN to an embodiment of Faster R-CNN, with ResNet-50 as backbone, backbone base stem downsample ratio k=4, and ROI matcher IoU threshold=0.4; experiment compares four methods: (1) an embodiment of “Faster_missing”: one Faster R-CNN using all training data, where 66.7% labels are missing; (2) an embodiment of “Faster_no_missing”: three Faster R-CNN, one for each organ, where no label is missing; (Faster R-CNNs has anchor size S=24, aspect ratios being [0.55,0.77,1,1.3,1.8]) (3) an embodiment of “Med_missing”: one Med R-CNN using all of training data, with binary entropy loss and L1 loss as in YoLo v3; (4) an embodiment of “Med_missing_newLoss”: one Med R-CNN using training data, with proposed predIoU loss as well as DIoU loss; in at least one embodiment, 10% of training data is used as validation data in order to decide stopping point of training”).
Regarding claim 9, Zhao teaches a computer program product for optimizing a loss function associated with image analysis, the computer program product comprising: (Zhao, para. [0053]: “at least one embodiment provides a 3D multi-organ detection algorithm which is robust to training data with incomplete labels and, in some examples, aims to choose box with highest Intersection over Union (“IoU”) rather than highest classification score; at least one embodiment provides features such as: a new parameter-free and efficient anchor generator that improves training of system, a loss function and a training scheme that is tolerant to incomplete labels in training data, and an IoU prediction loss that replaces classification loss”)
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: (Zhao, para. [0100]: “in at least one embodiment, any portion of code and/or data storage 1101 may be internal or external to one or more processors or other hardware logic devices or circuits; in at least one embodiment, code and/or code and/or data storage 1101 may be cache memory, dynamic randomly addressable memory (“DRAM”), static randomly addressable memory (“SRAM”), non-volatile memory (e.g., Flash memory), or other storage”)
program instructions to identify one or more ground truth entities based on one or more image data (Zhao, para. [0058]: “FIG. 1 illustrates an example of a medical image with a number organs located, in accordance with an embodiment; medical image is a head and neck scan, showing bounding boxes that locate various organs and structures in head; in at least one embodiment, boxes denoted by solid white lines denote ground truth labels for image, and boxes denoted with dashed lines denote organs identified using techniques described herein”);
program instructions to represent, using a machine-learning model, one or more anchors associated with the ground truth entities (Zhao, para. [0061], lines 1-9, para. [0062], lines 1-9: “(1) Anchor generator: In at least one embodiment, RPN (Regional Proposal Network) takes two user-defined parameters: base anchor size S (S=32 in Faster R-CNN (Region Based Convolutional Neural Networks)) and A aspect ratios [r1, r2, . . ., rA] ([0.5,1,2] in some examples); in at least one embodiment, with 2D detectors, for aspect ratio ra, corresponding anchor shape is $2^{i} S \left( r_a^{1/2} \times r_a^{-1/2} \right)$; in at least one embodiment, there are in total (A×W×H)/(2ik) anchors generated for level i and each organ; (2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss”);
program instructions to calculate a penalty value; and program instructions to optimize a loss function of the machine learning model based on the penalty value (Zhao, para. [0072]-[0073]: “in at least one embodiment, instead of doing classification in RPN and ROI head, at least one embodiment directly regresses IoU between anchors/proposals and best matched ground truth box; in at least one embodiment, IoU prediction loss LpredIoU between network output IôU and ground truth IoU is LpredIoU=SmoothedL1(IôU, IOU); in at least one embodiment, replacing classification loss with LpredIoU has several advantages: (1) for boxes with much higher IôU than threshold, their CLSs are all close to 1 and have difficulty providing guidance on box selection (FIG. 1(b)), whereas IôU provides more helpful guidance; ... in at least one embodiment, L1 loss is used for box regression; in at least one embodiment, L1 does not have a negative linearity with IoU; at least one embodiment uses loss $L_{DIoU} = 1 - IoU + \frac{\lVert b - b_{gt} \rVert^{2}}{c^{2}}$, b and bgt denotes central points of predicted box and ground truth box respectively, and c is diagonal length of smallest enclosing box covering two boxes”; the $\frac{\lVert b - b_{gt} \rVert^{2}}{c^{2}}$ term in the equation is the penalty term; see Footnote 1).
Zhao fails to teach program instructions to create, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors.
Shlens teaches 
program instructions to create, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors (Shlens, para. [0047], lines 1-28: “in some implementations, to generate the perception output 232, the object detection neural network 270 projects each feature representation included in the feature representations 262 to generate multiple feature vectors for multiple anchor offsets, respectively, and processes the multiple feature vectors to generate an object detection output for each of the multiple anchor offsets; that is, for each proposal location, the neural network 270 generates a respective feature vector for each anchor offset and then processes the feature vector for the anchor offset to generate the object detection output for the anchor offset”; Shlens, para. [0049]: “the loss function used for the training of these neural networks can be an object detection loss that measures the quality of object detection outputs generated by these neural networks relative to the ground truth object detection outputs, e.g., smoothed L1 losses for regressed values and cross entropy losses for classification outputs”; Shlens, para. [0060], lines 7-12: “embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add the step of creating, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors, as taught by Shlens, to the computer-implemented method for optimizing a loss function associated with image analysis, as taught by Zhao. Further, it would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the step of calculating a penalty value, as taught by Zhao, by basing it on the first identity vector and the second identity vector, as taught by Shlens. The penalty value is calculated in Zhao via geometric distances between multiple bounding boxes (i.e., anchors), which is information encapsulated in the first and second vectors of Shlens.
The suggestion/motivation for doing so would have been to represent an encoded identity simply as a 1-dimensional vector of scalars rather than as complex anchors, and to enable vector comparison, which is simpler than anchor comparison.
Therefore, it would have been obvious to combine Zhao with Shlens to obtain the invention as specified in claim 9.
Regarding claim 11, Zhao, in view of Shlens, teaches the computer program product of claim 9, wherein program instructions to represent, using a machine-learning model, one or more anchors associated with the ground truth entities further comprises: program instructions to modify one or more existing anchors with one or more bounding boxes (Zhao, para. [0062]; para. [0063], lines 1-3: “(2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss; (3) RPN Box selector: in at least one embodiment, box selector selects appropriate regressed anchors as output of RPN and selected anchors can be called proposals”).
Regarding claim 14, Zhao, in view of Shlens, teaches the computer program product of claim 9, wherein program instructions to optimize the loss function based on the penalty value further comprises: program instructions to use the penalty value to minimize the loss function during training of a neural network image detection system (Zhao, page 4, para. [0082]: “comparison methods and evaluation; experiment compares an embodiment of Med R-CNN to an embodiment of Faster R-CNN, with ResNet-50 as backbone, backbone base stem downsample ratio k=4, and ROI matcher IoU threshold=0.4; experiment compares four methods: (1) an embodiment of “Faster_missing”: one Faster R-CNN using all training data, where 66.7% labels are missing; (2) an embodiment of “Faster_no_missing”: three Faster R-CNN, one for each organ, where no label is missing; (Faster R-CNNs has anchor size S=24, aspect ratios being [0.55,0.77,1,1.3,1.8]) (3) an embodiment of “Med_missing”: one Med R-CNN using all of training data, with binary entropy loss and L1 loss as in YoLo v3; (4) an embodiment of “Med_missing_newLoss”: one Med R-CNN using training data, with proposed predIoU loss as well as DIoU loss; in at least one embodiment, 10% of training data is used as validation data in order to decide stopping point of training”).
Regarding claim 15, Zhao teaches a computer system for optimizing a loss function associated with image analysis, the computer system comprising: (Zhao, para. [0053]: “at least one embodiment provides a 3D multi-organ detection algorithm which is robust to training data with incomplete labels and, in some examples, aims to choose box with highest Intersection over Union (“IoU”) rather than highest classification score; at least one embodiment provides features such as: a new parameter-free and efficient anchor generator that improves training of system, a loss function and a training scheme that is tolerant to incomplete labels in training data, and an IoU prediction loss that replaces classification loss”)
one or more computer processors; one or more computer readable storage media; program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: (Zhao, para. [0100]: “in at least one embodiment, any portion of code and/or data storage 1101 may be internal or external to one or more processors or other hardware logic devices or circuits; in at least one embodiment, code and/or code and/or data storage 1101 may be cache memory, dynamic randomly addressable memory (“DRAM”), static randomly addressable memory (“SRAM”), non-volatile memory (e.g., Flash memory), or other storage”)
program instructions to identify one or more ground truth entities based on one or more image data (Zhao, para. [0058]: “FIG. 1 illustrates an example of a medical image with a number organs located, in accordance with an embodiment; medical image is a head and neck scan, showing bounding boxes that locate various organs and structures in head; in at least one embodiment, boxes denoted by solid white lines denote ground truth labels for image, and boxes denoted with dashed lines denote organs identified using techniques described herein”);
program instructions to represent, using a machine-learning model, one or more anchors associated with the ground truth entities (Zhao, para. [0061], lines 1-9, para. [0062], lines 1-9: “(1) Anchor generator: In at least one embodiment, RPN (Regional Proposal Network) takes two user-defined parameters: base anchor size S (S=32 in Faster R-CNN (Region Based Convolutional Neural Networks)) and A aspect ratios [r1, r2, . . ., rA] ([0.5,1,2] in some examples); in at least one embodiment, with 2D detectors, for aspect ratio ra, corresponding anchor shape is $2^{i} S \left( r_a^{1/2} \times r_a^{-1/2} \right)$; in at least one embodiment, there are in total (A×W×H)/(2ik) anchors generated for level i and each organ; (2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss”);
program instructions to calculate a penalty value; and program instructions to optimize a loss function of the machine learning model based on the penalty value (Zhao, page 3, para. [0072]-[0073]: “in at least one embodiment, instead of doing classification in RPN and ROI head, at least one embodiment directly regresses IoU between anchors/proposals and best matched ground truth box; in at least one embodiment, IoU prediction loss LpredIoU between network output IôU and ground truth IoU is LpredIoU=SmoothedL1(IôU, IOU); in at least one embodiment, replacing classification loss with LpredIoU has several advantages: (1) for boxes with much higher IôU than threshold, their CLSs are all close to 1 and have difficulty providing guidance on box selection (FIG. 1(b)), whereas IôU provides more helpful guidance; ... in at least one embodiment, L1 loss is used for box regression; in at least one embodiment, L1 does not have a negative linearity with IoU; at least one embodiment uses loss $L_{DIoU} = 1 - IoU + \frac{\lVert b - b^{gt} \rVert^{2}}{c^{2}}$
, $b$ and $b^{gt}$ denotes central points of predicted box and ground truth box respectively, and c is diagonal length of smallest enclosing box covering two boxes”; the $\frac{\lVert b - b^{gt} \rVert^{2}}{c^{2}}$ term in the equation is the penalty term; see Footnote 1).
Zhao fails to teach 
program instructions to create, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors.
Shlens teaches 
program instructions to create, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors (Shlens, para. [0047], lines 1-28: “in some implementations, to generate the perception output 232, the object detection neural network 270 projects each feature representation included in the feature representations 262 to generate multiple feature vectors for multiple anchor offsets, respectively, and processes the multiple feature vectors to generate an object detection output for each of the multiple anchor offsets; that is, for each proposal location, the neural network 270 generates a respective feature vector for each anchor offset and then processes the feature vector for the anchor offset to generate the object detection output for the anchor offset”; Shlens, para. [0049]: “the loss function used for the training of these neural networks can be an object detection loss that measures the quality of object detection outputs generated by these neural networks relative to the ground truth object detection outputs, e.g., smoothed L1 losses for regressed values and cross entropy losses for classification outputs”; Shlens, para. [0060], lines 7-12: “embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add the step of creating, on top of a classification of the machine learning model, a first identity vector and a second identity vector based on the one or more modified anchors, as taught by Shlens, to the computer-implemented method for optimizing a loss function associated with image analysis, as taught by Zhao. Further, it would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the step of calculating a penalty value, as taught by Zhao, by basing it on the first identity vector and the second identity vector, as taught by Shlens. The penalty value is calculated in Zhao via geometric distances between multiple bounding boxes (i.e., anchors), which is information encapsulated in the first and second vectors of Shlens.
The suggestion/motivation for doing so would have been to represent an encoded identity simply as a 1-dimensional vector of scalars rather than as complex anchors, and to enable vector comparison, which is simpler than anchor comparison.
Therefore, it would have been obvious to combine Zhao with Shlens to obtain the invention as specified in claim 15.
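For illustration only, the DIoU loss quoted from Zhao above (1 - IoU plus the squared center distance divided by the squared enclosing-box diagonal) can be sketched in Python as follows. The sketch is not taken from Zhao: it assumes 2D axis-aligned boxes given as (x1, y1, x2, y2), whereas Zhao is directed to 3D detection, and the function name diou_loss is hypothetical.

```python
def diou_loss(pred, gt):
    """Return (loss, penalty) for two boxes given as (x1, y1, x2, y2).

    Assumption: 2D axis-aligned boxes with x1 < x2 and y1 < y2.
    """
    # Intersection area (zero when the boxes do not overlap).
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and IoU.
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_pred + area_gt - inter)

    # Squared distance between the central points b and b^gt.
    bx, by = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    center_dist_sq = (bx - gx) ** 2 + (by - gy) ** 2

    # Squared diagonal c^2 of the smallest box enclosing both boxes.
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    penalty = center_dist_sq / c_sq
    return 1.0 - iou + penalty, penalty


loss, penalty = diou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
print(loss, penalty)
```

In this sketch the penalty is zero when the two box centers coincide and grows as the centers move apart, which is the behavior the rejection relies on when it identifies the fraction as the penalty term.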
Regarding claim 17, Zhao, in view of Shlens, teaches the computer system of claim 15, wherein program instructions to represent, using a machine-learning model, one or more anchors associated with the ground truth entities further comprises: program instructions to modify one or more existing anchors with one or more bounding boxes (Zhao, para. [0062]; para. [0063], lines 1-3: “(2) RPN predictor: in at least one embodiment, during training, anchors that overlap with ground truth (“GT”) boxes with an IoU above a user-defined threshold are labeled as positive; in at least one embodiment, anchors that have no or little overlap with GT boxes are labeled as negative (for example, overlap below a threshold value) and used to train a binary classifier using cross entropy (“CE”) loss, while positive anchors are used to train a box regressor in order to match them to GT box using smoothed L1 loss; (3) RPN Box selector: in at least one embodiment, box selector selects appropriate regressed anchors as output of RPN and selected anchors can be called proposals”).
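For illustration only, the anchor labeling described in the quoted RPN predictor passage (positive above an IoU threshold, negative for little or no overlap) can be sketched in Python as follows. The sketch is not taken from Zhao: the 2D (x1, y1, x2, y2) box format, the function names, the threshold values 0.7 and 0.3, and the "ignore" label of -1 for anchors between the two thresholds are all assumptions, since Zhao states only that the threshold is user-defined.

```python
def box_iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor 1 (positive), 0 (negative), or -1 (ignored).

    pos_thresh and neg_thresh are hypothetical example values.
    """
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground truth box.
        best = max((box_iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)    # would train the box regressor
        elif best < neg_thresh:
            labels.append(0)    # would train the binary classifier
        else:
            labels.append(-1)   # assumption: skipped during training
    return labels


anchors = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0), (0.0, 0.0, 2.0, 3.0)]
print(label_anchors(anchors, [(0.0, 0.0, 2.0, 2.0)]))
```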
Regarding claim 20, Zhao, in view of Shlens, teaches the computer system of claim 15, wherein program instructions to optimize the loss function based on the penalty value further comprises: program instructions to use the penalty value to minimize the loss function during training of a neural network image detection system (Zhao, para. [0082]: “comparison methods and evaluation; experiment compares an embodiment of Med R-CNN to an embodiment of Faster R-CNN, with ResNet-50 as backbone, backbone base stem downsample ratio k=4, and ROI matcher IoU threshold=0.4; experiment compares four methods: (1) an embodiment of “Faster_missing”: one Faster R-CNN using all training data, where 66.7% labels are missing; (2) an embodiment of “Faster_no_missing”: three Faster R-CNN, one for each organ, where no label is missing; (Faster R-CNNs has anchor size S=24, aspect ratios being [0.55,0.77,1,1.3,1.8]) (3) an embodiment of “Med_missing”: one Med R-CNN using all of training data, with binary entropy loss and L1 loss as in YoLo v3; (4) an embodiment of “Med_missing_newLoss”: one Med R-CNN using training data, with proposed predIoU loss as well as DIoU loss; in at least one embodiment, 10% of training data is used as validation data in order to decide stopping point of training”).
Claims 2, 7, 8, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, in view of Shlens, and further in view of International Patent Application Publication No.: WO 2021126370 (Bengtsson et al.) (hereinafter Bengtsson).
Regarding claim 2, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1.
Zhao, in view of Shlens, fails to teach 
wherein identifying one or more ground truth entities based on image data further comprises: detecting one or more lesions based on the image data; and tagging the one or more lesions as the one or more ground truth entities.
Bengtsson teaches
wherein identifying one or more ground truth entities based on image data further comprises: detecting one or more lesions based on the image data; and tagging the one or more lesions as the one or more ground truth entities (Bengtsson, para. [0127], lines 1-5: “for each of the organ-specific segmentation networks, the input was a 256x256x256 CT volume (of a concatenation of the axial slices from Steps 1-2) resampled to a voxel size of 2x2x2mm; the output of each of the organ-specific segmentation networks was an organ mask of the same size for each organ; ground truth for each network was a 256x256x256 corresponding organ mask with same voxel size”; Bengtsson, para. [0147]-[0148]; para. [0149], lines 1-2: “the detection step utilized a bounding-box detection network, implemented as a RetinaNet, and identified both target-and non-target lesions using bounding boxes and lesion tags ... in the segmentation step, based only on the 2D segmentations of the target lesions, a tumor segmentation network, implemented as a set of probabilistic UNets, produced an ensemble of plausible axial lesion segmentations; tumor segmentation for metastatic cancer subjects is prone to reader subjectivity and thus there may not be a single ground truth for a given lesion”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add the step of detecting one or more lesions based on the image data; and tagging the one or more lesions as the one or more ground truth entities, as taught by Bengtsson, to the step of identifying one or more ground truth entities based on image data, as taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to allow the method for optimizing a loss function associated with image analysis to be used in cancer detection for subjects.
Therefore, it would have been obvious to combine Zhao and Shlens, with Bengtsson, to obtain the invention as specified in claim 2.
Regarding claim 7, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1.
Zhao, in view of Shlens, fails to teach 
wherein the one or more image data further comprises of multiple medical images associated with radiology.
Bengtsson teaches 
wherein the one or more image data further comprises of multiple medical images associated with radiology (Bengtsson, para. [0157], lines 1-4: “a training dataset was constructed using RECIST target lesions segmentations from 2 radiologists per scan and 3D segmentation for some scans; the images were resampled to 0.7x0.7mm in-plane resolution and patches of 256x256x3 pixels were constructed around these lesions”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add multiple medical images associated with radiology, as taught by Bengtsson, to the image data taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to assist in diagnosing and treating diseases such as cancer.
Therefore, it would have been obvious to combine Zhao and Shlens, with Bengtsson, to obtain the invention as specified in claim 7.
Regarding claim 8, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1.
Zhao, in view of Shlens, fails to teach
wherein the one or more ground truth entities further comprises a lesion, cancer and tumor.
Bengtsson teaches
wherein the one or more ground truth entities further comprises a lesion, cancer and tumor (Bengtsson, page 29, para. [0127], lines 1-5: “for each of the organ-specific segmentation networks, the input was a 256x256x256 CT volume (of a concatenation of the axial slices from Steps 1-2) resampled to a voxel size of 2x2x2mm; the output of each of the organ-specific segmentation networks was an organ mask of the same size for each organ; ground truth for each network was a 256x256x256 corresponding organ mask with same voxel size”; Bengtsson, page 35, para. [0147]-[0148]; para. [0149], lines 1-2: “the detection step utilized a bounding-box detection network, implemented as a RetinaNet, and identified both target-and non-target lesions using bounding boxes and lesion tags”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add lesion, cancer, and tumor, as taught by Bengtsson, to the one or more ground truth entities, as taught by Zhao in view of Shlens.
The suggestion/motivation for doing so would have been to assist in diagnosing and treating diseases such as cancer.
Regarding claim 10, Zhao, in view of Shlens, teaches the computer program product of claim 9.
Zhao, in view of Shlens, fails to teach 
wherein program instructions to identify one or more ground truth entities based on image data further comprises: program instructions to detect one or more lesions based on the image data; and program instructions to tag the one or more lesions as the one or more ground truth entities.
Bengtsson teaches 
wherein program instructions to identify one or more ground truth entities based on image data further comprises: program instructions to detect one or more lesions based on the image data; and program instructions to tag the one or more lesions as the one or more ground truth entities (Bengtsson, para. [0127], lines 1-5: “for each of the organ-specific segmentation networks, the input was a 256x256x256 CT volume (of a concatenation of the axial slices from Steps 1-2) resampled to a voxel size of 2x2x2mm; the output of each of the organ-specific segmentation networks was an organ mask of the same size for each organ; ground truth for each network was a 256x256x256 corresponding organ mask with same voxel size”; Bengtsson, para. [0147]-[0148]; para. [0149], lines 1-2: “the detection step utilized a bounding-box detection network, implemented as a RetinaNet, and identified both target-and non-target lesions using bounding boxes and lesion tags ... in the segmentation step, based only on the 2D segmentations of the target lesions, a tumor segmentation network, implemented as a set of probabilistic UNets, produced an ensemble of plausible axial lesion segmentations; tumor segmentation for metastatic cancer subjects is prone to reader subjectivity and thus there may not be a single ground truth for a given lesion”).
Regarding claim 16, Zhao, in view of Shlens, teaches the computer system of claim 15.
Zhao, in view of Shlens, fails to teach 
wherein program instructions to identify one or more ground truth entities based on image data further comprises: program instructions to detect one or more lesions based on the image data; and program instructions to tag the one or more lesions as the one or more ground truth entities.
Bengtsson teaches
wherein program instructions to identify one or more ground truth entities based on image data further comprises: program instructions to detect one or more lesions based on the image data; and program instructions to tag the one or more lesions as the one or more ground truth entities (Bengtsson, page 29, para. [0127], lines 1-5: “for each of the organ-specific segmentation networks, the input was a 256x256x256 CT volume (of a concatenation of the axial slices from Steps 1-2) resampled to a voxel size of 2x2x2mm; the output of each of the organ-specific segmentation networks was an organ mask of the same size for each organ; ground truth for each network was a 256x256x256 corresponding organ mask with same voxel size”; Bengtsson, page 35, para. [0147]-[0148]; para. [0149], lines 1-2: “the detection step utilized a bounding-box detection network, implemented as a RetinaNet, and identified both target-and non-target lesions using bounding boxes and lesion tags ... in the segmentation step, based only on the 2D segmentations of the target lesions, a tumor segmentation network, implemented as a set of probabilistic UNets, produced an ensemble of plausible axial lesion segmentations; tumor segmentation for metastatic cancer subjects is prone to reader subjectivity and thus there may not be a single ground truth for a given lesion”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to add the step of detecting one or more lesions based on the image data; and tagging the one or more lesions as the one or more ground truth entities, as taught by Bengtsson, to the step of identifying one or more ground truth entities based on image data, as taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to allow the method for optimizing a loss function associated with image analysis to be used in cancer detection for subjects.
Therefore, it would have been obvious to combine Zhao and Shlens, with Bengtsson, to obtain the invention as specified in claim 16.
Claims 4, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, in view of Shlens, and further in view of International Patent Application Publication No.: WO 2021138749 (Roshtkhari et al.) (hereinafter Roshtkhari) (see U.S. Provisional Application 62/959,561 attached with Office Action).
Regarding claim 4, Zhao, in view of Shlens, teaches the computer-implemented method of claim 1.
Zhao, in view of Shlens, fails to teach 
wherein creating a first identity vector and a second identity vector based on the one or more anchors further comprises: embedding identity information in the one or more anchors.
Roshtkhari teaches 
wherein creating a first identity vector and a second identity vector based on the one or more anchors further comprises: embedding identity information in the one or more anchors (Roshtkhari, page 6-7, para. [0032]: “figure 4 illustrates examples of metric learning for reidentification with the triplet loss; in the training phase for parameter tuning of the image feature extractor (10), the triplet loss (24) is calculated using three images from the training data (22) at each iteration; for every selected sample, the anchor (26); a positive example (28), which corresponds to the same identity as the anchor image; and a negative sample (30), which corresponds to a different identity than the anchor image (26) is selected; the triplet loss can be calculated to minimize the distance, defined using a mathematical distance function, between the anchor and the positive example image feature and at the same time maximize the distance between the anchor and the negative image features; the distance metric can be any pseudo-distance function such as Euclidean distance; the resulting transformation from the images to the feature vectors for reidentification can be referred to as embedding”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to add the step of embedding identity information in the one or more anchors, as taught by Roshtkhari, to the step of creating a first identity vector and a second identity vector based on the one or more anchors, as taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to make it easy to determine which ground-truth entity is associated with which vector for one or more images.
Therefore, it would have been obvious to combine Zhao and Shlens, with Roshtkhari, to obtain the invention as specified in claim 4.
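For illustration only, the triplet loss described in the cited passage of Roshtkhari (anchor, positive example of the same identity, negative sample of a different identity) can be sketched as shown below. The hinge form, the `margin` parameter, and the function names are assumptions of this sketch and are not taken from Roshtkhari or from the claims; Euclidean distance is used because the cited passage names it as one possible distance function.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.dist(u, v)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over three embedded feature vectors.

    The loss is small when the anchor is close to the positive example
    and far from the negative sample. `margin` is a hypothetical
    parameter chosen for this sketch.
    """
    d_pos = euclidean(anchor, positive)   # anchor vs. same identity
    d_neg = euclidean(anchor, negative)   # anchor vs. different identity
    return max(0.0, d_pos - d_neg + margin)
```

Under these assumptions, a triplet whose negative sample is already much farther from the anchor than the positive example contributes zero loss, while a triplet with the positive example farther away than the negative sample contributes a positive loss.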
Regarding claim 12, Zhao, in view of Shlens, teaches the computer program product of claim 9.
Zhao, in view of Shlens, fails to teach 
wherein program instructions to create a first identity vector and a second identity vector based on the one or more anchors further comprises: program instructions to embed identity information in the one or more anchors.
Roshtkhari teaches 
wherein program instructions to create a first identity vector and a second identity vector based on the one or more anchors further comprises: program instructions to embed identity information in the one or more anchors (Roshtkhari, page 6-7, para. [0032]: “figure 4 illustrates examples of metric learning for reidentification with the triplet loss; in the training phase for parameter tuning of the image feature extractor (10), the triplet loss (24) is calculated using three images from the training data (22) at each iteration; for every selected sample, the anchor (26); a positive example (28), which corresponds to the same identity as the anchor image; and a negative sample (30), which corresponds to a different identity than the anchor image (26) is selected; the triplet loss can be calculated to minimize the distance, defined using a mathematical distance function, between the anchor and the positive example image feature and at the same time maximize the distance between the anchor and the negative image features; the distance metric can be any pseudo-distance function such as Euclidean distance; the resulting transformation from the images to the feature vectors for reidentification can be referred to as embedding”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to add the program instructions to embed identity information in the one or more anchors, as taught by Roshtkhari, to the program instructions to create a first identity vector and a second identity vector based on the one or more anchors, as taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to make it easy to determine which ground-truth entity is associated with which vector for one or more images.
Therefore, it would have been obvious to combine Zhao and Shlens, with Roshtkhari, to obtain the invention as specified in claim 12.
Regarding claim 18, Zhao, in view of Shlens, teaches the computer system of claim 15.
Zhao, in view of Shlens, fails to teach
wherein program instructions to create a first identity vector and a second identity vector based on the one or more anchors further comprises: program instructions to embed identity information in the one or more anchors.
Roshtkhari teaches 
wherein program instructions to create a first identity vector and a second identity vector based on the one or more anchors further comprises: program instructions to embed identity information in the one or more anchors (Roshtkhari, page 6-7, para. [0032]: “figure 4 illustrates examples of metric learning for reidentification with the triplet loss; in the training phase for parameter tuning of the image feature extractor (10), the triplet loss (24) is calculated using three images from the training data (22) at each iteration; for every selected sample, the anchor (26); a positive example (28), which corresponds to the same identity as the anchor image; and a negative sample (30), which corresponds to a different identity than the anchor image (26) is selected; the triplet loss can be calculated to minimize the distance, defined using a mathematical distance function, between the anchor and the positive example image feature and at the same time maximize the distance between the anchor and the negative image features; the distance metric can be any pseudo-distance function such as Euclidean distance; the resulting transformation from the images to the feature vectors for reidentification can be referred to as embedding”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to add the program instructions to embed identity information in the one or more anchors, as taught by Roshtkhari, to the program instructions to create a first identity vector and a second identity vector based on the one or more anchors, as taught by Zhao, in view of Shlens.
The suggestion/motivation for doing so would have been to make it easy to determine which ground-truth entity is associated with which vector for one or more images.
Therefore, it would have been obvious to combine Zhao and Shlens, with Roshtkhari, to obtain the invention as specified in claim 18.


Claim Objection
Claims 5, 13, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten to overcome the rejections under 35 U.S.C. 112(b) set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The closest prior art of record to the claimed invention of claims 5, 13, and 19 is U.S. Patent Application Publication No.: 2021/0383533 (Zhao). Zhao describes an object detection system that uses a neural network to identify and/or locate a set of organs in a medical image; when training to identify and/or locate a particular organ, the system uses a subset of incompletely-labeled training images that excludes training images for which labels associated with the particular organ are unavailable (see Abstract). However, Zhao fails to teach or suggest the limitation: “calculate(ing) the penalty value based on a penalty formula, new distance = original_distance + distance between identity vectors.” Therefore, claims 5, 13, and 19 are allowable over the prior art of record under 35 U.S.C. § 102 and § 103.
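For illustration only, the penalty formula quoted above can be sketched as shown below. The use of Euclidean distance between the identity vectors and the function name are assumptions of this sketch; the claims do not specify the distance function here, and the threshold condition recited in claims 5, 13, and 19 is intentionally omitted.

```python
import math


def penalized_distance(original_distance, identity_vec_a, identity_vec_b):
    """new distance = original_distance + distance between identity vectors.

    Euclidean distance is an assumed choice for the distance between
    the two identity vectors; any other distance could be substituted.
    """
    identity_gap = math.dist(identity_vec_a, identity_vec_b)
    return original_distance + identity_gap
```

Under these assumptions, an original distance of 2.0 combined with identity vectors that are 5.0 apart yields a new distance of 7.0, and identical identity vectors leave the original distance unchanged.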
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL A SHARIFF whose telephone number is (571)272-9741.  The examiner can normally be reached on M-TH 7:30 AM EST – 5:30 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ, can be reached at 571-272-3638 or through e-mail at sumati.lefkowitz@uspto.gov.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

/MICHAEL ADAM SHARIFF/
Examiner, Art Unit 2662
 
/MD K TALUKDER/
Examiner, Art Unit 2648