DETAILED ACTION
Response to Arguments
The amendment filed 3/17/2022 has been entered and made of record.

The Applicant has included newly added claim(s) 21-29.
The application has pending claim(s) 1-9 and 17-29.

In response to the amendments filed on 3/17/2022:
The amendments addressing the “Claim rejections under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph” have been entered, and therefore the Examiner withdraws the rejections under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Applicant's arguments filed 3/17/2022 have been fully considered but they are not persuasive.
The Applicant alleges, “Independent Claim 17 …” in pages 13-14, and states that the Office Action does not properly analyze independent claim 17 under 35 U.S.C. 112(f) by omitting how the cited art used to reject independent claim 17 under 35 U.S.C. 102 teaches the structure of the “step for” limitation recited in the specification as filed.  Firstly, the Examiner disagrees because the claim limitation “a step for utilizing a multi-modal object selection neural network to generate an object segmentation output corresponding to the target object based on the plurality of user inputs” as recited in lines 7-8 of claim 17 was not and is not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function.  Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.  If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.  Secondly, the Examiner disagrees because the prior art rejection nevertheless cited, e.g., page 26 at lines 8-12 and page 49 at lines 22-40 of the prior art reference Wang, which teaches the structure and equivalents thereof for performing the utilization step.  Further discussions are addressed in the prior art rejection section below.
Therefore claims 17-20 are still not in condition for allowance because they are still not patentably distinguishable over the prior art reference(s).

The Applicant's arguments with respect to claims 1-8 and 21-29 have been considered but are moot in view of the new ground(s) of rejection because the Applicant has amended at least independent claim 1 and has added new claims 21-29.
Applicant’s arguments, see “Independent Claim 1 …” in pages 12-13 and “Independent Claim 21 …” in pages 14-15, filed 3/17/2022, with respect to the rejection(s) of claim(s) 1-8 under 35 U.S.C. 102 have been fully considered and are persuasive.  Therefore the rejections have been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made at least in further view of Guo et al (“Deep Learning-Based Image Segmentation on Multimodal Medical Imaging” – IEEE – March 1, 2019 – pages 162-169, previously cited on the PTO-892 dated 12/29/2021).  Further discussions are addressed in the prior art rejection section below.  Therefore claims 1-8 and 21-29 are not in condition for allowance because they are not patentably distinguishable over the prior art references.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 17-20 is/are rejected under 35 U.S.C. 102(a)(1) / 102(a)(2) as being anticipated by Wang et al (WO 2018/229490 A1, as applied in previous Office Action).
Re Claim 17: Wang discloses in a digital medium environment for editing digital visual media, a computer-implemented method of identifying digital objects portrayed within the digital visual media using a multi-modal deep learning network, the computer-implemented method comprising: identifying a digital image portraying a target object (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, the testing image); receiving a plurality of user inputs corresponding to a plurality of input modalities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); and a step for utilizing a multi-modal object selection neural network to generate an object segmentation output corresponding to the target object based on the plurality of user inputs (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles, processed by Neural Network to generate a final refined segmentation).

Re Claim 18: Wang further discloses wherein the plurality of input modalities comprise at least two of: a regional input modality, a boundary input modality, or a verbal input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).

Re Claim 19: Wang further discloses wherein the plurality of user inputs comprise a first user input corresponding to a regional input modality and a second user input corresponding to a boundary input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).

Re Claim 20: Wang further discloses wherein the first user input corresponding to the regional input modality comprises a user input of a foreground pixel or a user input of a background pixel and wherein the second user input corresponding to the boundary input modality comprises a user input of an edge pixel (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-8 and 21-29 is/are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al (“Deep Learning-Based Image Segmentation on Multimodal Medical Imaging” – IEEE – March 1, 2019 – pages 162-169, previously cited on the PTO-892 dated 12/29/2021) in view of Wang et al.
Re Claim 1: Guo discloses a GPU system (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14) to generate a first neural network input from the first user input corresponding to the first input modality and a second neural network input from the second user input corresponding to the second input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth, are transformed into a tensor); and generate an object segmentation from the digital image by utilizing a first input channel of a multi-modal object selection neural network to analyze the first neural network input corresponding to the first input modality and a second input channel of the multi-modal object selection neural network to analyze the second neural network input corresponding to the second input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth, are transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).
	Although Guo discloses multiple modality image inputs, each with its own manual human annotation ground truth (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14), Guo fails to explicitly disclose that such manual human annotations are generated by: identify, for a digital image, a first user input corresponding to a first input modality, the first input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identify, for the digital image, a second user input corresponding to a second input modality, the second input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second input modality differs from the first input modality.  Guo also fails to explicitly disclose a non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to perform the algorithm.
	Wang discloses a non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): identify, for a digital image, a first user input corresponding to a first input modality, the first input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); identify, for the digital image, a second user input corresponding to a second input modality, the second input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second input modality differs from the first input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the computer system to Guo’s system and by including the first and second user input as Guo’s ground truth inputs for the multimodal neural network in order to improve the segmentation output by providing improved user annotation interaction capabilities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40). 
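
For illustration only, the "input channel" arrangement discussed for claim 1 can be sketched as follows. This sketch is not taken from Guo, Wang, or the application as filed. It assumes each neural network input is a 2D grid of the same size as the digital image, and the function name `stack_channels` and the nested-list representation are hypothetical.

```python
def stack_channels(*maps):
    """Stack equally sized 2D maps into a channels-first nested list (C, H, W).

    Each positional argument is one neural network input (hypothetical format:
    a list of rows), so channel 0 holds the first input, channel 1 the second.
    """
    if not maps:
        raise ValueError("at least one map is required")
    height, width = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all maps must share the same height and width")
    # Copy the rows so later edits to the stacked input do not alter the originals.
    return [[list(row) for row in m] for m in maps]


# Hypothetical 2x2 inputs derived from two different input modalities.
first_input = [[0.0, 1.0], [1.0, 2.0]]
second_input = [[2.0, 1.0], [1.0, 0.0]]
network_input = stack_channels(first_input, second_input)
```

Under these assumptions, `network_input[0]` corresponds to the first input channel and `network_input[1]` to the second.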

Re Claim 2: Guo as modified by Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): generate the first neural network input from the first user input by generating a first distance map reflecting distances between pixels of the digital image and the first user input corresponding to the first input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 13 at lines 1-9 and 32-41, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles - distance maps are created for each segment [e.g. foreground, background] and represent the distance from the pixel to the user provided input indications); and generate the second neural network input from the second user input by generating a second distance map reflecting distances between the pixels of the digital image and the second user input corresponding to the second input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 13 at lines 1-9 and 32-41, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles - distance maps are created for each segment [e.g. foreground, background] and represent the distance from the pixel to the user provided input indications).  
See claim 1 for obviousness and motivation statements.
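
For illustration only, the distance map limitation discussed for claim 2 can be sketched as follows. This sketch is not taken from Wang or from the application as filed. It assumes a Euclidean distance from each pixel to the nearest user-selected pixel (the cited reference may use a different distance measure), and the function name `distance_map` and the (row, column) click format are hypothetical.

```python
import math


def distance_map(height, width, clicks):
    """Return a height x width grid of distances to the nearest user click.

    `clicks` is a list of (row, col) pixels selected by the user
    (hypothetical input format). Euclidean distance is an assumption.
    """
    if not clicks:
        raise ValueError("at least one user input pixel is required")
    return [
        [min(math.hypot(r - cr, c - cc) for cr, cc in clicks) for c in range(width)]
        for r in range(height)
    ]


# One map per input modality, e.g. a regional (foreground) click
# and a boundary (edge) click on a 3 x 4 image.
foreground_map = distance_map(3, 4, [(0, 0)])
boundary_map = distance_map(3, 4, [(2, 3)])
```

Each resulting map has the same height and width as the image, so under these assumptions it can be supplied to the network as a separate input alongside the image itself.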

Re Claim 3: Guo as modified by Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): generate a third neural network input from colors of the digital image (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the PET color coded modality image input with its own manual human annotation ground truth is transformed into a tensor); and generate the object segmentation from the digital image by utilizing a third input channel of the multi-modal object selection neural network to analyze third neural network input from the colors of the digital image (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs [including the PET color coded modality image input], each with its own manual human annotation ground truth, are transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).  See claim 1 for obviousness and motivation statements.

Re Claim 4: Wang further discloses wherein: the first input modality comprises a regional input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); the first user input indicates a first position relative to a target object portrayed in a digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); the second input modality comprises a boundary input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); and the second user input indicates a second position relative to the target object portrayed in the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).  See claim 1 for obviousness and motivation statements.

Re Claim 5: Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40) provide for display, via a user interface, a plurality of input modality selectable elements comprising at least one regional input modality element and at least one boundary input modality element (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, GUI for user interactions: input on the displayed testing image on the GUI via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the GUI to Guo’s system in order to improve the segmentation output by providing improved user annotation interaction capabilities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40). 

Re Claim 6: Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): identify the first user input corresponding to the regional input modality by identifying a first user interaction with the at least one regional input modality element and a first selection of a pixel within the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles – GUI for user interactions); and identifying the second user input corresponding to the boundary input modality by identifying a second user interaction with the at least one boundary input modality element and a second selection of a pixel within the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles – GUI for user interactions).  See claim 5 for obviousness and motivation statements.

Re Claim 7: Guo as modified by Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): utilize the multi-modal object selection neural network to generate an initial object segmentation based on the first user input corresponding to the regional input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth [as modified by Wang to include a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles], are transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output); and provide the initial object segmentation for display with the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, page 57 at lines 22-24, the segmentation output is provided to a display).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the display to Guo’s system in order to improve the user interaction experience by providing the user a display of the segmentation output (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, page 57 at lines 22-24). 

Re Claim 8: Guo as modified by Wang further discloses further comprising instructions that, when executed by the at least one processor, cause the computer system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): in response to identifying the second user input corresponding to the boundary input modality, utilize the multi-modal object selection neural network to generate the object segmentation based on the first user input corresponding to the regional input modality and the second user input corresponding to the boundary input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth [as modified by Wang to include a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles], are transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output); and provide the object segmentation for display with the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, page 57 at lines 22-24, the segmentation output is provided to a display).  See claim 7 for obviousness and motivation statements.

Re Claim 21: Guo discloses a system comprising: one or more processors configured to cause the system to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system): generate an object segmentation by utilizing a first input channel of a multi-modal object selection neural network to analyze the first user input corresponding to the first input modality and utilizing a second input channel of the multi-modal object selection neural network to analyze the second user input corresponding to the second input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth, are transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).
Although Guo discloses multiple modality image inputs, each with its own manual human annotation ground truth (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14), Guo fails to explicitly disclose that such manual human annotations are generated by: identify, for the digital image, a first user input corresponding to a first input modality, the first input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality; identify, for the digital image, a second user input corresponding to a second input modality, the second input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second input modality differs from the first input modality.  Guo also fails to explicitly disclose one or more memory devices comprising a digital image.
Wang discloses one or more memory devices comprising a digital image (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40); one or more processors configured to cause the system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40): identify, for the digital image, a first user input corresponding to a first input modality, the first input modality comprising one of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles); identify, for the digital image, a second user input corresponding to a second input modality, the second input modality comprising another of a regional input modality, a boundary input modality, a language input modality, or a bounding box input modality, wherein the second input modality differs from the first input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the computer memory device system to Guo’s system and by including the first and second user input as Guo’s ground truth inputs for the multimodal neural network in order to improve the segmentation output by providing improved user annotation interaction capabilities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40).

Re Claim 22: Guo further discloses wherein the one or more processors are further configured to cause the system to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system): generate a first neural network input from the first user input corresponding to the first input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth, are transformed into a tensor); and generate a second neural network input from the second user input corresponding to the second input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the multiple modality image inputs, each with its own manual human annotation ground truth, are transformed into a tensor).

Re Claim 23: Guo as modified by Wang further discloses wherein the one or more processors are further configured to cause the system to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system) generate the first neural network input by generating a distance map reflecting distances between pixels of the digital image and the first user input corresponding to the first input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 13 at lines 1-9 and 32-41, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles - distance maps are created for each segment [e.g. foreground, background] and represent the distance from the pixel to the user provided input indications).  See claim 21 for obviousness and motivation statements.

Re Claim 24: Guo further discloses wherein the one or more processors are further configured to cause the system to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system) generate the object segmentation by utilizing the first input channel of the multi-modal object selection neural network to analyze the first neural network input and utilizing the second input channel of the multi-modal object selection neural network to analyze the second neural network input (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, each of the multiple modality image inputs, each with its own manual human annotation ground truth, is transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).

Re Claim 25: Guo further discloses wherein the one or more processors are further configured to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system): generate a third neural network input from colors of the digital image (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, the PET color-coded modality image input with its own manual human annotation ground truth is transformed into a tensor); and generate the object segmentation by utilizing a third input channel of the multi-modal object selection neural network to analyze the third neural network input (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, each of the multiple modality image inputs [including the PET color-coded modality image input], each with its own manual human annotation ground truth, is transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).

Re Claim 26: Wang further discloses wherein the one or more processors are further configured to cause the system to provide for display (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40), via a user interface, a plurality of input modality selectable elements comprising two or more of: a regional input modality element, a boundary input modality element, a language input modality element, or a bounding box input modality element (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, GUI for user interactions: input on the displayed testing image on the GUI via at least the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the GUI in Guo’s system in order to improve the segmentation output by providing improved user annotation interaction capabilities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40).

Re Claim 27: Wang further discloses wherein the one or more processors are further configured to cause the system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40) identify the first user input by receiving an indication of a user selection of a first input modality selectable element from the plurality of input modality selectable elements and one or more pixels of the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles – GUI for user interactions).  See claim 26 for obviousness and motivation statements.

Re Claim 28: Wang further discloses wherein the one or more processors are further configured to cause the system to (see Wang, e.g. page 26 at lines 8-12, page 49 at lines 22-40) identify the second user input by receiving an additional indication of an additional user selection of a second input modality selectable element from the plurality of input modality selectable elements and an additional one or more pixels of the digital image (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first and second user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles – GUI for user interactions).  See claim 26 for obviousness and motivation statements.

Re Claim 29: Guo as modified by Wang further discloses wherein the one or more processors are further configured to (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, GPU system): identify, for the digital image, a third user input corresponding to a third input modality different than the first input modality and the second input modality (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40, identifies a first, second, and third user input on the testing image via one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles) (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, each of the three modality image inputs, each with its own manual human annotation ground truth [as modified by Wang to include one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles], is transformed into a tensor); and generate the object segmentation by utilizing a third input channel of the multi-modal object selection neural network to analyze the third user input corresponding to the third input modality (see Guo, Figs. 1 and 2, Sections B and C, and also Section E at line 14, each of the three modality image inputs, each with its own manual human annotation ground truth [as modified by Wang to include one of the user provided foreground and background labels, the user provided bounding box, and the user provided scribbles], is transformed into a tensor and then input to and processed by the multimodal neural network to generate a tumor segmentation output).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo’s system using Wang’s teachings by including the first, second, and third user inputs as Guo’s ground truth inputs for the multimodal neural network in order to improve the segmentation output by providing improved user annotation interaction capabilities (see Wang, page 2 at lines 36-42, page 9 at lines 26-31, page 11 at lines 17-19 and 37-42, page 12 at lines 1-29, page 18 at lines 33-37, page 26 at lines 8-12, page 29 at lines 26-33, page 30 at lines 12-30, page 32 at lines 27-37, page 49 at lines 22-40).

Allowable Subject Matter
Claim 9 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment [with regards to claims 1-8 and 21-29] necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BERNARD KRASNIC whose telephone number is (571)270-1357.  The examiner can normally be reached on Mon. - Thur. and every other Friday from 8am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached on (571) 272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Bernard Krasnic/
Primary Examiner, Art Unit 2661
June 8, 2022