Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/28/2022. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendments
Applicant’s amendments, see Remarks page 8, filed 07/28/2022, with respect to the objection to the specification have been fully considered and are persuasive. The objection to the specification has been withdrawn.

Applicant’s amendments, see Remarks page 8, filed 07/28/2022, with respect to the objection to claims 2-13 have been fully considered and are persuasive. The claim objection has been withdrawn.

Applicant’s amendments, see Remarks pages 8-10, filed 07/28/2022, with respect to the rejection of claims 1-8, 10-12, and 14-15 under 35 U.S.C. § 102 have been fully considered and are moot in light of the new rejection shown below.

Applicant’s amendments, see Remarks page 10, filed 07/28/2022, with respect to the rejection of claims 9 and 13 under 35 U.S.C. § 103 have been fully considered and are moot in light of the new rejection shown below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8, 10-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Gaze latent support vector machine for image classification improved by weakly supervised region selection), in view of Kurzhals (Gaze Stripes Image-Based Visualization of Eye Tracking Data).

Regarding claim 1, Wang teaches A system for training a neural network model, the system comprising: a memory comprising instruction data representing a set of instructions;  a processor configured to communicate with the memory and to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to: acquire training data, the training data comprising: annotated data, an annotation for the annotated data as determined by a user and auxiliary data, the auxiliary data describing first locations of interest and second locations of interest in the annotated data, as considered by the user when determining the annotation for the annotated data; ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The images selected do not contain text, because it’s verified that texts attract attention most [59]. For each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected. In total, we have collected 31104 fixations. The examiner notes that Wang teaches a training dataset (data) that is annotated based on a multi-modal large scale food dataset UPMC-food 101 (annotation for the data as determined by a user) and for each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected (auxiliary data, the auxiliary data describing at least one location of interest in the data, as considered by the user when determining the annotation for the data)).
train the neural network model using the training data, wherein causing the processor to train the neural network model comprises causing the processor to: minimise an auxiliary loss function that compares the first and second locations of interest to an output of one or more layers of the neural network model ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a gaze loss function (δg) that compares a region based on gaze info to the region interpreted to be that region by the model as the auxiliary loss).
minimise a main loss function that compares the annotation for the annotated data as determined by the user to an annotation produced by the neural network model, and wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to update weights of the neural network model so as to give increased significance to the first locations of interest as compared to the second locations of interest based on the first locations of interest being considered by the user during the one or more of the initial time interval or the final time interval, and the second locations of interest being considered by the user during the middle time interval ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a classification hinge loss function $\sum_{i=1}^{n} \Delta_c$ used to generalize a model that selects relevant regions as predicted by the model as the main loss function. The examiner also notes that Wang teaches a gaze loss function that is minimized based on the selected areas of interest according to the gaze information).
	However, Wang fails to explicitly teach wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data.
	On the other hand, Kurzhals teaches wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data ([Page 1007, Fig. 2] The examiner notes that Kurzhals teaches in Fig. 2(b) a scarf plot showing the locations of interest indicated by the gaze of a participant in a time step sequence where the participant showed interest in a sky location followed by sea locations then followed again by sky locations. The examiner also notes that Wang and Kurzhals are both considered to be analogous because they are in the same field of image classification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang’s image classifier to incorporate wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data as taught by Kurzhals [Page 1007, Fig. 2] to reduce the number of displayed time steps on demand, in order to obtain a better overview [Page 1006, Section 3.1]).

[Figure image: media_image3.png]
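For illustration only, the temporal limitation discussed above (locations viewed in an initial or final time interval versus a middle time interval) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the 25% boundary used for the initial and final intervals, and the numeric weights are hypothetical and are not taken from the claims, Wang, or Kurzhals.

```python
def label_intervals(fixations, edge_fraction=0.25):
    """Label each fixation as 'initial', 'middle', or 'final'.

    fixations: list of (region_name, start_time) pairs.
    edge_fraction: assumed share of the viewing period treated as the
    initial interval and, symmetrically, as the final interval.
    """
    times = [t for _, t in fixations]
    t_first, t_last = min(times), max(times)
    span = t_last - t_first
    labelled = []
    for region, t in fixations:
        if span == 0:
            interval = "initial"
        else:
            position = (t - t_first) / span
            if position <= edge_fraction:
                interval = "initial"
            elif position >= 1.0 - edge_fraction:
                interval = "final"
            else:
                interval = "middle"
        labelled.append((region, interval))
    return labelled


def region_weights(fixations, edge_weight=2.0, middle_weight=1.0):
    """Give a region the larger (assumed) weight if it was viewed in the
    initial or final interval, and the smaller weight otherwise."""
    weights = {}
    for region, interval in label_intervals(fixations):
        w = middle_weight if interval == "middle" else edge_weight
        weights[region] = max(weights.get(region, 0.0), w)
    return weights


# Gaze sequence in the style of the scarf plot described above: sky, sea, sky.
sequence = [("sky", 0.0), ("sea", 1.0), ("sea", 1.5), ("sky", 2.5)]
weights = region_weights(sequence)
```

In this hypothetical sequence the sky region is viewed first and last and therefore receives the larger weight, while the sea region is viewed only in the middle interval.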



Regarding claim 2, Wang teaches The system as in claim 1, wherein the auxiliary data comprises eye gaze data and the first and second locations of interest comprises at least one location in the annotated data observed by the user when determining the annotation for the annotated data. ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The images selected do not contain text, because it’s verified that texts attract attention most [59]. For each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected. In total, we have collected 31104 fixations. The examiner notes that Wang teaches training datasets that have been annotated by users [Page 61, Figure 2, LHS] and contain data about multiple fixations in each image).

Regarding claim 3, Wang teaches The system as in claim 2 wherein the eye gaze data comprises one or more of: information indicative of which portions of the annotated data the user looked at when determining the annotation for the annotated data; information indicative of an amount of time the user spent looking at each portion of the annotated data when determining the annotation for the annotated data; and information indicative of the order in which the user looked at different portions of the data when determining the annotation for the annotated data. ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The images selected do not contain text, because it’s verified that texts attract attention most [59]. For each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected. In total, we have collected 31104 fixations. The examiner notes that Wang teaches measuring the duration of user fixation on certain image regions of interest).

Regarding claim 4, Wang teaches The system as in claim 1, wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to update weights of the model so as to give increased significance to the first and second locations of interest in the annotated data, compared to locations in the annotated data that are not locations of interest. ([Page 61, Para 6] The novelty in our training scheme is the introduction of a gaze loss δg defined as: 
[Equation image: media_image4.png]
where g(xi, z) is the density of fixations in the region z for image xi. The examiner notes that Wang teaches the use of a loss function that utilizes the density of fixations in a region in an image).
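For illustration only, a gaze loss driven by fixation density can be sketched as below. The exact definition in Wang is contained in the equation image and is not reproduced here; the form 1 - g(x, z) / max over z' of g(x, z') is an assumption chosen only because it is zero for the region with the highest density and grows as the density falls, which matches the behaviour the cited passages describe. The function names and the use of duration share as the density are likewise hypothetical.

```python
def fixation_density(fixations, region):
    """Share of total fixation duration that falls inside a region.

    fixations: list of (x, y, duration) tuples.
    region: (x0, y0, x1, y1) axis-aligned box, upper bounds exclusive.
    """
    x0, y0, x1, y1 = region
    total = sum(d for _, _, d in fixations)
    if total == 0:
        return 0.0
    inside = sum(d for x, y, d in fixations if x0 <= x < x1 and y0 <= y < y1)
    return inside / total


def gaze_loss(fixations, region, candidate_regions):
    """Assumed gaze loss: 0 for the densest region, larger for sparser ones."""
    best = max(fixation_density(fixations, r) for r in candidate_regions)
    if best == 0:
        return 0.0
    return 1.0 - fixation_density(fixations, region) / best


fixations = [(10, 10, 0.5), (12, 11, 1.0), (80, 80, 0.5)]
regions = [(0, 0, 50, 50), (50, 50, 100, 100)]
losses = [gaze_loss(fixations, r, regions) for r in regions]
```

Here the first region holds three quarters of the fixation time and receives zero loss, while the second region receives a positive loss.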

Regarding claim 5, Wang teaches The system as in claim 1 wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to update weights of the model so as to give increased significance to locations of interest considered by the user for longer periods of time compared to locations of interest that are considered by the user for shorter periods of time. ([Page 61, Para 5] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:
 
[Equation image: media_image5.png]

where zi is the region with the maximum total duration of fixations. The examiner notes that Wang uses a loss function that accounts for the region with the maximum total duration of fixations).

Regarding claim 6, Wang teaches The system as in claim 1 wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to update weights of the model so as to give increased significance to locations of interest in the annotated data that are considered a plurality of times by the user when determining the annotation for the annotated data. ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The examiner notes that Wang teaches model generalization based on region selection using gaze information. Wang also teaches as shown previously in [Page 64, Para. 04] and [Page 61, Figure 2] that multiple fixations per image are recorded).

Regarding claim 7, Wang teaches The system as in claim 1 wherein the auxiliary data comprises image data, image components of the image data corresponding to a portion of the annotated data ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The examiner notes that Wang teaches the selection of datasets that include data describing regions within the image based on gaze data [Page 61, Fig. 2]).

Regarding claim 8, Wang teaches The system as in claim 7, wherein the image data comprises a heat map, and wherein values of image components in the heat map are correlated with whether each image component corresponds to a location of interest in the annotated data and/or a duration that the user spent considering each corresponding location of the annotated data when determining the annotation for the annotated data. ([Page 61, Para. 6] Fig. 2 illustrates the proposed gaze loss. In this example, when the color of heatmap is closer to red, the density of gaze is higher. The region contains the maximum density of gaze is shown as zi (shown as the green rectangle). The gaze loss of zi is thus defined as 0. The red region z1 contains a smaller density of gaze with respect to the blue region z2, leading to a larger gaze loss).
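For illustration only, a heat map whose values grow with the time spent at each location can be sketched as follows. This is a minimal sketch, not the procedure used by Wang: the Gaussian spread, the `sigma` parameter, and the normalisation to a peak of 1 are assumptions.

```python
import math


def fixation_heatmap(fixations, height, width, sigma=2.0):
    """Build a duration-weighted heat map from (x, y, duration) fixations.

    Each fixation adds a Gaussian bump scaled by its duration, so pixels
    near long fixations receive higher values. The map is normalised so
    that its maximum is 1 (assumed convention).
    """
    heat = [[0.0] * width for _ in range(height)]
    for x, y, duration in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (c - x) ** 2 + (r - y) ** 2
                heat[r][c] += duration * math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# One long fixation at (x=2, y=3) and one short fixation at (x=7, y=7).
heat = fixation_heatmap([(2, 3, 1.0), (7, 7, 0.25)], height=10, width=10)
```

The pixel under the longer fixation carries the highest value, which is the correlation between heat map value and viewing duration that the claim language refers to.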

Regarding claim 10, Wang teaches The system as in claim 1 wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to compare the auxiliary data to an output of one or more dense layers of the model. ([Page 61, Para 06] region-specific labels are unknown during training. Our prediction takes the maximum score over the latent variables:

[Equation image: media_image6.png]

The examiner notes that the inner product multiplication of W and Φ is interpreted by the examiner to be a dense layer).
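For illustration only, a prediction that takes the maximum score over latent regions, with each score computed as an inner product of a weight vector and a region feature vector, can be sketched as below. The function names and the toy feature values are hypothetical; the sketch is not Wang's implementation.

```python
def score(w, phi):
    """Inner product <w, phi> of a weight vector and a feature vector."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def predict(w, region_features):
    """Return (best_score, best_region_index), the maximum of <w, phi(x, z)>
    over the candidate latent regions z."""
    scores = [score(w, phi) for phi in region_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


w = [1.0, -0.5]
region_features = [[0.2, 0.4], [0.9, 0.1], [0.5, 0.5]]
best_score, best_region = predict(w, region_features)
```

The inner product in `score` is the same operation a fully connected (dense) layer without bias or activation performs for a single output unit.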

Regarding claim 11, Wang teaches The system as in claim 1 wherein causing the processor to train the model comprises causing the processor to minimise one or more of: the auxiliary loss function and the main loss function in parallel; the auxiliary loss function before minimising the main loss function; and the auxiliary loss function to within a predetermined threshold, after which the model is further trained using the main loss function. ([Page 61, Eq (3)] The examiner notes that Wang minimizes the sum of the hinge and gaze loss functions, i.e., minimizes the two loss functions in parallel).

Regarding claim 12, Wang teaches The system as in claim 1 wherein the set of instructions, when executed by the processor, further cause the processor to: calculate a combined loss function, the combined loss function comprising a weighted combination of the main loss function and the auxiliary loss function ([Page 61, Para. 7] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image7.png]

The examiner notes that Wang teaches a combined loss function that comprises a weighted sum of a classification hinge loss function and a gaze loss function).
	adjust one or more weights associated with the weighted combination of the combined loss function, so as to change the emphasis of the training between minimising the main loss function and minimising the auxiliary loss function. ([Page 68, Para. 1] We investigate the impact of the three hyper-parameters in our model: trade-off parameters γ+, γ− and k. The impact of the parameter γ+ of G+LSVM is shown in Fig. 9 for small scale 50%, with k set to be 1. The performances in Fig. 9 are shown on average for all categories. For all three datasets, mAP reaches the peak when γ+ is in the interval [0.1, 0.3]. Note that when γ+ gets too high, mAP gets even lower than not adding gaze ( Fig. 9). The examiner notes that Wang teaches trade-off parameter γ which examines the trade-off effects between the two loss functions that make up the combined loss function).
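For illustration only, a weighted combination of a main loss and an auxiliary loss with a scalar trade-off parameter can be sketched as follows. This is a simplified per-example sketch and does not reproduce Wang's Eq. (3): the binary hinge form, the function names, and the sample values are assumptions.

```python
def hinge_loss(label, score):
    """Standard binary hinge loss for a label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)


def combined_loss(label, score, gaze_loss_value, gamma):
    """Main (hinge) loss plus gamma times the auxiliary (gaze) loss.

    gamma >= 0 shifts the emphasis of training: gamma = 0 ignores the
    gaze term, larger gamma gives it more influence.
    """
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return hinge_loss(label, score) + gamma * gaze_loss_value


without_gaze = combined_loss(+1, 0.4, gaze_loss_value=0.5, gamma=0.0)
with_gaze = combined_loss(+1, 0.4, gaze_loss_value=0.5, gamma=0.2)
```

Changing `gamma` changes how strongly the auxiliary term contributes to the total, which is the adjustment of emphasis discussed in the passage above.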

Regarding claim 14, Wang teaches A method of training a neural network model, the method comprising: acquiring training data, the training data comprising: annotated data, an annotation for the annotated data as determined by a user and auxiliary data, the auxiliary data describing first locations of interest and second locations of interest in the annotated data, as considered by the user when determining the annotation for the annotated data; ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The images selected do not contain text, because it’s verified that texts attract attention most [59]. For each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected. In total, we have collected 31104 fixations. The examiner notes that Wang teaches a training dataset (annotated data) that is annotated based on a multi-modal large scale food dataset UPMC-food 101 (annotation for the annotated data as determined by a user) and for each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected (auxiliary data, the auxiliary data describing first locations of interest and second locations of interest in the annotated data, as considered by the user when determining the annotation for the annotated data)).
training the neural network model using the training data, wherein causing the processor to train the neural network model comprises causing the processor to: minimise an auxiliary loss function that compares the first and second locations of interest to an output of one or more layers of the neural network model ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a gaze loss function (δg) that compares a region based on gaze info to the region interpreted to be that region by the model as the auxiliary loss).
minimising a main loss function that compares the annotation for the data as determined by the user to an annotation produced by the neural network model, and wherein minimising the auxiliary loss function comprises updating weights of the neural network model so as to give increased significance to the first locations of interest as compared to the second locations of interest based on the first locations of interest being considered by the user during the one or more of the initial time interval or the final time interval, and the second locations of interest being considered by the user during the middle time interval ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a classification hinge loss function $\sum_{i=1}^{n} \Delta_c$ used to generalize a model that selects relevant regions as predicted by the model as the main loss function. The examiner also notes that Wang teaches a gaze loss function that is minimized based on the selected areas of interest according to the gaze information).
	However, Wang fails to explicitly teach wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data.
	On the other hand, Kurzhals teaches wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data ([Page 1007, Fig. 2] The examiner notes that Kurzhals teaches in Fig. 2(b) a scarf plot showing the locations of interest indicated by the gaze of a participant in a time step sequence where the participant showed interest in a sky location followed by sea locations then followed again by sky locations. The examiner also notes that Wang and Kurzhals are both considered to be analogous because they are in the same field of image classification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang’s image classifier to incorporate wherein the first locations of interest are considered by the user during one or more of an initial time interval when determining the annotation for the annotated data or a final time interval when determining the annotation for the annotated data, and further wherein the second locations of interest are considered by the user during a middle time interval when determining the annotation for the annotated data as taught by Kurzhals [Page 1007, Fig. 2] to reduce the number of displayed time steps on demand, in order to obtain a better overview [Page 1006, Section 3.1]).

[Figure image: media_image3.png]


Regarding claim 15, Wang teaches A computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to: acquire training data, the training data comprising: data, an annotation for the data as determined by a user and auxiliary data, the auxiliary data describing at least one location of interest in the data, as considered by the user when determining the annotation for the data; ([Page 64, Para 04] UPMC-G20 content. UPMC-G20 is a food-related gaze annotated dataset based on a multi-modal large scale food dataset UPMC-food 101 [40] . We select 20 food categories from UPMC- food 101, resulting in 2,000 images. The images selected do not contain text, because it’s verified that texts attract attention most [59]. For each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected. In total, we have collected 31104 fixations. The examiner notes that Wang teaches a training dataset (data) that is annotated based on a multi-modal large scale food dataset UPMC-food 101 (annotation for the data as determined by a user) and for each image, about 15 fixations across 3 subjects (in average) with a total duration of 2.5s are collected (auxiliary data, the auxiliary data describing at least one location of interest in the data, as considered by the user when determining the annotation for the data)).
train the model using the training data, the training comprises: minimising an auxiliary loss function that compares the at least one location of interest to an output of one or more layers of the model ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a gaze loss function (δg) that compares a region based on gaze info to the region interpreted to be that region by the model).
	minimise a main loss function that compares the annotation for the data as determined by the user to an annotation produced by the model. ([Page 61, Para 06] This model generalizes latent SVM [15] by biasing the selection of latent regions based on the gaze information during the training scheme. The training objective of G + LSVM is as follows:

[Equation image: media_image1.png]
[Equation image: media_image2.png]
where zi is the region with the maximum total duration of fixations, interpreted as the relevant region selected by our model. For each training example, Eq. (3) includes a classification hinge loss and a gaze loss δg, with a scalar trade-off parameter γ≥0. The examiner notes that Wang teaches a hinge loss function used to generalize a model that selects relevant regions as predicted by the model).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (Gaze latent support vector machine for image classification improved by weakly supervised region selection), in view of Kurzhals (Gaze Stripes Image-Based Visualization of Eye Tracking Data), further in view of Li (Medical image classification with convolutional neural network).

Regarding claim 9, Wang teaches The system as in claim 7. However, Wang fails to explicitly teach wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to compare the image data to an output of one or more convolutional layers of the model.
On the other hand, Li teaches wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to compare the image data to an output of one or more convolutional layers of the model ([Page 845, Para 5] Convolutional neuron layers are the key component of CNN. In image classification tasks, one or more 2D matrices (or channels) are treated as the input to the convolutional layer and multiple 2D matrices are generated as the output. The number of input and output matrices may be different. The process to compute a single output matrix is defined as:

[Equation image: media_image8.png]

Firstly each input matrix Ii is convoluted with a corresponding kernel matrix Kij. Then the sum of all convoluted matrices is computed and a bias value Bj is added to each element of the resulting matrix. Finally a non-linear activation function f is applied to each element of the previous matrix to produce one output matrix Aj. Each set of kernel matrices represents a local feature extractor that extracts regional features from the input matrices. The aim of the learning procedure is to find sets of kernel matrices K that extract good discriminative features to be used for image classification. The back propagation algorithm that optimizes neural network connection weights can be applied here to train the kernel matrices and biases as shared neuron connection weights. The examiner notes that Li teaches a convolutional neural network as an image classifier. The examiner also notes that Wang and Li are both considered analogous because they are in the same field of computational neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang’s training models to incorporate wherein causing the processor to minimise the auxiliary loss function comprises causing the processor to compare the image data to an output of one or more convolutional layers of the model as taught by Li [Page 845, Para. 5] in order to automatically and efficiently learn the intrinsic image features [Page 844, Para 1]).
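For illustration only, the computation of one output matrix of a convolutional layer as described in the quoted passage (convolve each input matrix with its kernel, sum the results, add a bias, apply a non-linear activation) can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the convolution uses "valid" borders with a flipped kernel, and ReLU is assumed for the activation f because the quoted passage does not name one.

```python
def conv2d_valid(image, kernel):
    """2-D convolution (kernel flipped) with no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            out[r][c] = sum(
                image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
                for i in range(kh)
                for j in range(kw)
            )
    return out


def conv_layer_output(inputs, kernels, bias, activation=lambda v: max(0.0, v)):
    """Compute A_j = f(sum_i I_i * K_ij + B_j) for one output channel j.

    inputs: list of input matrices I_i; kernels: matching list of K_ij.
    activation: assumed ReLU; any element-wise non-linearity could be used.
    """
    maps = [conv2d_valid(i_mat, k_mat) for i_mat, k_mat in zip(inputs, kernels)]
    out_h, out_w = len(maps[0]), len(maps[0][0])
    return [
        [activation(sum(m[r][c] for m in maps) + bias) for c in range(out_w)]
        for r in range(out_h)
    ]


inputs = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
kernels = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
output = conv_layer_output(inputs, kernels, bias=0.5)
```

With these toy values the first channel contributes 4, the second contributes 4, and the bias adds 0.5 before the activation is applied.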

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (Gaze latent support vector machine for image classification improved by weakly supervised region selection), in view of Kurzhals (Gaze Stripes Image-Based Visualization of Eye Tracking Data), further in view of University of Freiburg (U-Net Convolutional Networks for Biomedical Image Segmentation).

Regarding claim 13, Wang teaches the system as in claim 1. However, Wang fails to explicitly teach wherein the model comprises a modified U-Net architecture.
On the other hand, University of Freiburg teaches wherein the model comprises a modified U-Net architecture ([Page 1, Para. 1] The u-net is convolutional network architecture for fast and precise segmentation of images. The examiner notes that according to the applicant’s specification [Page 9, Line 5] the claimed model could be a convolutional model. The examiner also notes that Wang and University of Freiburg are both considered analogous because they are in the same field of computational neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang’s training models to incorporate wherein the model comprises a modified U-Net architecture as taught by University of Freiburg [Page 1, Para. 1] in order to achieve fast and precise segmentation of images [Page 1, Para. 1]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Li - Learning to Predict Gaze in Egocentric Video – 2013
“Li teaches a model that uses behavior to predict gaze in video frames”
Rodriguez-Serrano - US20170083792A1
“Rodriguez-Serrano teaches a model that identifies objects in an image”
Chang - US20120089552A1
“Chang teaches a system that recognizes a wide range of targets at a high speed”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY whose telephone number is (571)272-8824. The examiner can normally be reached Monday-Friday 7:30am-4:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAMCY ALGHAZZY/           Examiner, Art Unit 2128 

/OMAR F FERNANDEZ RIVAS/           Supervisory Patent Examiner, Art Unit 2128