Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/15/2017 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, all the copies of the information disclosure statement are being considered by the examiner.
Drawings
The drawings were received on 11/15/2017. These drawings are acceptable for examination purposes.
Specification
The disclosure is objected to because of the following informalities:
Paragraph [0004]: “in the layout the content” should be “in the layout of the content”.
Paragraph [0005]: “discernable” is spelled incorrectly. The correct spelling is “discernible”.
Paragraph [0005]: “for at multiple zoom levels” should be “for multiple zoom levels”.
Paragraph [0019]: “The content saliency neural network neural” should be “The content saliency neural network”.
Reference numeral 616 in Figure 6 is missing from the specification.
Appropriate correction is required.
Claim Interpretation
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification.
For the following terms/phrases found in the claims such as:
“content item” can be a document, email, webpage, poster, or pamphlet and contain one text-based element, according to Paragraph [0004]
“saliency” is the state or quality by which an element stands out relative to its neighbors, according to Paragraph [0005]
“saliency map” represents the saliency of content elements, according to Paragraph [0006]
“element” can be a title, heading, image, paragraph, text, text box, button, link, etc., according to Paragraph [0020]
“DOM” stands for Document Object Model, which is a data structure for content items, according to Paragraph [0020]
“saliency score” is a measure of how much the element stands out relative to its neighbors. The higher this score is, the more likely the element will attract the viewer’s attention, according to Paragraph [0021]
“eye gaze data” refers to data collected based on the locations viewers gaze at, according to Paragraph [0021]
“simple feature” is an attribute of an element such as width, height, color area, etc., according to Paragraph [0023]
“pixel-level feature vectors” refers to a saliency matrix of an image, according to Paragraph [0023]
“zoom level” refers to low, intermediate and high zoom levels, according to Paragraph [0023]
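For illustration only, the interpretation of “saliency score” and “eye gaze data” above can be read together with the claim 1 language “determining a proportion of coordinates from the set of coordinates for the email message that correspond to the element”. The following is a minimal sketch of that reading, not the applicant’s implementation. The function name, the (left, top, right, bottom) box convention, and the sample values are assumptions made for this example and do not appear in the specification or the cited art.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) gaze coordinate (assumed layout)
Box = Tuple[float, float, float, float]   # (left, top, right, bottom) (assumed layout)


def saliency_score(gaze_points: List[Point], element_box: Box) -> float:
    """Return the fraction of gaze points that fall inside the element's box."""
    if not gaze_points:
        # No gaze data recorded for this message: treat the score as zero.
        return 0.0
    left, top, right, bottom = element_box
    hits = sum(
        1
        for x, y in gaze_points
        if left <= x <= right and top <= y <= bottom
    )
    return hits / len(gaze_points)


# Hypothetical data: four gaze coordinates, three of which land on a headline.
gaze = [(10, 10), (15, 12), (200, 300), (12, 18)]
headline_box = (0, 0, 100, 50)
score = saliency_score(gaze, headline_box)
```

Under this reading the score lies between 0 and 1, and the scores of non-overlapping elements of one message sum to at most 1.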
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 6, 8, 9, 10, 12, 16, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Pilu (GB 2415562 A), hereinafter “Pilu”, in view of Xu et al. (“Spatio-Temporal Modeling and Prediction of Visual Attention in Graphical User Interfaces”), hereinafter “Xu”, and further in view of Fred Stentiford (“Attention Based Auto Image Cropping”), hereinafter “Stentiford”.
Regarding Claim 1, Pilu teaches a computer-implemented method for training a content saliency neural network, the method comprising: (Pilu teaches in abstract: “method of processing image data is claimed wherein the image data relates to at least one image, the method comprising using data relating to disposition and saliency of selected portions of the or each image and generating at least one saliency vector for respective said portions.”)
computing a vector of simple features for the element, the simple features being computed from attributes of the element (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.” Pilu teaches in abstract: “…the method comprising using data relating to disposition and saliency of selected portions of the or each image and generating at least one saliency vector for respective said portions.”)
and the vector of simple features; (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.”)
Pilu does not teach obtaining, using at least one processor, eye gaze data for each of a plurality of email messages, wherein the eye gaze data for each email message in the plurality of email messages includes data from a plurality of viewers of the email message and wherein the eye gaze data includes, for each email-viewer pair, a set of coordinates that correspond to a location in the email message; for each email message in the plurality of email messages, the email message including a respective set of elements: computing a first pixel-level vector for the email message, the first pixel-level vector being a saliency matrix for an image that represents the email message; and for each element in the respective set of elements: determining a saliency score for the element by determining a proportion of coordinates from the set of coordinates for the email message that correspond to the element, computing a second pixel-level vector for the element, the second pixel-level vector being a saliency matrix for an image that represents the element, computing a third pixel-level vector for an intermediate context of the element, the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, and training, by the at least one processor, the content saliency neural network to predict the saliency score for the element, given the first pixel-level vector, the second pixel-level vector, the third pixel-level vector, and providing the content saliency neural network for use in generating an element-level saliency map for a later-drafted email message.
Xu teaches obtaining, using at least one processor, eye gaze data for each of a plurality of email messages, (Xu teaches on page 4 under [Participants and Apparatus]: “Gaze data was recorded using a Tobii TX300 stationary eye tracker running at 300 Hz and providing an accuracy of 0.5°.”)
wherein the eye gaze data for each email message in the plurality of email messages includes data from a plurality of viewers of the email message (Xu teaches under [Participants and Apparatus]: “We recruited 18 participants (6 females and 12 males, aged between 20 and 30 years) through mailing lists.” Under [Procedure] Xu teaches: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.”)
and wherein the eye gaze data includes, for each email-viewer pair, a set of coordinates that correspond to a location in the email message; (Xu teaches under [Abstract]: “For example, several works demonstrated that mouse position can be used as a proxy for gaze location [1, 2, 21, 24, 37] and mouse click positions can be even used to calibrate an eye tracker [49].” The locations can be a set of coordinates.)
for each email message in the plurality of email messages, the email message including a respective set of elements: (Xu teaches in Figure 2 on page 3: “Components are grouped into eight categories: title/short description (G1), main content (G2), text formatting (G3), meta information/setting (G4), finish button (G5), profile icon (G6), image (G7), and new window (G8)”. See also Figure 7 on page 5.)
computing a first pixel-level vector for the email message (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
and for each element in the respective set of elements: determining a saliency score for the element by determining a proportion of coordinates from the set of coordinates for the email message that correspond to the element (See Figure 9 on page 9. Xu teaches on page 8 Normalized Scan-Path Saliency score (NSS). Xu teaches “this measure is calculated as the mean value of the normalized attention map s at n fixation locations…” See tables 2 & 3 on page 8)
computing a second pixel-level vector for the element (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
computing a third pixel-level vector for an intermediate context of the element (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
and training, by the at least one processor, the content saliency neural network to predict the saliency score for the element (Xu teaches under [Abstract]: “We present a computational model to predict users’ spatiotemporal visual attention on WIMP-style (windows, icons, menus, pointer) graphical user interfaces.”)
and providing the content saliency neural network for use in generating an element-level saliency map for a later-drafted email message. (Xu teaches on page 4 under [Procedure]: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.” This suggests an email being drafted by participants. Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “Our model takes information about the interface as well as users’ mouse and keyboard actions as input, computes individual feature channels from the raw data recorded over time, and predicts joint spatiotemporal attention maps, which indicates the likelihood of users’ attention focusing on each location over time.”)
Xu does not teach the first pixel-level vector being a saliency matrix for an image that represents the email message; the second pixel-level vector being a saliency matrix for an image that represents the element, the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, given the first pixel-level vector, the second pixel-level vector, the third pixel-level vector.
Stentiford teaches the first pixel-level vector being a saliency matrix for an image that represents the email message; (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
the second pixel-level vector being a saliency matrix for an image that represents the element, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
given the first pixel-level vector, the second pixel-level vector, the third pixel-level vector, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Pilu’s teachings with those of Xu and Stentiford because all three are analogous art relating to content saliency. Pilu teaches a system and method of processing images to produce a saliency map based on salient regions of the image. Xu teaches a saliency model that uses device inputs such as a mouse, keyboard and eye tracker system to study salient regions of user interface applications like Gmail and social media apps and to produce attention maps based on the data collected. Stentiford teaches a method of automatically cropping visual material at different zoom levels based upon a new measure of visual attention that reflects the informativeness of the image. One would have been motivated to combine these methods in order to produce a system that uses email content at different zoom levels and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
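For reference, the Normalized Scan-Path Saliency (NSS) measure cited from Xu in the rejection of Claim 1 is described there as “the mean value of the normalized attention map s at n fixation locations”. The following minimal sketch shows one way such a measure can be computed. It assumes the customary NSS normalization (zero mean, unit standard deviation over the whole map), which the passage quoted from Xu does not spell out. The function name, the (row, column) fixation convention, and the sample map are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def nss(attention_map: List[List[float]], fixations: List[Tuple[int, int]]) -> float:
    """Mean of the normalized attention map sampled at the fixation locations.

    Assumption: the map is normalized to zero mean and unit standard
    deviation before sampling.
    """
    values = [v for row in attention_map for v in row]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the whole map
    if sigma == 0 or not fixations:
        # A flat map or an empty fixation list carries no information.
        return 0.0
    samples = [(attention_map[r][c] - mu) / sigma for r, c in fixations]
    return mean(samples)


# Hypothetical 2x2 attention map with a single salient cell at row 1, column 1.
toy_map = [[0.0, 0.0], [0.0, 1.0]]
on_target = nss(toy_map, [(1, 1)])    # fixation on the salient cell
off_target = nss(toy_map, [(0, 0)])   # fixation on a non-salient cell
```

A positive value indicates that fixations fall on regions the map rates above average, and a negative value indicates the opposite.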
Regarding Claim 2, the rejection of Claim 1 is incorporated.
Xu teaches the method of Claim 1, wherein the eye gaze data is obtained from a viewer of the plurality of viewers as a crowdsource task. (Xu teaches under [Participants and Apparatus]: “We recruited 18 participants (6 females and 12 males, aged between 20 and 30 years) through mailing lists.” Under [Procedure] Xu teaches: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the method of Claim 1 to include data gathering from a plurality of people because more data will provide for a more accurate (reliable) saliency map than a map based on data from one person only. The task in Xu is ‘crowdsourced’ because the same task is assigned to a plurality of users.
Regarding Claim 5, the rejection of Claim 1 is incorporated.
Xu teaches the method of Claim 1, wherein computing the first pixel-level vector for the email includes: (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
providing the email to a natural-image saliency neural network; (Xu teaches on page 8 under [Static Attention Prediction]: “Figure 9 shows sample attention maps for static attention prediction using our model, individual user inputs, and established bottom-up attention models for the email writing task.”)
and receiving the first pixel-level vector from the natural-image saliency neural network. (Xu teaches on page 6 under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the method of Claim 1 to compute the feature vectors for the email to “use the sampled pixels for training the model” (See page 6 under [Features]).
Regarding Claim 6, the rejection of Claim 1 is incorporated.
Xu teaches the method of Claim 1, wherein the element has a bounding box and the intermediate context represents an area of the email message determined by a halfway point between each edge of the bounding box and the edges of the image representing the email message. (Xu teaches on page 6 under [Features]: “For the bounding boxes of UI groups, we create a binary map for each of them based on the location of the bounding box (1 for pixels inside the box, 0 for others).”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the method of Claim 1 to include bounding boxes to “train a generalized model for tasks with a different set of UI groups” (See page 7 under [Static Attention Prediction]).
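To make the two geometric notions in the Claim 6 discussion concrete, the following minimal sketch computes (a) an intermediate context whose edges lie halfway between each edge of the element’s bounding box and the corresponding edge of the image, as recited in Claim 6, and (b) a binary map with 1 for pixels inside a bounding box and 0 elsewhere, as quoted from Xu. The function names, the (left, top, right, bottom) pixel convention with exclusive right and bottom edges, and the integer rounding are assumptions made for this example only.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels (assumed layout)


def intermediate_context(element: Box, image_width: int, image_height: int) -> Box:
    """Box whose edges lie halfway between the element's edges and the image edges."""
    left, top, right, bottom = element
    return (
        left // 2,                    # halfway between image left edge (0) and box left
        top // 2,                     # halfway between image top edge (0) and box top
        (right + image_width) // 2,   # halfway between box right and image right edge
        (bottom + image_height) // 2, # halfway between box bottom and image bottom edge
    )


def binary_map(box: Box, image_width: int, image_height: int) -> List[List[int]]:
    """Per-pixel map: 1 inside the bounding box, 0 for all other pixels."""
    left, top, right, bottom = box
    return [
        [1 if left <= x < right and top <= y < bottom else 0 for x in range(image_width)]
        for y in range(image_height)
    ]


# Hypothetical element box inside a 100 x 80 pixel image of an email message.
context = intermediate_context((40, 20, 60, 30), image_width=100, image_height=80)
mask = binary_map((1, 1, 3, 2), image_width=4, image_height=3)
```

The sketch shows that the binary map encodes only the location of a box, whereas the intermediate context is itself a region derived from both the box and the image bounds.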
Regarding Claim 8, Pilu teaches for each element in the set of elements: computing a vector of simple features for the element, the simple features being computed from attributes of the element (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.” Pilu teaches in abstract: “…the method comprising using data relating to disposition and saliency of selected portions of the or each image and generating at least one saliency vector for respective said portions.”)
and the vector of simple features to the neural network (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.”)
Pilu does not teach a computer system comprising: at least one processor; memory storing a neural network trained to predict, for a given element of an email message, a saliency score for the element; and memory storing instructions that, when executed by the at least one processor, causes the computer system to perform operations including: determining, using the at least one processor, a set of elements in a draft email message provided by a requestor, computing a first pixel-level vector for the email message, the first pixel-level vector, computing a second pixel-level vector for the element, computing a third pixel-level vector for an intermediate context of the element, being a saliency matrix for an image that represents the email message, the second pixel-level vector being a saliency matrix for an image that represents the element, the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, and providing the first pixel-level vector, the second pixel-level vector, the third pixel-level vector, the neural network providing a saliency score for the element, and generating an element-level saliency map of the email message using the respective saliency scores for the set of elements, and providing the element-level saliency map to the requestor.
Xu teaches a computer system comprising: at least one processor; memory storing a neural network trained to predict, for a given element of an email message, a saliency score for the element; and memory storing instructions that, when executed by the at least one processor, causes the computer system to perform operations including: (Xu teaches under [Abstract]: “We present a computational model to predict users’ spatiotemporal visual attention on WIMP-style (windows, icons, menus, pointer) graphical user interfaces.”)
determining, using the at least one processor, a set of elements in a draft email message provided by a requestor, computing a first pixel-level vector for the email message, (Xu teaches on page 4 under [Procedure]: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.” This suggests an email being drafted by participants. Xu teaches on page 6 under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.”)
the first pixel-level vector (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
computing a second pixel-level vector for the element (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
computing a third pixel-level vector for an intermediate context of the element, (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
the neural network providing a saliency score for the element, and generating an element-level saliency map of the email message using the respective saliency scores for the set of elements, (Xu teaches on page 3 under [Real-World UI Samples]: “To cover the most frequent uses of text editing in everyday life, we first sampled 8 web application examples from the most popular websites: writing an email (Gmail)” Xu teaches on page 7 under [Offline Model]: “Therefore the attention prediction at each time t is directly computed by M(t).” Xu teaches under [Dynamic Attention Prediction] on page 7: “Both scores directly use fixation positions (instead of attention maps, as ground truth) to evaluate the accuracy of proposed models.” Xu teaches under [Computational Modeling of Visual Attention] on page 2: “To incorporate temporal information, more recent models fuse static and dynamic attention maps…” Tables 2 and 3 show the prediction scores these attention maps are based on.)
and providing the element-level saliency map to the requestor. (Xu teaches on page 4 under [Procedure]: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.” This suggests an email being drafted by participants. Xu teaches under [Abstract]: “We then show that our model predicts attention maps more accurately than state-of-the-art methods.”)
Xu does not teach being a saliency matrix for an image that represents the email message, the second pixel-level vector being a saliency matrix for an image that represents the element, the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, and providing the first pixel-level vector, the second pixel-level vector, the third pixel-level vector.
Stentiford teaches being a saliency matrix for an image that represents the email message (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
the second pixel-level vector being a saliency matrix for an image that represents the element, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
the third pixel-level vector being a saliency matrix for an image that represents the intermediate context of the element, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score.)
and providing the first pixel-level vector, the second pixel-level vector, the third pixel-level vector, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Pilu’s teachings with those of Xu and Stentiford since they all teach models based on saliency. Pilu teaches a system and method of processing images to produce a saliency map based on salient regions of the image. Xu teaches a saliency model that uses device inputs such as a mouse, keyboard and eye tracker system to study salient regions of user interface applications like Gmail and social media apps and to produce attention maps based on the data collected. Stentiford teaches a method of automatically cropping visual material at different zoom levels based upon a new measure of visual attention that reflects the informativeness of the image. One would have been motivated to combine these methods in order to produce a system that uses email content at different zoom levels and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Regarding Claim 9, the rejection of Claim 8 is incorporated.
Xu teaches the system as in Claim 8, wherein computing the first pixel-level vector for the email message includes: (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
providing the email message to a pixel-level saliency neural network; (Xu teaches on page 8 under [Static Attention Prediction]: “Figure 9 shows sample attention maps for static attention prediction using our model, individual user inputs, and established bottom-up attention models for the email writing task.”)
and receiving the first pixel-level vector from the pixel-level saliency neural network. (Xu teaches on page 6 under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to compute the feature vectors for the email to “use the sampled pixels for training the model” (See page 6 under [Features]).
Regarding Claim 10, the rejection of Claim 8 is incorporated.
Xu teaches the system as in Claim 8, wherein computing the third pixel-level vector for the intermediate context includes: (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
providing the intermediate context to a natural-image saliency neural network; (Xu teaches on page 8 under [Static Attention Prediction]: “Figure 9 shows sample attention maps for static attention prediction using our model, individual user inputs, and established bottom-up attention models for the email writing task.”)
and receiving the third pixel-level vector from the natural-image saliency neural network. (Xu teaches under [Spatio-Temporal Modeling of Visual Attention]: “For each pixel location of the target UI space, we compute a feature vector m based on the feature channels extracted from the raw data.” Feature vectors are saliency matrices.)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to compute the feature vectors for the email to “use the sampled pixels for training the model” (See page 6 under [Features]).
Regarding Claim 12, the rejection of Claim 8 is incorporated.
Pilu teaches the system of Claim 8, wherein the simple features include a height of the element and a width of the element. (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to include a specified size, such as height and width, to account for a “zoomed-in window of a specific size” corresponding “to the area of the image” (See page 22 lines 10-13).
Regarding Claim 16, Pilu teaches a computer program product embodied on a non-transitory computer-readable storage medium comprising a content saliency neural network and instructions that, when executed by a computing device, are configured to cause the computing device to: (Pilu teaches in abstract: “method of processing image data is claimed wherein the image data relates to at least one image, the method comprising using data relating to disposition and saliency of selected portions of the or each image and generating at least one saliency vector for respective said portions.”)
for each element of the plurality of elements: generate a vector of simple features from attributes of the element, (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.” Pilu teaches in abstract: “…the method comprising using data relating to disposition and saliency of selected portions of the or each image and generating at least one saliency vector for respective said portions.”)
obtain, from the content saliency neural network, a respective saliency score for the element, (Pilu teaches on page 15 lines 19-22: “Also, MSR-TR-2003-49 in particular requires that interesting regions of an image and their extent are marked. Saliency is only used to score regions of interest.” Regions of interest cannot be determined unless the image is provided to the network.)
the saliency score being based on the vector of simple features (Pilu teaches on page 22 lines 19-23: “A dimension of a zoomed-in window corresponds to the width X shown on the figure. The lines 905,907 therefore represent the path followed by a side of a zoomed-in window of height x which follows the path 901.”)
and provide the element-level saliency map to the requestor. (Pilu teaches on page 33 Claim 31 that the device comprises “a user operated interface”. So it is assumed the saliency map is provided to a user or requestor.)
Pilu does not teach receive a draft email message from a requestor, the email message including a plurality of elements; generate a pixel-level vector for each of at least three different zoom levels for the element by providing an image of each zoom level to a neural network trained to provide a pixel-level saliency score given an image, and the pixel-level vector for each of the at least three different zoom levels; generate an element-level saliency map for the email message based on the respective saliency scores;
Xu teaches receive a draft email message from a requestor, the email message including a plurality of elements; (Xu teaches on page 4 under [Procedure]: “Before each task we provided participants with a general and vague hint as to what they could write about (e.g. “Please write a blog entry about your hometown”, “Please write an email to invite your friend for dinner”) to reduce the time and effort required to contemplate the content.” This suggests an email being drafted by participants.)
generate an element-level saliency map for the email message based on the respective saliency scores; (Xu teaches on page 3 under [Real-World UI Samples]: “To cover the most frequent uses of text editing in everyday life, we first sampled 8 web application examples from the most popular websites: writing an email (Gmail)” Xu teaches on page 7 under [Offline Model]: “Therefore the attention prediction at each time t is directly computed by M(t).” Xu teaches under [Dynamic Attention Prediction] on page 7: “Both scores directly use fixation positions (instead of attention maps, as ground truth) to evaluate the accuracy of proposed models.” Xu teaches under [Computational Modeling of Visual Attention] on page 2: “To incorporate temporal information, more recent models fuse static and dynamic attention maps…” Tables 2 and 3 show the prediction scores these attention maps are based on.)
Xu does not teach generate a pixel-level vector for each of at least three different zoom levels for the element by providing an image of each zoom level to a neural network trained to provide a pixel-level saliency score given an image, and the pixel-level vector for each of the at least three different zoom levels;
Stentiford teaches generate a pixel-level vector for each of at least three different zoom levels for the element by providing an image of each zoom level to a neural network trained to provide a pixel-level saliency score given an image, (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score. A person skilled in the art would know that images and text can be found in emails, therefore the same method can be applied to emails.)
and the pixel-level vector for each of the at least three different zoom levels; (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to saliency score. A person skilled in the art would know that images and text can be found in emails, therefore the same method can be applied to emails.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Pilu’s teachings with those of Xu and Stentiford since they all teach models based on saliency. Pilu teaches a system and method of processing images to produce a saliency map based on salient regions of the image. Xu teaches a saliency model that uses device inputs such as a mouse, keyboard and eye tracker system to study salient regions of user interface applications like Gmail and social media apps and to produce attention maps based on the data collected. Stentiford teaches a method of automatically cropping visual material at different zoom levels based upon a new measure of visual attention that reflects the informativeness of the image. One would have been motivated to combine these methods in order to produce a system that uses email content at different zoom levels and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Regarding Claim 18, the rejection of Claim 16 is incorporated.
Xu teaches the computer program product of Claim 16, wherein the content saliency neural network was trained on historical email content. (Xu teaches in [Abstract]: “We present a computational model to predict users’ spatiotemporal visual attention on WIMP-style graphical user interfaces.” Xu teaches on page 3 under [Real-World UI Samples]: “To cover the most frequent uses of text editing in everyday life, we first sampled 8 web application examples from the most popular websites: writing an email (Gmail)” Xu teaches on page 4 under [Data Collection]: “We used our method to synthesize 30 different – yet functionally equivalent – user interface layouts derived from real world interfaces, such as Gmail, Facebook, and GitHub (see Figure 2)… This approach allowed us to collect large amounts of data that covers realistic daily-life interaction scenarios.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the product of Claim 16 to include past real world data to “maximize the generality of the model” (See page 3 under [Modeling Interaction Patterns].)
Regarding Claim 19, the rejection of Claim 16 is incorporated.
Stentiford teaches the computer program product of Claim 16, wherein the three zoom levels include dimensions of the element and dimensions of an intermediate zoom level that is inversely proportional to the dimensions of the element. (Stentiford teaches under [3.3 Zoom factor] on page 4: “The informativeness of an image of a distant object should increase as the object comes closer and then decrease again when perhaps a featureless surface of the object occupies a large proportion of the image. A series of 320x240 photos in Fig. 5 were taken of a red rectangle at various focal distances.”)
Stentiford does not teach dimensions of the email message.
Xu teaches dimensions of the email message (Xu teaches under [Layout Synthesis] on page 4: “We first identify key UI components for each application example and match the components across different applications; for example, a “send” button for sending an email is matched with a “publish” button for publishing a blog…For UI appearance we used a consistent design for the size, texture and icon background to remove the bias caused by the specific samples that we chose.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Pilu’s teachings with those of Xu and Stentiford since they all teach models based on saliency. Pilu teaches a system and method of processing images to produce a saliency map based on salient regions of the image. Xu teaches a saliency model that uses device inputs such as a mouse, keyboard and eye tracker system to study salient regions of user interface applications like Gmail and social media apps and to produce attention maps based on the data collected. Stentiford teaches a method of automatically cropping visual material at different zoom levels based upon a new measure of visual attention that reflects the informativeness of the image. One would have been motivated to combine these methods in order to produce a system that uses email content at different zoom levels and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Judd et al. (“Learning to Predict Where Humans Look”), hereinafter “Judd”.
Regarding Claim 3, the rejection of Claim 1 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the method of Claim 1, wherein the eye gaze data obtained from a viewer represents a predetermined number of seconds for which the viewer viewed the email message from the plurality of email messages.
Judd teaches the method of Claim 1, wherein the eye gaze data obtained from a viewer represents a predetermined number of seconds for which the viewer viewed the email message from the plurality of email messages. (Judd teaches under [2.1. Data gathering protocol] on page 2: “An eye tracker recorded their gaze path on a separate computer as they viewed each image at full resolution for 3 seconds separated by 1 second of viewing a gray screen.” An email message is simply another content item containing images and/or text. A person skilled in the art would know that elements such as images and text can be found in emails, therefore the same method can be applied to emails.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Claim 1 to incorporate the teachings of Judd. Judd teaches a saliency model based on eye tracking data of “15 viewers on 1003 images” (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claims 4, 7, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Cerf et al. (“Using semantic content as cues for better scanpath prediction”), hereinafter “Cerf”.
Regarding Claim 4, the rejection of Claim 1 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the method of Claim 1, wherein the set of coordinates is obtained from a set of video frames recorded using a front-facing camera.
Cerf teaches the method of Claim 1, wherein the set of coordinates is obtained from a set of video frames recorded using a front-facing camera. (Cerf teaches on page 2: “Eye-position data was acquired at 1000 Hz using an Eyelink1000 (SR Research, Osgoode, Canada) eye-tracking device. The images were presented on a CRT screen (120 Hz), using Matlab’s psychophysics and eyelink toolbox extensions…Data was acquired from the right eye alone.” The camera would need to face the user and use video frames to track the user’s eye.)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the method of Claim 1 to incorporate the teachings of Cerf in order to “improve the predictive performance” of the saliency algorithm. (See [3.2 Assessing the modified saliency model]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Regarding Claim 7, the rejection of Claim 1 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the method of Claim 1, wherein the attributes of the element are selected from a group including color distribution, size of the element, and position of the element.
Cerf teaches the method of Claim 1, wherein the attributes of the element are selected from a group including color distribution, size of the element, and position of the element. (Cerf teaches under [2.1 Experimental procedures] on page 1: “The modified images were to look as if they had not been manipulated at all. The size, font, color, orientation and shape remained the same, only that the text in the scene was scrambled, such that it had no meaning.” Cerf teaches on page 2 under [2.1 Experimental procedures]: “Face images included faces in various skin colors, age groups, and positions.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the method of Claim 1 to incorporate the teachings of Cerf to “improve the predictive performance” of the saliency algorithm, by studying “the effects of semantic information on fixation allocation”. (See [3.2 Assessing the modified saliency model]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Regarding Claim 14, the rejection of Claim 8 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the system of Claim 8, wherein the simple features include a position of the element within the email message.
	Cerf teaches the system of Claim 8, wherein the simple features include a position of the element within the email message. (Cerf teaches under [2.1 Experimental procedures] on page 1: “The modified images were to look as if they had not been manipulated at all. The size, font, color, orientation and shape remained the same, only that the text in the scene was scrambled, such that it had no meaning.” Cerf teaches on page 2 under [2.1 Experimental procedures]: “Face images included faces in various skin colors, age groups, and positions.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to incorporate the teachings of Cerf to “improve the predictive performance” of the saliency algorithm, by studying “the effects of semantic information on fixation allocation”. (See [3.2 Assessing the modified saliency model]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Nguyen et al. (“Static Saliency vs. Dynamic Saliency: A Comparative Study”), hereinafter “Nguyen”.
Regarding Claim 11, the rejection of Claim 8 is incorporated.
Stentiford teaches the system of Claim 8, wherein providing the first pixel-level vector, the second pixel-level vector, and the third pixel-level vector to the neural network (Stentiford teaches a system that provides average attention scores for an image based on different zoom levels in Fig. 6 on page 7. This attention score is comparable to a saliency score.)
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach that the system includes stacking the first pixel-level vector, the second pixel-level vector, and the third pixel-level vector depth-wise.
Nguyen teaches stacking the first pixel-level vector, the second pixel-level vector, and the third pixel-level vector depth-wise. (Nguyen teaches under [5.2 CMASS for Dynamic Saliency Detection]: “Assuming a linear relationship between feature vector f and saliency map s, we solve the following optimization problem to obtain the linear model W: min_W ‖FW – S‖² + λ‖W‖², where F and S are matrices by column-wisely stacking the vectors f and s of the training data.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to incorporate the teachings of Nguyen of stacking vectors to generate saliency matrices for “a novel camera motion and image saliency aware model for dynamic saliency prediction”. (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Deng et al. (CN 103793717 A), hereinafter “Deng”.
Regarding Claim 13, the rejection of Claim 8 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the system of Claim 8, wherein the simple features include a first color moment for each color channel and a second color moment for each color channel.
	Deng teaches the system of Claim 8, wherein the simple features include a first color moment for each color channel and a second color moment for each color channel. (Deng teaches in [0055]: “Because the Lab space closer to the uniformity of visual perception of human beings, so in the embodiment of the present invention, by calculating colour moment of the image (first moment and second moment) in the Lab space, so as to obtain the colour feature vector.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to incorporate Deng’s “method and system for training a classifier for judging image main significance” using visual saliency. (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claims 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Shen et al. (“Webpage Saliency”), hereinafter “Shen”.
Regarding Claim 15, the rejection of Claim 8 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the system of Claim 8, wherein determining the set of elements includes selecting elements from a document object model.
Shen teaches the system of Claim 8, wherein determining the set of elements includes selecting elements from a document object model. (Shen teaches on page 2 under [1.1 Visual Attention Models on Webpages]: “In [4], the authors first collected data when users were engaged in information foraging and page recognition tasks on 361 webpages from 4 categories (cars, diabetes, kite surfing, wind energy). They then performed a linear regression on features extracted from DOM and generated a model for predicting visual attention on webpages using decision trees.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the system of Claim 8 to incorporate Shen’s studies of “how humans deploy their attention when viewing webpages” and proposed “computational model that is designed to predict webpage saliency” (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Regarding Claim 17, the rejection of Claim 16 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the computer program product of Claim 16, wherein the instructions that, when executed by the at least one computing device, are also configured to: determine the plurality of elements based on a document object model for the email message.
	Shen teaches the computer program product of Claim 16, wherein the instructions that, when executed by the at least one computing device, are also configured to: determine the plurality of elements based on a document object model for the email message. (Shen teaches on page 2 under [1.1 Visual Attention Models on Webpages]: “In [4], the authors first collected data when users were engaged in information foraging and page recognition tasks on 361 webpages from 4 categories (cars, diabetes, kite surfing, wind energy). They then performed a linear regression on features extracted from DOM and generated a model for predicting visual attention on webpages using decision trees.” Shen teaches [1. Introduction]: “For example, webpages are usually rich in visual media, such as text, pictures, logos and animations” Email platforms are also web pages.)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the product of Claim 16 to incorporate Shen’s studies of “how humans deploy their attention when viewing webpages” and proposed “computational model that is designed to predict webpage saliency” (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores for future drafted emails.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Pilu, Xu, Stentiford and further in view of Cholakkal et al. (“Webpage Saliency”), hereinafter “Cholakkal”.
Regarding Claim 20, the rejection of Claim 16 is incorporated.
Pilu, Xu and Stentiford teach all of the elements of the current invention as stated above except they do not teach the computer program product of Claim 16, wherein each element of the plurality of elements has a bounding box and a coloration of each of the bounding boxes is dependent on the respective saliency score for the element corresponding to the bounding box.
	Cholakkal teaches the computer program product of Claim 16, wherein each element of the plurality of elements has a bounding box and a coloration of each of the bounding boxes is dependent on the respective saliency score for the element corresponding to the bounding box. (Cholakkal teaches in Figure 1 on page 3: “The visual, spatial and neighborhood saliency values for yellow, green and red colored image boxes in (b) are shown in their respective colors. The spatial prior is a 2-D distribution of horse patches in a 4 × 4 spatial grid with white indicating high probability and black indicating low probability. The horse’s head (yellow box) has high correlation to the horse visual prior, resulting in large visual saliency. Similarly, it is less likely to find a horse patch at the position of the red box, resulting in lower spatial saliency.”)
At the time of filing, it would have been obvious to a person of ordinary skill to modify the product of Claim 16 to incorporate Cholakkal’s methods of using rectangular boxes of high and low probability to extract regions of probable objects in an image to reduce computations in a proposed “framework for top-down salient object detection that incorporates a tightly coupled image classification module.” (See [Abstract]). One would have been motivated to combine these methods in a system that uses emails and eye tracking data to train a neural network to predict and produce saliency maps based on saliency scores, with rectangular boxes corresponding to those scores, for future drafted emails.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANCOIS A NDIAYE whose telephone number is (571)272-9952.  The examiner can normally be reached on M-F 8:30AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571) 270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FRANCOIS A NDIAYE/Examiner, Art Unit 2124   

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124