DETAILED ACTION

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/15/2021 has been entered.

Claims 1-20 are pending.

Response to Arguments
The arguments/remarks filed by the applicant on 3/15/2021 have been fully considered and are responded to below.

Applicant’s arguments, “the specification describes types of file metadata other than the limited list identified in the Office Action in sufficient detail that it would be clearly recognizable to a person skilled in the art that Applicant had possession of the claimed invention, thus satisfying the written description requirement of 35 U.S.C. § 112(a)”, see p. 8, ¶3 - p. 10, ¶2, filed 3/15/2021, with respect to 

Applicant’s arguments, ‘McDougal, Grzymala-Busse, and Treat fail to disclose and would not have rendered obvious: "determining, by the traffic analysis service, a sensitivity score for the electronic file based on the file metadata by using the file metadata as input to a machine learning-based classifier that has been trained to label a given electronic file with a given sensitivity score, wherein the sensitivity score is indicative of a probability of the electronic file containing sensitive or protected information," as recited in claim 1, and similarly recited in claims 10 and 19 (as amended)’, see p. 13, ¶2, filed 3/15/2021, with respect to the amended claims overcoming the cited prior art references of the rejection of claims 1, 10, and 19 under 35 USC § 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn; however, upon further search and consideration, a new ground of rejection – as necessitated by amendment – is made in view of newly cited prior art Jean-Louis. Please refer to the "Claim Rejections - 35 USC § 103" section below for detailed analysis.

Claim Objections
Claims 3 and 12 are objected to because of the following informalities: 
Claims 3 and 12 recite “using a machine learning-based classifier to predict a plaintext data size of the traffic;” The expression "a machine learning-based classifier" has already been defined previously in the claims and should therefore be referred to using a definite article.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 8, 10, 11, 13, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over McDougal (US 20130081142 A1) in view of Jean-Louis (US 20190347429 A1, PRO 62/670,741 with filing date 5/12/2018).

Regarding claim 1, McDougal teaches a method comprising:
obtaining, by a traffic analysis service that monitors a network, file metadata regarding an electronic file; (FIG. 6: intercept communication (step 605); extract meta data (step 610);) Here McDougal discloses intercepting a communication at a first node of a security system. The method also includes extracting communication metadata associated with the communication. The communication metadata comprises a plurality of different fields. The method additionally includes determining if the communication comprises an attached file and, if so, extracting file metadata associated with the file. (¶2)
determining, by the traffic analysis service, a sensitivity score for the electronic file based on the file metadata; (determine score for each field of meta data (step 615); combine score from each 
detecting, by the traffic analysis service, the electronic file within traffic in the network; and ([0029] FIG. 2: File type module 220 may be configured to determine the type of file that ingest module 210 receives. File type module 220 may also be configured to determine the metadata for a communication. File type module 220 may determine the type of a file. For example, file type module 220 may examine an extension associated with the file to determine the type of the file. As another example, file type module 220 may examine portions of the file in order to determine its type. File type module 220 may look at where it received the file to determine its type. File type module 220 may look at characters (magic numbers) in a header of a file to determine its type. In this manner, file type module 220 may detect the correct type of the file even if the file's extension has been removed or changed. As another example, certain types of files may be determined based on both magic number(s) and the file extension. File type module 220 may parse out and store each field value for each metadata field.) Here McDougal discloses detailed examples of detecting a file within traffic in the network.
causing, by the traffic analysis service, performance of a mitigation action regarding the detection of the electronic file within the traffic, based on the sensitivity score of the electronic file. (FIG. 6: generate prediction classification for communication (step 630); receive indication of whether communication is malicious communication (step 635); identify meta data fields or field values indicative of malicious communication (step 650);) Here McDougal discloses that “the identification of the metadata fields or field values may be provided in a report to a user, or may be kept internally as part of an update to the algorithm or database (¶85)”, as possible “mitigation action”.
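For illustration only, the file type determination described in McDougal ¶29 (header "magic numbers" checked first, with the file extension as a fallback) can be sketched as follows. This is a minimal sketch and not McDougal's implementation; the signature table, the function name detect_file_type, and the fallback order are assumptions made for this example.

```python
# Hypothetical signature table; a real system would use a much larger one.
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x89PNG\r\n\x1a\n": "png",
}


def detect_file_type(data: bytes, file_name: str = "") -> str:
    """Return a file type from header bytes, falling back to the extension."""
    # Header bytes take priority, so a renamed file is still typed correctly.
    for signature, file_type in MAGIC_NUMBERS.items():
        if data.startswith(signature):
            return file_type
    # Fall back to the extension only when no signature matches.
    if "." in file_name:
        return file_name.rsplit(".", 1)[1].lower()
    return "unknown"
```

Under these assumptions, a PDF whose extension has been changed to .txt would still be typed as a PDF, which corresponds to the behavior the cited paragraph attributes to the use of magic numbers.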

McDougal teaches determining a sensitivity score, but does not explicitly teach determining, by the traffic analysis service, a sensitivity score for the electronic file based on the file metadata by using 
However, Jean-Louis in an analogous art explicitly teaches 
determining, by the traffic analysis service, a sensitivity score for the electronic file based on the file metadata by using the file metadata as input to a machine learning-based classifier that has been trained to label a given electronic file with a given sensitivity score, wherein the sensitivity score is indicative of a probability of the electronic file containing sensitive or protected information; ([0009-0013] scanning a computer location to select the electronic document; in the electronic document, scanning contents of the electronic document and metadata of the electronic document; identifying each occurrence of sensitive data by classifying each portion of the contents forming the electronic document as sensitive, or not sensitive, per se; for each occurrence of the sensitive data, determining a type of the sensitive data and determining a risk score associated to the type of the sensitive data; using the risk score of each occurrence of the sensitive data to determine an exposure risk score of the electronic document. [0102- 0103] Various machine learning algorithms can be used for the training phase. Once the classifier is trained, it is applied on all the enterprise documents of a data location. For the enterprise documents that have been labelled as containing sensitive information by the classifier, the confidence score is used to represent the exposure risk score of that document.) Here Jean-Louis defines “sensitive data/private information” as “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information” (¶3). 
Jean-Louis considers “the metadata associated with the enterprise document as part of that document. Example of metadata are author's name, various timestamps (e.g., date of creation, last modification date, etc.), type of enterprise document, etc.” (¶7).
causing, by the traffic analysis service, performance of a mitigation action regarding the detection of the electronic file within the traffic, based on the sensitivity score of the electronic file. ([0110] The risk exposure provides a way to detect the enterprise documents that have sensitive content. To mitigate the risk, the system is used to define some actions that are executed on the enterprise documents when a risk level/score is reached. Examples of actions include but are not limited to, send a notification to a system administrator or a user (e.g. the owner of the file), move the document to a different data location, change the ACLs, and the like.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “classifying communications” concept of McDougal, and the “managing electronic documents based on sensitivity of information” approach of Jean-Louis, to improve computer security by determining information sensitivity in electronic documents then taking action to mitigate the risk accordingly (Jean-Louis [0002, 0110]).
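For illustration only, the combination relied upon above (file metadata supplied to a trained classifier whose confidence serves as the score, followed by an action once a score level is reached) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of either reference: the metadata fields, the keyword list, the logistic-regression model, the training examples, and the 0.8 threshold are all hypothetical.

```python
import math


def featurize(meta: dict) -> list:
    """Map hypothetical metadata fields to numeric features."""
    name = meta.get("file_name", "").lower()
    return [
        1.0,  # bias term
        1.0 if any(k in name for k in ("payroll", "ssn", "medical")) else 0.0,
        1.0 if meta.get("file_type") in ("xlsx", "csv") else 0.0,
        1.0 if meta.get("author_dept") == "hr" else 0.0,
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs: int = 500, lr: float = 0.5) -> list:
    """Fit logistic-regression weights by stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for meta, label in examples:
            x = featurize(meta)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] += lr * (label - p) * xi
    return weights


def sensitivity_score(weights: list, meta: dict) -> float:
    """Classifier confidence, used here as the sensitivity score."""
    x = featurize(meta)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def mitigation_actions(score: float, threshold: float = 0.8) -> list:
    """Return an action once the (assumed) score threshold is reached."""
    return ["notify_administrator"] if score >= threshold else []


# Hypothetical labeled training metadata (1 = sensitive, 0 = not sensitive).
TRAINING = [
    ({"file_name": "payroll_2020.xlsx", "file_type": "xlsx", "author_dept": "hr"}, 1),
    ({"file_name": "medical_claims.csv", "file_type": "csv", "author_dept": "hr"}, 1),
    ({"file_name": "lunch_menu.pdf", "file_type": "pdf", "author_dept": "facilities"}, 0),
    ({"file_name": "logo.png", "file_type": "png", "author_dept": "marketing"}, 0),
    ({"file_name": "budget.xlsx", "file_type": "xlsx", "author_dept": "marketing"}, 0),
]
```

In this sketch the notification action corresponds to one of the example actions listed in Jean-Louis ¶110; the other listed actions (moving the document, changing the ACLs) are omitted for brevity.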

Regarding claim 2, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal further teaches wherein the mitigation action comprises sending an alert to a user interface that identifies the electronic file and a sender of the traffic. ([0060, 0085, 0076, 0020] if the communication is classified as potentially containing malicious code, adjudication and disposition module 410 may cause an alert to be sent to an analyst or system administrator that the communication has been characterized as potentially containing malicious code. The identification of the metadata fields or field values may be provided in a report to a user. The metadata may be associated with the attachment (e.g., file version, author, rendering software, time stamp, etc.) of the communication. The database may contain field values associated with metadata extracted from previous communications intercepted at the first node of the security system. Based on a database of previous metadata fields and field values, for a particular sender the database or collection of information may identify whether the sender is likely to send malicious or non-malicious communications.) In summary, file identification and sender/author, as part of the metadata, are sent in an alert in any potentially malicious situation.

Regarding claim 4, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal further teaches wherein the file metadata comprises user profile information associated with the electronic file. ([0076] The metadata may be associated with the communication itself (e.g., the communication's header, hidden fields, etc.) or with the attachment (e.g., file version, author, rendering software, time stamp, etc.) of the communication.)

Regarding claim 8, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal further teaches wherein the sensitivity score for the electronic file is determined based further on whether malware was detected on an endpoint on which the electronic file is hosted. ([0020] The score derived from the database or collection of information may change over time based on the success or failure of previous classifications. For example, a sender may initially be identified as not likely to send malicious emails. However, if the next five emails from the sender are all classified as malicious, then the database or collection of information may be updated to associate the sender as someone likely to send malicious emails.) Here McDougal discloses that score indicative of a 

Regarding claims 10 and 19, the scope of the claims is similar to that of claim 1. Accordingly, the claims are rejected using a similar rationale.

Regarding claim 11, the scope of the claim is similar to that of claim 2. Accordingly, the claim is rejected using a similar rationale.

Regarding claim 13, the scope of the claim is similar to that of claim 4. Accordingly, the claim is rejected using a similar rationale.

Regarding claim 17, the scope of the claim is similar to that of claim 8. Accordingly, the claim is rejected using a similar rationale.

Claims 3, 5, 9, 12, 14, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over McDougal (US 20130081142 A1) in view of Jean-Louis (US 20190347429 A1, PRO 62/670,741 with filing date 5/12/2018) and Treat (US 9641544 B1).

Regarding claim 3, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. But McDougal does not teach wherein the traffic is encrypted, and wherein detecting, by the traffic analysis service, the electronic file within traffic in the network comprises: using 
However, Treat in an analogous art explicitly teaches wherein the traffic is encrypted, ([Col 6: Line 23-27] monitoring network communications, including SSL or other encrypted network communications) and wherein detecting, by the traffic analysis service, the electronic file within traffic in the network comprises: 
using a machine learning-based classifier ([Col 6: Line 46-48] machine learning techniques (MLT) to generate real-time, rapidly evolving behavior profiles for users across the enterprise network) to predict a plaintext data size of the traffic; and ([Col 6: Line 18-21, Col 8: Line 45-51] Various new and distinct forms of preventive, predictive, and prescriptive models are introduced to effectively counter internal attackers and achieve insider threat prevention. For example, assume that Alice is an employee of ACME Company who has access to an enterprise network for ACME Company, and assume that an example security policy for ACME Company (e.g., insider threat prevention (ITP) policy) caps file transfers to 10 megabytes (MB) within a predefined period of time to an offsite site (e.g., Box, Gmail, or other apps/web services). In this case, a network device can detect a file transfer activity that violates this example ITP policy.) Here Treat predicts 10 megabytes (MB) as plaintext data size of the traffic.
matching a file size of the electronic file to the predicted plaintext data size of the traffic. ([Col 11: Line 46-53, Col 26: Line 2-14] provide real-time content scanning, such as for monitoring and/or controlling file transfer activities (including data limits on file transfers and/or destination-based restrictions on such file transfers), and/or other information to match signatures (e.g., file-based, protocol-based, and/or other types/forms of signatures for detecting malware or suspicious behavior). Inputs, which can be utilized by prevention controller 902 for performing the disclosed insider threat prevention techniques, can include the following, but can also include additional inputs for these or file size, number of file transfers, destination for a file transfer, file type, and file name.) In summary, Treat discloses that file size can be utilized to compare/match to the signatures/activities, which can be the data size of the traffic.
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “classifying communications” concept of McDougal, and the “threat prevention” approach of Treat, so various new and distinct forms of preventive, predictive, and prescriptive models are introduced to effectively counter internal attackers and achieve insider threat prevention (Treat [Col 6: Line 18-21]).
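For illustration only, the claim 3 limitation of matching a file size to a predicted plaintext data size of encrypted traffic can be sketched as follows. This is a minimal sketch and not Treat's implementation. The claim recites a machine learning-based classifier for the prediction; the fixed per-record overhead used here is a simplified stand-in, and its value, the 2% tolerance, and the function names are assumptions made for this example.

```python
def estimate_plaintext_size(record_sizes, per_record_overhead: int = 29) -> int:
    """Estimate payload bytes from observed encrypted record sizes.

    per_record_overhead is an assumed constant standing in for a learned
    predictor; it is not taken from any cited reference.
    """
    return sum(max(size - per_record_overhead, 0) for size in record_sizes)


def match_files_by_size(predicted_size: int, known_files: dict,
                        tolerance: float = 0.02) -> list:
    """Return names of known files whose size is within tolerance of the prediction."""
    return sorted(
        name
        for name, size in known_files.items()
        if abs(size - predicted_size) <= tolerance * size
    )
```

A usage example under these assumptions: ten observed records of 1029 bytes each give an estimated payload of 10000 bytes, which would match a known 10000-byte file but not a 12000-byte or 50000-byte file.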

Regarding claim 5, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal in view of Jean-Louis and Treat further teaches wherein determining the sensitivity score for the electronic file based on the file metadata comprises:
using the machine learning-based classifier to classify the file metadata, wherein the machine learning-based classifier is trained using a training dataset that comprises file metadata for a plurality of files that has been labeled with sensitivity scores. ([Treat Col 7: Line 61 - Col 8: Line 10] correlation of metrics and/or monitored network activities to determine that a given network activity is an anomalous activity. For example, a static default value can be utilized for thresholds for metrics to provide a baseline user behavior profile. These thresholds and/or metrics can be updated (e.g., trained) based on monitored activities for one or more users. As an example, in a learning mode/training mode, these thresholds and/or metrics can be dynamically tuned to adjust such default values for thresholds for metrics to provide a dynamically generated baseline user behavior profile. In an example implementation, one or more Machine Learning Techniques (MLT) can be utilized for implementing trainings and applications of these user behavior profiles.) In addition, reference McDougal discloses a training example of “for a particular sender the database or collection of information may identify whether the sender is likely to send malicious or non-malicious communications. The score derived from the database or collection of information may change over time based on the success or failure of previous classifications. For example, a sender may initially be identified as not likely to send malicious emails. However, if the next five emails from the sender are all classified as malicious, then the database or collection of information may be updated to associate the sender as someone likely to send malicious emails (¶ 20).” Therefore, the combination discloses the whole limitation.

Regarding claim 9, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal in view of Jean-Louis and Treat further teaches wherein obtaining the file metadata regarding an electronic file comprises:
receiving, at the traffic analysis service, the file metadata from an agent executed by an endpoint on which the electronic file is hosted. ([Treat Col 8: Line 36-39, Col 26: Line 2-14] endpoint security agents that can be deployed and executed on client devices to monitor network traffic, applications, and user activities on an enterprise network. Communications that can include inputs (analogous to claim limitation “metadata”) from endpoint security agents executed on each of the client devices to prevention controller 902. Inputs can include the following: user identification (ID), file transfer application (app ID), file size, number of file transfers, destination for a file transfer, file type, and file name.)

Regarding claim 12, the scope of the claim is similar to that of claim 3. Accordingly, the claim is rejected using a similar rationale.

Regarding claims 14 and 20, the scope of the claims is similar to that of claim 5. Accordingly, the claims are rejected using a similar rationale.

Regarding claim 18, the scope of the claim is similar to that of claim 9. Accordingly, the claim is rejected using a similar rationale.

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over McDougal (US 20130081142 A1) in view of Jean-Louis (US 20190347429 A1, PRO 62/670,741 with filing date 5/12/2018) and Manadhata (US 8621233 B1).

Regarding claim 6, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. But McDougal does not teach wherein the sensitivity score is determined further based on a frequency of the file appearing on endpoints across at least a portion of the network. This aspect of the claim is identified as a difference.
However, Manadhata in an analogous art explicitly teaches wherein the sensitivity score is determined further based on a frequency of the file appearing on endpoints across at least a portion of the network. ([Col 7: Line 61-65] A frequency weighting module 324 weights the similarity score for the pair of names based on the frequency distribution of the file names, where a file name's frequency is measured by the number of endpoints 112 on which an instance of the file having the given name is found.) Here Manadhata summarizes the invention in [Abstract] as “generate a score indicating a confidence that the computer file contains malicious software. The score is weighted based on file name frequency, the age of the file, and the prevalence of the file. The weighted score is used to determine
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “classifying communications” concept of McDougal, and the “confidence score based on file name frequency” approach of Manadhata, to detect malicious software and improve computer security by identifying and analyzing a computer file stored on a plurality of different endpoints (Manadhata [Col 1: Line 6-7, 45-47]).
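For illustration only, the frequency measure quoted from Manadhata (the number of endpoints on which a file having a given name is found) and its use to weight a score can be sketched as follows. This is a minimal sketch and not Manadhata's implementation; in particular, the weighting formula and its direction (damping the score of widely distributed files) are assumptions made for this example.

```python
from collections import defaultdict


def endpoint_frequency(inventory: dict) -> dict:
    """Map each file name to the number of endpoints on which it is found.

    inventory maps an endpoint identifier to the file names seen on it.
    """
    seen = defaultdict(set)
    for endpoint, names in inventory.items():
        for name in names:
            seen[name].add(endpoint)
    return {name: len(endpoints) for name, endpoints in seen.items()}


def weight_score(base_score: float, frequency: int, total_endpoints: int) -> float:
    """Hypothetical weighting: reduce the score by up to half as prevalence rises."""
    prevalence = frequency / total_endpoints
    return base_score * (1.0 - prevalence / 2.0)
```

Under these assumptions, a file found on every endpoint has its base score halved, while a file found on one of three endpoints keeps five sixths of its base score.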

Regarding claim 15, the scope of the claim is similar to that of claim 6. Accordingly, the claim is rejected using a similar rationale.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over McDougal (US 20130081142 A1) in view of Jean-Louis (US 20190347429 A1, PRO 62/670,741 with filing date 5/12/2018) and Hanusiak (US 20180336323 A1).

Regarding claim 7, McDougal in view of Jean-Louis teaches all the features with respect to claim 1, as outlined above. McDougal further teaches wherein the metadata comprises a file name or file path, ([0049] a report may be sent based on the monitored behavior and results. The report may include information such as the name of the file.). But McDougal does not teach wherein determining the sensitivity score for the electronic file comprises: matching one or more words that appear in the file name or file path of the electronic file to words appearing in file names or file paths of a plurality of electronic files; and calculating the sensitivity score for the electronic file based in part on frequencies of the one or more matched words appearing in the file names or file paths of the plurality of electronic files. This aspect of the claim is identified as a difference.
However, Hanusiak in an analogous art explicitly teaches wherein determining the sensitivity score for the electronic file comprises:
matching one or more words that appear in the file name or file path of the electronic file to words appearing in file names or file paths of a plurality of electronic files; and ([0065] compares file-i from the list of files 520 with an id-file-j from an identifier file database 124 and determines a similarity score between file-i and id-file-j, as shown at 610. In one or more examples, the similarity score is determined based on the names of the files, a number of characters matching between the two file names.) Reference McDougal in ¶19 discloses that URL may be used in classifying communications (e.g., the URL may be tokenized, meaning “break text into individual linguistic units, such as words.”). Reference Hanusiak discloses matching linguistic units among files to determine similarity score. Therefore, the combination discloses the whole limitation.
calculating the sensitivity score for the electronic file based in part on frequencies of the one or more matched words appearing in the file names or file paths of the plurality of electronic files. ([0065] the similarity score is determined based on the names of the files, a number of characters matching between the two file names. Alternatively, or in addition, the similarity score is a ratio between the number of matching characters and non-matching characters. Further, in other examples, other attributes of the files are used for computing the similarity score. For example, sizes of the files, date of creation, date of modification, location of the file (path), are used for computing the similarity score. The similarity score thus indicates a similarity between the file from the list of files 520 and a file from the identifier file database 124.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the “classifying communications” concept of McDougal, and the “characters matching between file names” approach of Hanusiak, to be able to evaluate the similarity between files (Hanusiak [0065]).
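For illustration only, the claim 7 limitations (matching words in a file name or file path to words in the names or paths of a plurality of files, then scoring based in part on the frequencies of the matched words) can be sketched as follows. This is a minimal sketch and not the method of Hanusiak or McDougal; the tokenization rule, the reference set of paths, and the averaging formula are assumptions made for this example.

```python
import re
from collections import Counter


def path_words(path: str) -> set:
    """Split a file name or path into lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", path.lower()))


def word_frequencies(paths) -> Counter:
    """Count, for each word, how many reference paths contain it."""
    counts = Counter()
    for path in paths:
        counts.update(path_words(path))
    return counts


def score_by_matched_words(path: str, reference_paths) -> float:
    """Hypothetical score: mean fraction of reference paths containing each matched word."""
    reference_paths = list(reference_paths)
    counts = word_frequencies(reference_paths)
    matched = [w for w in path_words(path) if w in counts]
    if not matched or not reference_paths:
        return 0.0
    return sum(counts[w] / len(reference_paths) for w in matched) / len(matched)
```

Under these assumptions, a path sharing the word "payroll" with every reference path scores higher than a path sharing only a less common word, and a path with no matched words scores zero.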

Regarding claim 16, the scope of the claim is similar to that of claim 7. Accordingly, the claim is rejected using a similar rationale.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20150254555 A1, "Classifying data with deep learning neural records incrementally refined through expert input", by Williams, Jr., teaches classifying data using machine learning that may be incrementally refined based on expert input. Data is provided to a deep learning model that may be trained based on a plurality of classifiers and sets of training data and/or testing data. Data Ingestion is used to populate Training Corpus and Testing Corpus with data representations including data sensitivity information such as a score indicating how widely accessible the data is, based on file or content access permissions, a score indicating the business sensitivity of the data or content based on similarity of data to known highly sensitive content, or computed based on the presence or absence of key phrases or keywords within the data or content, or computed based on the presence or absence of key patterns within the data or content. A use of the system is to detect the presence of sensitive images and diagrams stored within files, documents, and databases.
US 8640251 B1, "Methods and systems for classifying computer documents into confidential levels using log information", by Lee, teaches that files of computer documents are classified 
US 20200320418 A1, "System and Method for Third Party Data Management", by Aminian, teaches a third party data management system that uses a classification algorithm trained using a machine learning process to analyze type(s) of data that will be shared with the third party to determine a risk of sharing data with the third party. Periodically data provided to a particular third party can be analyzed to identify privacy issue(s). In response to the analysis, an action to be taken with respect to the particular third party can be identified and provided to a user.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAN YANG whose telephone number is (408)918-7638.  The examiner can normally be reached on Monday to Friday, 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.Y./Examiner, Art Unit 2493

/Kevin Bechtel/Primary Examiner, Art Unit 2491