DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for domestic priority under 35 U.S.C. 119(e).

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function.  Such claim limitation(s) is/are: “document semantics controller configured to receive/determine/generate/determine/determine/generate” in claim 10; “document semantics controller configured to receive/determine/determine/determine/determine/perform” in claim 11; “document semantics controller is configured to determine” in claim 12; “document semantics controller configured to generate” in claim 13; “document semantics controller is configured to determine” in claim 15; “document semantics controller is configured to determine” in claim 16; “document semantics controller is configured to determine” in claim 17; and “document semantics controller is configured to determine” in claim 18.
Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3, 5, 10, 12, and 14 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kartoun et al, U.S. Patent 10,885,276.

As per claim 1, Kartoun et al teaches a method for generating semantic representation of a document using an electronic device to determine data security risk associated with the document (col. 4, lines 3-11), the method comprising:
receiving, by a document semantics controller (programmable data processing apparatus, col. 3, lines 21-26) of the electronic device, a document in an electronic form, wherein the document comprises a plurality of content (document is published online, i.e. in electronic form, col. 4, lines 14-21 & 28-36 and col. 5, lines 20-23);
determining, by the document semantics controller of the electronic device, raw text from the plurality of content (textual content, words, phrases is interpreted as the raw text, col. 5, lines 59-61 and col. 6, lines 53-57);
generating, by the document semantics controller of the electronic device, a plurality of sentence blocks of a predefined size (sentences within context/phrases) using the raw text (col. 5, lines 59-61 and col. 9, lines 45-49);
determining, by the document semantics controller of the electronic device, at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49);
determining, by the document semantics controller of the electronic device, the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 42-49); and
generating, by the document semantics controller of the electronic device, the semantic representation of the document to determine the data security risk associated with the document (col. 6, lines 16-32; col. 9, lines 42-49; and col. 9, line 63 through col. 10, line 16).
As per claim 3, Kartoun et al teaches wherein determining, by the document semantics controller of the electronic device, the raw text from the plurality of content comprises:
extracting, by the document semantics controller of the electronic device, the raw text from the plurality of content (textual content, words, phrases is interpreted as the raw text, col. 5, lines 59-61 and col. 6, lines 53-57);
performing, by the document semantics controller of the electronic device, a character filtration (sensitive information is identified using NLP and ML, hence filtering to eliminate non-sensitive data) on the raw text to eliminate predefined types of characters from the raw text (col. 5, lines 53-61 and col. 6, lines 22-40);
determining, by the document semantics controller of the electronic device, at least one word and at least one sentence in the raw text extracted from the document which is not corresponding to the semantic representation of the document (col. 9, lines 42-49 and col. 9, line 63 through col. 10, line 16);
eliminating (non-sensitive information is marked as not being important, col. 10, lines 3-10), by the document semantics controller of the electronic device, the at least one word and the at least one sentence not corresponding to the semantic representation of the document (col. 9, lines 42-49); and
determining, by the document semantics controller of the electronic device, the raw text from the plurality of content (col. 5, lines 59-61 and col. 6, lines 53-57).
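For illustration only, the character filtration step recited above might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `filter_characters` and the default pattern of characters to keep are hypothetical and are not taken from the application or from Kartoun et al.

```python
import re


def filter_characters(raw_text, pattern=r"[^A-Za-z0-9\s.,;:!?'-]"):
    """Remove predefined types of characters from extracted raw text.

    The default pattern is an assumed example of "predefined types of
    characters": everything except letters, digits, whitespace, and
    basic punctuation is eliminated.
    """
    # Eliminate every character matched by the (hypothetical) pattern.
    cleaned = re.sub(pattern, "", raw_text)
    # Collapse the whitespace runs left behind by the removal.
    return re.sub(r"\s+", " ", cleaned).strip()
```

Under these assumptions, a bullet or copyright symbol in the extracted text would be dropped while words and sentence punctuation are retained.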
As per claim 5, Kartoun et al teaches wherein the at least one embeddings for each of the plurality of sentence blocks is determined using a deep-learning language model (col. 4, lines 28-36 and col. 9, lines 42-49).
As per claim 10, Kartoun et al discloses an electronic device for generating semantic representation of a document to determine data security risk associated with the document (col. 4, lines 3-11), the electronic device comprising:
a memory; a processor coupled to the memory (col. 2, lines 4-10);
a document semantics controller (programmable data processing apparatus, col. 3, lines 21-26) coupled to the memory and the processor, and configured to:
receive a document in an electronic form, wherein the document comprises a plurality of content (document is published online, i.e. in electronic form, col. 4, lines 14-18 & 28-36 and col. 5, lines 20-23);
determine raw text from the plurality of content (textual content, words, phrases is interpreted as the raw text, col. 5, lines 59-61 and col. 6, lines 53-57);
generate a plurality of sentence blocks of a predefined size (sentences within context/phrases) using the raw text (col. 9, lines 45-49);
determine at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49);
determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 42-49); and
generate the semantic representation of the document to determine the data security risk associated with the document (col. 6, lines 16-32; col. 9, lines 42-49; and col. 9, line 63 through col. 10, line 16).
As per claim 12, Kartoun et al discloses wherein the document semantics controller is configured to determine the raw text from the plurality of content by:
extract the raw text from the plurality of content (textual content, words, phrases is interpreted as the raw text, col. 5, lines 59-61 and col. 6, lines 53-57);
perform a character filtration (sensitive information is identified using NLP and ML, hence filtering to eliminate non-sensitive data) on the raw text to eliminate predefined types of characters from the raw text (col. 5, lines 53-61 and col. 6, lines 22-40);
determine at least one word and at least one sentence in the raw text extracted from the document which is not corresponding to the semantic representation of the document (col. 9, lines 42-49 and col. 9, line 63 through col. 10, line 16);
eliminate (non-sensitive information is marked as not being important, col. 10, lines 3-10) the at least one word and the at least one sentence not corresponding to the semantic representation of the document (col. 9, lines 42-49); and
determine the raw text from the plurality of content (col. 5, lines 59-61 and col. 6, lines 53-57).  
As per claim 14, Kartoun et al discloses wherein the at least one embeddings for each of the plurality of sentence blocks is determined using a deep-learning language model (col. 4, lines 28-36 and col. 9, lines 42-49).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kartoun et al, U.S. Patent 10,885,276 in view of Argyros et al, US 2019/015256.

As per claims 4 and 13, Kartoun et al teaches wherein the document semantics controller is configured to determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (sentences within context/phrases, col. 9, lines 45-49); however, Kartoun et al fails to disclose determining that a number of extracted raw text meets a predefined threshold for the raw text; generate a plurality of tokens by combining a set of the raw text extracted from the document; and generate the plurality of sentence blocks based on sentence splicing of the plurality of tokens, wherein each sentence block of the plurality of sentence blocks comprises a predefined number of tokens.
Argyros et al discloses determining that a number of extracted raw text meets a predefined threshold for the raw text (values expressed as character strings such as low or high, paragraph 0075); generate a plurality of tokens by combining a set of the raw text extracted from the document (see paragraph 0076); and generate the plurality of sentence blocks based on sentence splicing of the plurality of tokens, wherein each sentence block of the plurality of sentence blocks comprises a predefined number of tokens (see paragraphs 0027 and 0076).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to compute token values as a means indicative of a subject to be used as evaluating factors relevant to computing risk scoring (Argyros et al, paragraph 0026). Although the teachings of Kartoun et al disclose determining the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49), the teachings of Argyros et al offer a more detailed, automated, and tokenized way of effectively identifying risk factors of texts and grammatical structures (paragraph 0076), which the teachings of Kartoun et al fail to disclose. The claims would have been obvious because a person of ordinary skill in the art would have been motivated to combine the prior art to achieve the claimed invention and there would have been a reasonable expectation of success.
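For illustration only, the limitations of claims 4 and 13 discussed above (a threshold on the extracted raw text, token generation, and splicing into blocks of a predefined number of tokens) might be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the whitespace tokenizer, and the default values of `min_texts` and `block_size` are hypothetical and are not drawn from the application, Kartoun et al, or Argyros et al.

```python
def make_sentence_blocks(raw_texts, min_texts=2, block_size=8):
    """Split extracted raw text into blocks of a fixed number of tokens.

    min_texts and block_size are illustrative values only.
    """
    # Proceed only if the number of extracted texts meets the assumed threshold.
    if len(raw_texts) < min_texts:
        return []
    # Combine the extracted text and tokenize on whitespace (assumed tokenizer).
    tokens = " ".join(raw_texts).split()
    # Splice the token stream into consecutive blocks of block_size tokens;
    # the final block may be shorter if the tokens do not divide evenly.
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]
```

In this sketch the last block is allowed to be shorter than `block_size`; padding or discarding it would be an equally plausible reading of "a predefined number of tokens."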

Claims 6-9 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kartoun et al, U.S. Patent 10,885,276 in view of Redlich et al, U.S. Patent 7,546,334.

As per claims 6 and 15, Kartoun et al discloses wherein the document semantics controller is configured to determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (sentences within context/phrases, col. 9, lines 45-49); however, Kartoun et al fails to disclose determining an embedding matrix comprising the at least one embeddings for each of the plurality of sentence blocks; determine a document embedding by aggregating the at least one embeddings for each of the plurality of sentence blocks in the embedding matrix; and determine the semantic representation of the document based on the document embedding.
Redlich et al teaches determining an embedding matrix comprising the at least one embeddings for each of the plurality of sentence blocks (Optional Automatic Defenses Matrix in col. 49, lines 1-12 and col. 69, lines 4-17); determine a document embedding by aggregating the at least one embeddings for each of the plurality of sentence blocks in the embedding matrix (col. 49, lines 1-12 and col. 69, lines 4-17); and determine the semantic representation of the document based on the document embedding (col. 4, lines 4-13).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply a detailed semantic representation of a document in order to minimize the time of exposure of important content within a document (Redlich et al, col. 15, lines 33-34), thereby minimizing the security risk of the content by protecting the sensitive content (Redlich et al, col. 15, lines 44-48). Although the teachings of Kartoun et al disclose determining the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49), the teachings of Redlich et al offer a more detailed, automated extraction of granular data (Redlich et al, col. 15, lines 44-48) that includes a more detailed analysis of sentence structure (Redlich et al, col. 69, lines 4-17), which the teachings of Kartoun et al fail to disclose. The claims would have been obvious because a person of ordinary skill in the art would have been motivated to combine the prior art to achieve the claimed invention and there would have been a reasonable expectation of success.
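For illustration only, aggregating an embedding matrix into a single document embedding, as recited in claims 6 and 15, might be sketched as follows. Mean pooling is an assumed choice of aggregation; neither the claims as quoted above nor the cited references specify it, and the function name is hypothetical.

```python
def document_embedding(embedding_matrix):
    """Aggregate per-block embeddings (one row per sentence block) by averaging.

    Mean pooling is an assumption made for this sketch.
    """
    if not embedding_matrix:
        raise ValueError("embedding matrix must contain at least one row")
    n_rows = len(embedding_matrix)
    n_dims = len(embedding_matrix[0])
    # Average each dimension (column) across all sentence-block rows.
    return [
        sum(row[j] for row in embedding_matrix) / n_rows
        for j in range(n_dims)
    ]
```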

As per claims 7 and 16, Kartoun et al discloses wherein the document semantics controller is configured to determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (sentences within context/phrases, col. 9, lines 45-49); however, Kartoun et al fails to disclose generating a similarity graph using the plurality of sentence blocks; determine a page-rank score for the each of the sentence blocks of the plurality of sentence blocks; perform a maximum margin relevance scoring on the page-rank score determined for each of the sentence blocks; determine a set of sentence blocks with highest maximum margin relevance scoring; determine a document embedding by aggregating the set of sentence blocks with the highest maximum margin relevance scoring; and determine the semantic representation of the document based on the document embedding.
Redlich et al teaches generating a similarity graph (algorithm employing a mathematical construct) using the plurality of sentence blocks (col. 53, line 60 through col. 54, line 2 and col. 69, lines 4-17); determine a page-rank score (count value) for the each of the sentence blocks of the plurality of sentence blocks (col. 53, line 60 through col. 54, line 2 and col. 69, lines 4-17); perform a maximum margin relevance scoring (higher level of security is required based upon the count values and data) on the page-rank score determined for each of the sentence blocks (col. 54, lines 28-39); determine a set of sentence blocks with highest maximum margin relevance scoring (higher level of security is required, col. 54, lines 28-39); determine a document embedding by aggregating the set of sentence blocks with the highest maximum margin relevance scoring (col. 54, lines 28-39 and col. 69, lines 4-17); and determine the semantic representation of the document based on the document embedding (col. 4, lines 4-13).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply a detailed semantic representation of a document in order to minimize the time of exposure of important content within a document (Redlich et al, col. 15, lines 33-34), thereby minimizing the security risk of the content by protecting the sensitive content (Redlich et al, col. 15, lines 44-48). Although the teachings of Kartoun et al disclose determining the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49), the teachings of Redlich et al offer a more detailed, automated extraction of granular data (Redlich et al, col. 15, lines 44-48) that includes a more detailed analysis of sentence structure including page rank scoring (Redlich et al, col. 53, line 60 through col. 54, line 2 and col. 69, lines 4-17), which the teachings of Kartoun et al fail to disclose. The claims would have been obvious because a person of ordinary skill in the art would have been motivated to combine the prior art to achieve the claimed invention and there would have been a reasonable expectation of success.
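For illustration only, the sequence recited in claims 7 and 16 (similarity graph, page-rank score, relevance scoring, selection) might be sketched as follows. Several assumptions are made: "maximum margin relevance" is read here as maximal marginal relevance (MMR), the graph is built from cosine similarity between block embeddings, and the function names, damping factor, iteration count, and trade-off weight are hypothetical. Nothing here is taken from the application or from Redlich et al.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(embeddings):
    """Dense similarity matrix; entry [i][j] is the edge weight between blocks."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def pagerank(sim, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over a non-negative similarity matrix."""
    n = len(sim)
    # Total outgoing weight of each node, excluding self-loops.
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out[j] * scores[j]
                for j in range(n)
                if j != i and out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def mmr_select(scores, sim, k, trade_off=0.7):
    """Greedily pick k blocks, trading page-rank score against redundancy."""
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to any block already chosen.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return trade_off * scores[i] - (1 - trade_off) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In this sketch the indices returned by `mmr_select` identify the sentence blocks whose embeddings would then be aggregated into the document embedding.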

As per claims 8 and 17, Kartoun et al discloses wherein the document semantics controller is configured to determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (sentences within context/phrases, col. 9, lines 45-49); however, Kartoun et al fails to disclose identifying embeddings of a set of sentence blocks of the plurality of sentence blocks that are outlying; filter to exclude the set of sentence blocks that are outlying; determine a document embedding by aggregating the embeddings of remaining sentence blocks after filtering the set of sentence blocks that are outlying; and determine the semantic representation of the document based on the document embedding.
Redlich et al discloses identifying embeddings of a set of sentence blocks of the plurality of sentence blocks that are outlying (col. 69, lines 4-17); filter to exclude the set of sentence blocks that are outlying (col. 69, lines 25-26); determine a document embedding by aggregating the embeddings of remaining sentence blocks after filtering the set of sentence blocks that are outlying (col. 69, lines 4-17 & 25-26); and determine the semantic representation of the document based on the document embedding (col. 4, lines 4-13).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply a detailed semantic representation of a document in order to minimize the time of exposure of important content within a document (Redlich et al, col. 15, lines 33-34), thereby minimizing the security risk of the content by protecting the sensitive content (Redlich et al, col. 15, lines 44-48). Although the teachings of Kartoun et al disclose determining the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49), the teachings of Redlich et al offer a more detailed, automated extraction of granular data (Redlich et al, col. 15, lines 44-48) that includes a more detailed analysis of sentence structure (Redlich et al, col. 69, lines 4-17), which the teachings of Kartoun et al fail to disclose. The claims would have been obvious because a person of ordinary skill in the art would have been motivated to combine the prior art to achieve the claimed invention and there would have been a reasonable expectation of success.
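For illustration only, the outlier filtering and aggregation recited in claims 8 and 17 might be sketched as follows. The outlier test used here (distance from the centroid, compared by z-score against an assumed threshold) is one of many possible choices and is not specified by the claims as quoted above or by the cited references; the function names and the threshold value are hypothetical.

```python
import math
import statistics


def filter_outliers(embeddings, z_threshold=1.5):
    """Drop embeddings whose distance from the centroid is unusually large.

    z_threshold is an illustrative value only.
    """
    n_dims = len(embeddings[0])
    # Centroid of all sentence-block embeddings.
    centroid = [statistics.fmean(e[j] for e in embeddings) for j in range(n_dims)]
    # Euclidean distance of each embedding from the centroid.
    distances = [math.dist(e, centroid) for e in embeddings]
    mean_d = statistics.fmean(distances)
    std_d = statistics.pstdev(distances)
    if std_d == 0:
        # All blocks are equally far from the centroid; nothing is outlying.
        return list(embeddings)
    return [
        e for e, d in zip(embeddings, distances)
        if (d - mean_d) / std_d <= z_threshold
    ]


def mean_pool(embeddings):
    """Aggregate the remaining embeddings by averaging each dimension."""
    n_dims = len(embeddings[0])
    return [statistics.fmean(e[j] for e in embeddings) for j in range(n_dims)]
```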

As per claims 9 and 18, Kartoun et al teaches wherein the document semantics controller is configured to determine the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (sentences within context/phrases, col. 9, lines 45-49); however, Kartoun et al fails to disclose determining a plurality of clusters comprising each of the embeddings of each of the plurality of sentence blocks; determine at least one cluster of the plurality of clusters that is a semantic noise; filter the at least one cluster of the plurality of clusters that is the semantic noise; determine a document embedding by aggregating the embeddings of remaining sentence blocks after filtering the semantic noise; and determine the semantic representation of the document based on the document embedding.
Redlich et al discloses determining a plurality of clusters (relationships of sentences to their environment) comprising each of the embeddings of each of the plurality of sentence blocks (col. 69, lines 4-17); determine at least one cluster of the plurality of clusters that is a semantic noise (common text, col. 22, lines 8-10); filter the at least one cluster of the plurality of clusters that is the semantic noise (col. 69, lines 25-26); determine a document embedding by aggregating the embeddings of remaining sentence blocks after filtering the semantic noise (col. 69, lines 4-17 & 25-26); and determine the semantic representation of the document based on the document embedding (col. 4, lines 4-13).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply a detailed semantic representation of a document in order to minimize the time of exposure of important content within a document (Redlich et al, col. 15, lines 33-34), thereby minimizing the security risk of the content by protecting the sensitive content (Redlich et al, col. 15, lines 44-48). Although the teachings of Kartoun et al disclose determining the semantic representation of the document based on the at least one embeddings for each of the plurality of sentence blocks (col. 9, lines 45-49), the teachings of Redlich et al offer a more detailed, automated extraction of granular data (Redlich et al, col. 15, lines 44-48) that includes a more detailed analysis of sentence structure (Redlich et al, col. 69, lines 4-17), which the teachings of Kartoun et al fail to disclose. The claims would have been obvious because a person of ordinary skill in the art would have been motivated to combine the prior art to achieve the claimed invention and there would have been a reasonable expectation of success.
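For illustration only, the clustering and semantic-noise filtering recited in claims 9 and 18 might be sketched as follows. Two assumptions are made and should be read as such: clusters are formed greedily by cosine similarity to the first member of each cluster, and a cluster is treated as "semantic noise" when it has fewer than `min_size` members. Neither rule, nor the function names and default values, is taken from the application or from Redlich et al.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_blocks(embeddings, threshold=0.8):
    """Greedy clustering: join the first cluster whose representative is similar."""
    clusters = []  # each cluster is a list of block indices; index 0 is its representative
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append([i])
    return clusters


def drop_noise_clusters(clusters, min_size=2):
    """Treat clusters smaller than min_size as semantic noise (an assumption)."""
    return [cluster for cluster in clusters if len(cluster) >= min_size]
```

The block indices that survive `drop_noise_clusters` identify the embeddings that would then be aggregated into the document embedding.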

Allowable Subject Matter
Claims 2 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The prior art teachings of Krishnaprasad et al, US 2007/0220268, disclose that security values are appended to a user query and passed to an application. Results are received from the application based on the security attributes of the requesting user, so that the user is shown documents to which access is permitted and is denied documents to which access is not permitted, see paragraph 0131.
As per claims 2 and 11, the prior art was not found to teach determining at least one attribute of a plurality of attributes associated with the user, wherein the at least one attribute indicates a user security risk profile; determine a document security risk profile based on the semantic representation of the document and semantic representation of neighboring documents; determine whether the user security risk profile matches the document security risk profile; and allow/deny access to the document based upon the comparison results.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yao et al, US 2020/0233958 is cited for disclosing the performance of semantic analysis on data for risk analysis, see paragraph 0016.
Turner et al, US 2016/0210468 is cited for disclosing the classification of sensitive data in an electronic document, see abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER A REVAK whose telephone number is (571)272-3794. The examiner can normally be reached 5:30am - 3:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, LYNN FEILD, can be reached at 571-272-2092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHRISTOPHER A REVAK/Primary Examiner, Art Unit 2431