Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the claims filed 6/30/2020.  Claims 1-20 are pending.  Claims 1 (a CRM), 8 (a method), and 16 (a method) are independent.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 12 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the enablement requirement.  The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Claims 12 and 20 require: “wherein the URL is the only input processed to generate the decision statistic.”
Applicant has a related disclosure in paragraph 20: “The technology described herein uses computing resources more efficiently because the URL being classified may be the only input to the classifier. Using a single input also reduces latency because less information needs to be processed. This technology uses a single input, while the prior art typically uses multiple inputs (e.g., context, metadata) that each need to be processed…. a single unstructured text from a content being classified is the only input to the classification process. As used herein, unstructured means without metadata.”

Claims 12 and 20 require that the only information used to determine a URL’s nature is the URL itself.  By contrast, Applicant’s specification describes that other data, such as context and metadata, are not input into the classifier.  Thus, Applicant’s specification addresses data related to the to-be-classified URL, but the claim is directed to all data.  The claim therefore goes beyond Applicant’s description and excludes training the classifier, or providing other information on how to detect malicious URLs.  This excludes, for example, using a vocabulary of words to perform a contextual embedding, see Applicant’s specification ¶ 28.  Contextual embeddings are required in independent claim 8, on which claim 12 depends.
In other words, it is impossible to determine whether a URL is malicious using only the URL itself; other information on how to decide maliciousness is required.  Claim 12 is not enabled because the plain meaning of the claim requires impossible acts and excludes other limitations previously set forth in independent claim 8.
The test of enablement is analyzed using the Wands factors, MPEP 2164.01:
(A) The breadth of the claims: Claims 12 and 20 require that only a URL is known to the system.
(B) The nature of the invention: URL filtering.
(C) The state of the prior art: using multiple inputs to determine if a URL is malicious, see Applicant’s specification ¶ 28.
(D) The level of one of ordinary skill: a programmer.
(E) The level of predictability in the art: computer programs are completely predictable.
(F) The amount of direction provided by the inventor: Applicant’s specification does not describe a system that uses only a URL.
(G) The existence of working examples: Applicant’s specification does not describe a system that uses only a URL.
(H) The quantity of experimentation needed to make or use the invention based on the content of the disclosure: The issue is that it is impossible to determine the maliciousness of a URL using only the URL itself. Other information is necessary, such as model training information or lists of known malicious URLs or patterns.
Thus, the requirement of claims 12 and 20: “wherein the URL is the only input processed to generate the decision statistic” lacks enablement for being impossible to implement.  
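The distinction drawn above, between the URL as the per-request input and the other information a classifier relies on, can be pictured with a minimal sketch. It is illustrative only and is not taken from Applicant’s specification or from any cited reference; the function name `decision_statistic`, the token-splitting rule, and the weight values are all assumptions made for the example.

```python
import math
from typing import Dict


def decision_statistic(url: str, weights: Dict[str, float], bias: float = 0.0) -> float:
    """Logistic score for a URL, computed from token weights learned elsewhere."""
    # Assumption: tokens are obtained by splitting on "." and "/".
    tokens = [t for t in url.lower().replace("/", ".").split(".") if t]
    z = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up weights standing in for information obtained from training.
learned = {"login": 1.5, "verify": 2.0}

# The URL is the only per-request argument, yet the score depends on `learned`.
print(decision_statistic("bank.example/verify/login", learned))
# With no learned information the score carries no signal.
print(decision_statistic("bank.example/verify/login", {}))
```

In this sketch the same URL yields an uninformative score of 0.5 when no learned weights are supplied, which is one way to read the statement that other information is necessary.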


The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Each of the independent claims 1, 8, and 16 requires:
“a method for displaying a class indication for unstructured text in a URL”

However, each claim concludes by stating: “blocking access … in response to the decision.”
It is unclear how the claims perform “displaying a class indication”, as required by the preamble, when the claims are directed to blocking access.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Le et al., “URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection” (published 2018), in view of Baughman et al., US 2018/0077120 (filed 2016-09).
As to claim 1, Le discloses: 
…
receiving a URL; (“Our goal is to classify a given URL as malicious or not.” Le § 2.1)
forming, from the URL, a first contextual-word embedding that represents a first word identified in the URL (“For M unique words, we have to learn an embedding matrix” Le § 3.3) and represents a context of the first word in the URL; (See Applicant’s ¶ 28 discussing ‘context’: “each URL u_t is then mapped to a vector x_t ∈ R^M, such that i-th element in x_t is set as 1 if word w_i is present in the URL, and 0 otherwise. In addition to these Bag-of-Words features” Le § 2.2. “Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1)
forming, from the URL, a second contextual-word embedding that represents a second word identified in the URL (“For M unique words, we have to learn an embedding matrix” Le § 3.3) and represents a context of the second word in the URL; (“each URL u_t is then mapped to a vector x_t ∈ R^M, such that i-th element in x_t is set as 1 if word w_i is present in the URL, and 0 otherwise. In addition to these Bag-of-Words features” Le § 2.2. “Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1)
processing the first contextual-word embedding in a first parallel path in a word-level path of the URL classifier, the first parallel path having a first filter length; (“A CNN model typically consists of multiple sets of filters with different lengths (h), and each set consists of multiple filters.” Le § 3.1. See Le Fig. 1)
processing the second contextual-word embedding in a second parallel path in a word-level path of the URL classifier, the second parallel path having a second filter length that is greater than the first filter length;  (“a filter W to convolve on every segment of length h” Le § 3.1)
generating a word-level output from the word-level path; (“The pooled features from the final block are concatenated and passed to fully connected layers for the purpose of classification” Le § 3.1. See also Figure 1, FC layers from Word-level and Char-level CNNs providing input to, “Concatenate Word and Char feature vector”)
generating a decision input vector using the word-level output; (Figure 1, “Concatenate Word and Char feature vector” making an input vector to the master FC and soft-max)
…
determining the decision statistic indicates the URL is malicious; and (“4 fully connected layers finally leading to the output classifier.” Le § 3.4. “we also compare model performance in terms of True Positive Rates at different levels of False Positive Rate to observe the detection rate of malicious URLs” see Le § 4.2 generally)
…
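
For orientation, the parallel word-level paths with different filter lengths mapped above can be pictured with a short sketch. This is a simplified illustration under stated assumptions, not code from Le or from Applicant’s specification: it uses one filter per path, toy two-dimensional embeddings, filter lengths of 2 and 3 (Le § 3.3.1 describes lengths 3, 4, 5, 6 with 256 filters each), and omits any non-linearity. The names `conv_max_pool` and `word_level_output` are hypothetical.

```python
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], filt: List[Vector]) -> float:
    """Slide one filter of length h = len(filt) over the sequence, then max-pool."""
    h = len(filt)
    if len(embeddings) < h:
        raise ValueError("sequence shorter than filter length")
    best = float("-inf")
    for start in range(len(embeddings) - h + 1):
        # Dot product between the filter and the window of h embeddings.
        score = sum(
            w * x
            for f_row, e_row in zip(filt, embeddings[start:start + h])
            for w, x in zip(f_row, e_row)
        )
        best = max(best, score)
    return best


def word_level_output(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Run each parallel path (one filter per path here) and concatenate the results."""
    return [conv_max_pool(embeddings, f) for f in filters]


# Toy embeddings for a URL split into four words (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
short_filter = [[1.0, 0.0], [0.0, 1.0]]              # first path, filter length 2
long_filter = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # second path, filter length 3
print(word_level_output(words, [short_filter, long_filter]))  # [2.0, 4.0]
```

The concatenated list plays the role of the word-level output that feeds the decision input vector in Le Fig. 1.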

Le does not explicitly disclose:
computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, causes the one or more processors to perform a method for displaying a class indication for unstructured text in a URL, the method comprising:
generating a decision statistic using the decision input vector;
automatically blocking access to the URL in response to the decision statistic indicating the URL is malicious.

Baughman discloses:
computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, causes the one or more processors to perform (“Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. …. program modules may be located in both local and remote computer system storage media including memory storage devices.” Baughman ¶ 51) a method for displaying a class indication for unstructured text in a URL, the method comprising: (“If the proxy receives a reputation score categorized or classified as suspicious, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as anomalous, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)
generating a decision statistic using the decision input vector; (“a reputation score may be calculated according to the assigned values for generating a response or answer to the trustworthy query.” Baughman ¶ 85.  See Baughman Figs. 11-13 showing confidence probabilities of the score.)
automatically blocking access to the URL in response to the decision statistic indicating the URL is malicious. (“If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)

A person of ordinary skill in the art before the effective filing date of the claimed invention would have combined Le with Baughman by using the hardware structure and informing a user of, or inhibiting a user’s access to, a detected malicious website.  It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Le with Baughman in order to inform users of, and prevent users from accessing, malicious websites that might compromise the user’s computer or data in a personal or business network, Baughman ¶ 2.
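
The routing behavior attributed to Baughman ¶ 69 in this action (warning page for suspicious or anomalous scores, denial page for malicious scores) can be sketched as follows. This is a minimal illustration, not Baughman’s implementation; the names `Action` and `route_request` are hypothetical, and the pass-through for any other classification is an assumption that the quoted passage does not state.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARNING_PAGE = "warning page"
    DENIAL_PAGE = "denial page"


def route_request(classification: str) -> Action:
    """Map a reputation-score classification to a proxy action."""
    label = classification.strip().lower()
    if label == "malicious":
        # Access is blocked: the request is redirected to a denial page.
        return Action.DENIAL_PAGE
    if label in ("suspicious", "anomalous"):
        # The class is displayed to the user by way of a warning page.
        return Action.WARNING_PAGE
    # Assumption: any other classification is passed through unchanged.
    return Action.ALLOW


for label in ("malicious", "suspicious", "anomalous", "trusted"):
    print(label, "->", route_request(label).value)
```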

As to claim 8, Le discloses a method comprising:
…, the method comprising: 
receiving a URL; (“Our goal is to classify a given URL as malicious or not.” Le § 2.1)
forming a contextual-word embedding from the URL (“For M unique words, we have to learn an embedding matrix” Le § 3.3) by identifying a string of characters in the URL that match a word in a URL specific vocabulary; (“This set of unique words forms a dictionary for the URL training corpus.” Le § 3.3. “word embedding is obtained directly from the word embedding matrix (which is learnt during training)” Le § 3.3.3.  Dictionary used for training, the word embedding matrix, which is used in classification.)
processing the contextual-word embedding in a word-level path of a URL classifier; (“A CNN model typically consists of multiple sets of filters with different lengths (h), and each set consists of multiple filters.” Le § 3.1. See Le Fig. 1)
generating a word-level output from the word-level path; (“The pooled features from the final block are concatenated and passed to fully connected layers for the purpose of classification” Le § 3.1. See also Figure 1, FC layers from Word-level and Char-level CNNs providing input to, “Concatenate Word and Char feature vector”)
generating a decision input vector using the word-level output as an input; (Figure 1, “Concatenate Word and Char feature vector” making an input vector to the master FC and soft-max)
…
determining the … indicates a classification; and (“4 fully connected layers finally leading to the output classifier.” Le § 3.4. “we also compare model performance in terms of True Positive Rates at different levels of False Positive Rate to observe the detection rate of malicious URLs” see Le § 4.2 generally)
…

Le does not explicitly disclose:
displaying a class indication for unstructured text in a URL
generating a decision statistic using the decision input vector; decision statistic
automatically blocking access to the URL in response to the classification.

Baughman discloses:
displaying a class indication for unstructured text in a URL (“If the proxy receives a reputation score categorized or classified as suspicious, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as anomalous, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)
generating a decision statistic using the decision input vector; decision statistic (“a reputation score may be calculated according to the assigned values for generating a response or answer to the trustworthy query.” Baughman ¶ 85.  See Baughman Figs. 11-13 showing confidence probabilities of the score.)
automatically blocking access to the URL in response to the classification. (“If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)

A person of ordinary skill in the art before the effective filing date of the claimed invention would have combined Le with Baughman by using the hardware structure and informing a user of, or inhibiting a user’s access to, a detected malicious website.  It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Le with Baughman in order to inform users of, and prevent users from accessing, malicious websites in a personal or business network, Baughman ¶ 2.

As to claim 16, Le discloses a method comprising:
… the method comprising: 
receiving an unstructured text; (“Our goal is to classify a given URL as malicious or not.” Le § 2.1)
forming a first contextual-character embedding of a first amount of characters from the unstructured text; (“these representations are stored in an embedding matrix” Le § 3.2, discussing “Character-level CNN for Malicious URL Detection”)
forming a second contextual-character embedding of a second amount of characters from the unstructured text, wherein the first amount is less than the second amount; (“temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters” Le § 3.2)
processing the first contextual-character embedding in a first parallel path within a character-level path of a classifier; (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)
processing the second contextual-character embedding in a second parallel path within the character-level path of the classifier; (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)
generating a first character-level output from the character-level path; (see Le Figure 1, the outputs from the Char-level CNN)
generating a decision input vector using the first character-level output as an input; (Figure 1, “Concatenate Word and Char feature vector” making an input vector to the master FC and soft-max)
…
determining the … indicates the unstructured text is classified as a security risk; and (“4 fully connected layers finally leading to the output classifier.” Le § 3.4. “we also compare model performance in terms of True Positive Rates at different levels of False Positive Rate to observe the detection rate of malicious URLs” see Le § 4.2 generally)
…
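
The character-level embedding lookup relied on above (“these representations are stored in an embedding matrix” Le § 3.2) can be illustrated with a short sketch. It assumes a tiny alphabet, two-dimensional made-up embedding rows, and a reserved row 0 for characters outside the alphabet; none of these specifics come from Le or from Applicant’s specification, and the function names are hypothetical.

```python
from typing import Dict, List


def build_char_index(alphabet: str) -> Dict[str, int]:
    """Assign each known character a row index; index 0 is reserved for unknowns."""
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def embed_characters(
    text: str, char_index: Dict[str, int], embedding_matrix: List[List[float]]
) -> List[List[float]]:
    """Replace every character of the text with its row of the embedding matrix."""
    return [embedding_matrix[char_index.get(ch, 0)] for ch in text]


alphabet = "abc./"
index = build_char_index(alphabet)
# One row per index; in a trained model these values would be learned.
matrix = [[0.0, 0.0]] + [[float(i), float(-i)] for i in range(1, len(alphabet) + 1)]

print(embed_characters("a.z", index, matrix))
# [[1.0, -1.0], [4.0, -4.0], [0.0, 0.0]]
```

The resulting sequence of rows is what character-level filters of different lengths would then be convolved over.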

Le does not explicitly disclose:
displaying a class indication for unstructured text,
generating a decision statistic using the decision input vector; decision statistic
automatically blocking access to content associated with the unstructured text in response to the classification.

Baughman discloses: 
displaying a class indication for unstructured text, (“If the proxy receives a reputation score categorized or classified as suspicious, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as anomalous, the URL is redirected to a warning page related to the browser. If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)
generating a decision statistic using the decision input vector; decision statistic (“a reputation score may be calculated according to the assigned values for generating a response or answer to the trustworthy query.” Baughman ¶ 85.  See Baughman Figs. 11-13 showing confidence probabilities of the score.)
automatically blocking access to content associated with the unstructured text in response to the classification. (“If the proxy receives a reputation score categorized or classified as malicious, the URL is redirected to a denial page related to the browser.” Baughman ¶ 69)

A person of ordinary skill in the art before the effective filing date of the claimed invention would have combined Le with Baughman by using the hardware structure and informing a user of, or inhibiting a user’s access to, a detected malicious website.  It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Le with Baughman in order to inform users of, and prevent users from accessing, malicious websites in a personal or business network, Baughman ¶ 2.


As to claims 2 and 13, Le in view of Baughman discloses the CRM/method/method of claims 1, 8, and 16 and further discloses:
forming a contextual-character embedding from the URL; (“these representations are stored in an embedding matrix” Le § 3.2)
inputting the contextual-character embedding into a character-level path of the URL classifier, (“Character-level CNN for Malicious URL Detection” Le § 3.2) wherein an input layer of the character-level path comprises a plurality of parallel convolutional layers; and (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)
generating a character-level output from the character-level path. (see Le Figure 1, the outputs from the Char-level CNN)

As to claim 3, Le in view of Baughman discloses the CRM/method/method of claim 2 and further discloses:
wherein each of the plurality of parallel convolutional layers in the character-level path has an input filter of a different amount of characters. (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)

As to claims 4 and 10, Le in view of Baughman discloses the CRM/method/method of claims 1, 8, and 16 and further discloses:
wherein an input layer of the word-level path comprises a plurality of parallel convolutional layers. (“We use the same CNN architecture as in Character CNNs, i.e., we use 4 types of Convolutional filters W ∈ R^(k×h), with h = 3, 4, 5, 6 and for each filter size, we use 256 filters. Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1. See also Fig. 1 “Convolutional layer with max-pooling”)

As to claim 5, Le in view of Baughman discloses the CRM/method/method of claims 1, 8, and 16 and further discloses:
wherein the first word is identified by identifying a string of characters in the URL that match a word in a URL specific vocabulary. (“This set of unique words forms a dictionary for the URL training corpus.” Le § 3.3. “word embedding is obtained directly from the word embedding matrix (which is learnt during training)” Le § 3.3.3.  Dictionary used for training, the word embedding matrix, which is used in classification.)

As to claims 6 and 9, Le in view of Baughman discloses the CRM/method/method of claims 5 and 8 and further discloses:
wherein the URL specific vocabulary is generated by decomposing a corpus of URLs into n-grams (“the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together” Le § 3.3.1) and then adding n-grams that occur above a threshold number of times (a threshold of one: “all words that appeared only once in the entire training corpus (also called rare words) were replaced with a single token.” Le § 3.3.1.) within the corpus to the URL specific vocabulary. (“This set of unique words forms a dictionary for the URL training corpus.” Le § 3.3.)
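
The “threshold of one” reading applied above can be sketched as a vocabulary-building step in which words seen only once in the training corpus are replaced by a single token. This is an illustrative sketch only, not Le’s code: the splitting rule (any non-alphanumeric character), the token name `<UNK>`, the `min_count` parameter, and the three-URL corpus are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

UNK = "<UNK>"  # hypothetical name for the single rare-word token


def split_words(url: str) -> List[str]:
    # Assumption: words are separated by any non-alphanumeric character.
    return [w for w in re.split(r"[^A-Za-z0-9]+", url.lower()) if w]


def build_vocabulary(urls: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep words that occur at least min_count times across the corpus."""
    counts = Counter(w for url in urls for w in split_words(url))
    return {w for w, c in counts.items() if c >= min_count}


def tokenize(url: str, vocab: Set[str]) -> List[str]:
    """Replace out-of-vocabulary (rare or unseen) words with the single token."""
    return [w if w in vocab else UNK for w in split_words(url)]


corpus = [
    "http://example.com/login",
    "http://example.com/update",
    "https://bank.example.org/login",
]
vocab = build_vocabulary(corpus)
print(sorted(vocab))  # ['com', 'example', 'http', 'login']
print(tokenize("http://evil.example.com/login", vocab))
# ['http', '<UNK>', 'example', 'com', 'login']
```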

As to claims 7 and 11, Le in view of Baughman discloses the CRM/method/method of claims 4 and 10 and further discloses:
wherein each of the plurality of parallel convolutional layers in the word-level path has an input filter for words comprising a different amount of characters. (“We use the same CNN architecture as in Character CNNs, i.e., we use 4 types of Convolutional filters W ∈ R^(k×h), with h = 3, 4, 5, 6 and for each filter size, we use 256 filters. Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1.)

As to claims 12 and 20, Le in view of Baughman discloses the CRM/method/method of claims 8 and 16 and further discloses:
wherein the URL is the only input processed to generate the decision statistic. (“from the first 60% of the URLs, we randomly selected 5 million URLs for training, and from the last 40%, we randomly selected 10 million URLs for testing.” Le § 4.1.1. Only using URL information. See Applicant’s specification ¶ 20 describing other inputs as “context or metadata”.)

As to claim 14, Le in view of Baughman discloses the CRM/method/method of claim 13 and further discloses:
wherein an input layer of the character-level path comprises a plurality of parallel convolutional layers, and wherein each of the plurality of parallel convolutional layers in the character-level path has an input filter of a different amount of characters.  (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)

As to claim 15, Le in view of Baughman discloses the CRM/method/method of claim 14 and further discloses:
wherein the contextual-word embedding represents a first word identified in the URL (“For M unique words, we have to learn an embedding matrix” Le § 3.3) and a context of the first word in the URL. (See Applicant’s ¶ 28 discussing ‘context’: “each URL u_t is then mapped to a vector x_t ∈ R^M, such that i-th element in x_t is set as 1 if word w_i is present in the URL, and 0 otherwise. In addition to these Bag-of-Words features” Le § 2.2. “Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1)

As to claim 17, Le in view of Baughman discloses the CRM/method/method of claim 16 and further discloses:
wherein an input layer of the character-level path comprises a plurality of parallel convolutional layers. (“Thus, temporal patterns in a sequence of characters of lengths 3, 4, 5, 6 are learnt. For each filter size, we use 256 filters. This is followed by a Max-Pooling layer which is followed by a fully connected layer regularized by dropout” Le § 3.2, see also Fig. 1)

As to claim 18, Le in view of Baughman discloses the CRM/method/method of claim 16 and further discloses:
forming a contextual-word embedding from the unstructured text; (“For M unique words, we have to learn an embedding matrix” Le § 3.3)
inputting the contextual-word embedding into a word-level path of the classifier; (“A CNN model typically consists of multiple sets of filters with different lengths (h), and each set consists of multiple filters.” Le § 3.1. See Le Fig. 1)
generating a word-level output from the word-level path; and (“The pooled features from the final block are concatenated and passed to fully connected layers for the purpose of classification” Le § 3.1. See also Figure 1, FC layers from Word-level and Char-level CNNs providing input to, “Concatenate Word and Char feature vector”)
wherein the word-level output is also used to form the decision input vector. (Figure 1, “Concatenate Word and Char feature vector” making an input vector to the master FC and soft-max)

As to claim 19, Le in view of Baughman discloses the CRM/method/method of claim 18 and further discloses:
wherein an input layer of the word-level path comprises a plurality of parallel convolutional layers, (“We use the same CNN architecture as in Character CNNs, i.e., we use 4 types of Convolutional filters W ∈ R^(k×h), with h = 3, 4, 5, 6 and for each filter size, we use 256 filters. Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1. See also Fig. 1 “Convolutional layer with max-pooling”) and wherein each of the plurality of parallel convolutional layers in the word-level path has an input filter for words comprising a different amount of characters. (“We use the same CNN architecture as in Character CNNs, i.e., we use 4 types of Convolutional filters W ∈ R^(k×h), with h = 3, 4, 5, 6 and for each filter size, we use 256 filters. Here, the aim is to learn temporal properties from a sequence of words of length 3, 4, 5, 6 appearing together.” Le § 3.3.1.)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  See PTO-892, particularly:
Fei et al., US 2020/0311519 discloses the use of skip-grams. 
Zou et al., US 2021/0218754, discloses detecting malicious URLs.
Rae et al., US 2020/0104677, discloses outputting the probability of a malicious URL. 
Huang et al., US 2020/0059451, discloses learning malicious URL patterns.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL W CHAO whose telephone number is (571)272-5165. The examiner can normally be reached M, W-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Saleh Najjar, can be reached at (571) 272-4006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL W CHAO/Examiner, Art Unit 2492