DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the application filed on 4/15/20.
	Claims 1-20 are pending. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 17 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. These claims recite the limitation "the document" without previously reciting a document. There is insufficient antecedent basis for this limitation in the claim.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1, 2, 4, 6, 11, 12-14, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever et al. (US 20170243028, Herein “LaFever”) in view of Morrison et al. (US 20170353423, Herein “Morrison”).
Regarding claim 1, LaFever teaches A method implemented by a computer system having one or more processors and memories (fig. 1; [0041]), comprising:
receiving a data stream of text data (within the data, perform de-identification of the data ([0233] to [0235]));
identifying, by a natural language processor, a piece of sensitive text information in the received data stream using a token-majority-type-based classifier combined with at least two other classifiers, wherein the piece of sensitive text information comprises one or more information attributes (categorical classifications and other types of classification schemas ([0470] to [0476]); for instance, a token is identified either as a SSN or a credit card number or first name, for which other classifications may also be assigned ([0472] and [0473]); for instance, “Social Security Number” is identified in association with the three categories of the respective three columns in Fig. 1J; e.g., multiple classifications in association with social security number ([0465] to [0470]));
selecting a natural pseudonym by a pseudonymization processor, wherein the natural pseudonym has at least one information attribute same as the corresponding one or more information attributes of the piece of sensitive text information such that the natural pseudonym is difficult to distinguish from the sensitive text information in the data stream (replacement for referring to a given value, such as for protecting data by substituting for a matched word/token ([0180] and ([0181] to [0188]))); and
modifying the data stream by replacing the piece of sensitive text information with the natural pseudonym (de-identification based on anonymity such that substitution replaces an identifying data element using pseudonymous identifiers ([0035], [0043]; [0235] to [0242])).

However, while LaFever discloses providing anonymity for data elements contained in one or more databases [0648], LaFever fails to specifically teach a data stream.
Yet, in a related art, Morrison discloses a dialog stream for which privacy can be made by limiting access to individually identifying information regarding dialog participants [0012]. 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the data stream of Morrison with the pseudonymization of LaFever to arrive at the claimed data stream. The combination would allow for, according to the motivation of Morrison, advantageously limiting access to individually-identifying information regarding dialog participants particularly with respect to a dialog stream [0012], thus taking advantage of the pseudonymization offered by LaFever with an application toward known techniques of dialog for the purpose of preserving privacy of dialog participants [0012]; see also [0003] to [0010].
Furthermore, Morrison teaches or makes abundantly clear:
identifying, by a natural language processor, a piece of sensitive text information in the received data stream using a token-majority-type-based classifier combined with at least two other classifiers, wherein the piece of sensitive text information comprises one or more information attributes (based on a majority type classification such as based on features for classifying content as personal information, such as for automatically replacing telephone numbers with a pseudonym [0247] and further classification such as a sub-reference (e.g., “You”) based on the respective viewer of the content, such as “Guest 3” pseudonym being further classified as “You” based on the guest user for whom the dialog stream is rendered [0253]).

Regarding claim 2, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, LaFever teaches The method of claim 1, wherein the one or more information attributes comprise at least one of a gender, an age, an ethnicity, an information type, a number of letters, a capitalization pattern, a geographic origin, and street address characteristics of a location (e.g., an information type such as a SSN or other type such as credit card number or first name [0472]; gender [0383]).

Regarding claim 4, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, LaFever teaches The method of claim 1, wherein the piece of sensitive text information is a piece of personal identifiable information (personal trait for pseudonymization [0180]).

Regarding claim 6, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, LaFever teaches The method of claim 1, wherein the natural pseudonym enables a same downstream processing behavior as the sensitive text information (the replacement pseudonym [0188] for obscuring personal data [0192] such that the recipient downstream may still interpret the data at a general or pseudonymized level [0544]).

Regarding claim 11, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, Morrison teaches The method of claim 1, wherein the pseudonymization processor comprises at least one of a name gender and original classifier, a personal name replacer, a street address replacer, a multi-word-phrase-based type classifier, a placename replacer, an institution name replacer, an identifying number replacer, an exceptional value replacer, rare context replacer, other sensitive data replacer, and a data shifter (classifying a name as a person and friend, such as by replacement with pseudonym “Friend n” ([0018] and [0240] to [0247]); for instance, determination of a type of word, such as a reference type (e.g., person name) [0240]).

Regarding claim 12, LaFever in view of Morrison teaches the limitations of claims 1 and 11, as above.
Furthermore, Morrison teaches The method of claim 11, wherein the pseudonymization processor is configured to generate a pseudonym table, wherein the pseudonym table comprises a mapping of the sensitive text information and the natural pseudonym (correlations table [0251]).
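For illustration only, the pseudonym table recited in claim 12 may be sketched as follows. The sketch is not drawn from LaFever or Morrison; the class name, the replacement pool, and the first-come assignment policy are assumptions made solely for the example.

```python
class PseudonymTable:
    """Hypothetical mapping of sensitive text to a natural pseudonym."""

    def __init__(self, pool):
        # Pool of unused pseudonyms (assumed to be supplied by the caller).
        self._pool = list(pool)
        self._mapping = {}

    def lookup(self, sensitive):
        """Return the pseudonym for `sensitive`, assigning one on first use."""
        if sensitive not in self._mapping:
            if not self._pool:
                raise LookupError("no pseudonyms left in the pool")
            self._mapping[sensitive] = self._pool.pop(0)
        return self._mapping[sensitive]

    def as_dict(self):
        """Expose a copy of the sensitive-to-pseudonym mapping."""
        return dict(self._mapping)


table = PseudonymTable(["Maria", "Peter"])
first = table.lookup("Alice")
again = table.lookup("Alice")  # the same input maps to the same pseudonym
other = table.lookup("Bruno")
```

Under these assumptions, repeated occurrences of one piece of sensitive text receive a consistent replacement, which is the property a mapping table would provide.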

Regarding claim 13, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, Morrison teaches The method of claim 1, wherein identifying the piece of sensitive text information comprises: identifying a plurality of pieces of sensitive text information in the received data stream;  wherein selecting a natural pseudonym comprises selecting a plurality of natural pseudonyms by a pseudonymization processor, wherein each natural pseudonym corresponds to one of the plurality of pieces of sensitive text information on a one-by-one basis; and wherein modifying the data stream comprises replacing the plurality of pieces of sensitive text information with the plurality of natural pseudonyms (pseudonymization [0200] within dialog ([0012] to [0016])). 

Regarding claim 14, LaFever in view of Morrison teaches the limitations of claims 1 and 13, as above.
Furthermore, Morrison teaches The method of claim 13, wherein the plurality of natural pseudonyms are selected on the one or more information attributes (attribute such as recognition of nickname ([0228] and [0229])) such that an output of a data processing software processing the modified data stream is the same as an output of the data processing software processing the data stream that is not modified (similarity other than with respect to the pseudonymous content or displayed with the personally identifiable information ([0217] to [0227])). 

Regarding claim 19, LaFever teaches A system for pseudoanonymization of sensitive text information comprising: a computer system having one or more processors and memories (computer [0041]), the one or more processors configured to: 
The claim recites similar limitations as claim 1 – see above.


Claim(s) 3 and 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison and further in view of Scotney et al. (US 10,936,751, Herein “Scotney”).
Regarding claim 3, LaFever in view of Morrison teaches the limitations of claim 1, as above.
However, LaFever in view of Morrison fails to specifically teach The method of claim 1, wherein the natural pseudonym has at least two information attributes same as corresponding information attributes of the piece of sensitive text information. 
Yet, in a related art, Scotney discloses matching based on a plurality of matching features, such as determining a match between replacement item (e.g., word) and replaced item based on conditions or rules, such as a length of data item, a type of characters within data item, and presence of special markers within data item (fig. 1, cols. 3-6). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the two information attributes same as corresponding attributes of Scotney with the text for pseudonymization identification of LaFever in view of Morrison to have wherein the natural pseudonym has at least two information attributes same as corresponding information attributes of the piece of sensitive text information. The combination would allow for, according to the motivation of Scotney, more effectively ensuring the privacy or other confidentiality of content by specifying (e.g., same attributes) the criteria by which pseudonyms or other anonymization techniques are applied to the correct, feature determined data, such as for more accurately determining specific data items (e.g., phone number, names, email) that are candidate subjects for restriction (col. 1, lines 13-30).  As such, the combination would make for more accurate determination of matches for the substitute term.  

Regarding claim 5, LaFever in view of Morrison teaches the limitations of claim 1, as above.
However, LaFever in view of Morrison fails to specifically teach The method of claim 1, wherein the natural pseudonym has a same number of letters as the piece of sensitive text information.
Yet, in a related art, Scotney discloses a rule set based on a number of characters (col. 2).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the number-based identification of Scotney with the text for pseudonymization identification of LaFever in view of Morrison to have wherein the natural pseudonym has a same number of letters as the piece of sensitive text information. The combination would allow for, according to the motivation of Scotney, more effectively ensuring the privacy or other confidentiality of content by specifying (e.g., same attributes) the criteria by which pseudonyms or other anonymization techniques are applied to the correct, feature determined data, such as for more accurately determining specific data items (e.g., phone number, names, email) that are candidate subjects for restriction (col. 1, lines 13-30). As such, the combination would make for more accurate determination of matches for the substitute term.
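For illustration only, the attribute matching recited in claims 3 and 5 (a pseudonym sharing at least two attributes, including a same number of letters, with the sensitive text) may be sketched as follows. The sketch does not reproduce Scotney's rule set; the function names and the candidate list are hypothetical, and capitalization pattern is chosen as the second attribute merely as an example.

```python
def capitalization_pattern(word):
    """Encode each character as U (upper), L (lower), or O (other)."""
    return "".join(
        "U" if c.isupper() else "L" if c.islower() else "O" for c in word
    )


def select_pseudonym(sensitive, candidates):
    """Return the first candidate that differs from `sensitive` but shares
    its number of letters and its capitalization pattern, or None."""
    for candidate in candidates:
        if candidate == sensitive:
            continue
        same_length = len(candidate) == len(sensitive)
        same_caps = (
            capitalization_pattern(candidate) == capitalization_pattern(sensitive)
        )
        if same_length and same_caps:
            return candidate
    return None


# Hypothetical pool of replacement names.
CANDIDATES = ["Maria", "Jonathan", "Peter", "Li"]
```

With this pool, "Alice" would be replaced by "Maria" (five letters, initial capital), while a three-letter name finds no match and returns None.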


Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison and further in view of Beinhauer et al. (US 20160306999, Herein “Beinhauer”). 
Regarding claim 7, LaFever in view of Morrison teaches the limitations of claim 1, as above.
However, LaFever in view of Morrison fails to specifically teach The method of claim 1, wherein the sensitive text information includes a first date range, and wherein the natural pseudonym has a second date range having a same duration as the first date range.
	Yet, in a related art, Beinhauer discloses maintaining date range within pseudonymization [0027].
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the date range for pseudonymization of Beinhauer with the pseudonymization of LaFever in view of Morrison to have wherein the sensitive text information includes a first date range, and wherein the natural pseudonym has a second date range having a same duration as the first date range. The combination would allow for, according to the motivation of Beinhauer, anonymizing certain confidential information while still maintaining temporal duration information, which would have been obvious particularly in light of the ability to maintain anonymity while using the duration-retaining ability of LaFever in view of Morrison to still be able to know the temporal duration information associated with the respective pseudonymization of Beinhauer [0027].
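For illustration only, the duration-preserving date range recited in claim 7 may be sketched as follows. The sketch is not taken from Beinhauer; the function name and the fixed day offset are assumptions, and it relies only on the standard Python datetime module.

```python
from datetime import date, timedelta


def shift_date_range(start, end, offset_days):
    """Shift both endpoints by the same number of days so that the
    pseudonymized range keeps the duration of the original range."""
    delta = timedelta(days=offset_days)
    return start + delta, end + delta


original_start = date(2020, 1, 10)
original_end = date(2020, 1, 25)
# The offset of 37 days is an arbitrary, hypothetical choice.
new_start, new_end = shift_date_range(original_start, original_end, 37)
```

Because the same offset is applied to both endpoints, the second date range differs from the first while its duration is unchanged.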

Claim(s) 8 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison in view of Brun (US 20070038437).
Regarding claim 8, LaFever in view of Morrison teaches the limitations of claim 1, as above.
However, LaFever in view of Morrison fails to specifically teach The method of claim 1, wherein a first classifier is a term-morphological-analysis classifier, and a second classifier is a term-context-based classifier.
Yet, in a related art, Brun discloses identification of named entities further categorized as persons, dates, places, personal identification numbers, etc., and further identified as anonymous or public ([0013] and [0014]); further performed according to another classifier performing classification based on local lexical information, even further based on syntactical information [0016].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the plurality classification of Brun with the classification for pseudonymization of LaFever in view of Morrison to have wherein a first classifier is a term-morphological-analysis classifier, and a second classifier is a term-context-based classifier. The combination would allow for, according to the motivation of Brun, classifying the text more accurately than LaFever in view of Morrison alone by taking into account additional context and the form of the parsed text as further evidence of the classification of the content for possible pseudonymization, particularly with respect to whether the content should be made private rather than left public or non-pseudonymized [0016].

Regarding claim 10, LaFever in view of Morrison in view of Brun teaches the limitations of claims 1 and 8, as above.
Furthermore, LaFever teaches The method of claim 8, wherein the term-context-based classifier is selected from the group consisting of a multi-word-phrase-based type classifier, a token/type-ngram-context based classifier, a glue-patterns-in-context-based classifier, a document-region-based type classifier or a type-specific rule-based type classifier, and combinations thereof (types of data such as direct identifiers [0031]).

Furthermore, Morrison discloses types or rules such as replacing phone number of senders and recipients with indicia [0200].  


Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison in view of Brun and in view of Salloum et al. (US 20190043486, Herein “Salloum”).
Regarding claim 9, LaFever in view of Morrison in view of Brun teaches the limitations of claims 1 and 8, as above.
However, LaFever in view of Morrison in view of Brun fails to specifically teach The method of claim 8, wherein the term-morphological-analysis classifier is a prefix-suffix-based type classifier, a subword-compound-based type classifier, or combinations thereof.
Yet, in a related art, Salloum discloses prefix-suffix classification ([0034] to [0039]).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the prefix-suffix classifier of Salloum with the word classification of LaFever in view of Morrison in view of Brun to have wherein the term-morphological-analysis classifier is a prefix-suffix-based type classifier, a subword-compound-based type classifier, or combinations thereof. The combination would allow for, according to the motivation of Salloum, improving word classification based on word characteristics such as prefix and suffix to be able to better determine certain words [0034], which would normally be performed by a human skilled in understanding grammatical characteristics of words [0002], particularly since such incorporation of prefixes and suffixes better captures the semantic and morpho-syntactic information of certain words [0036].


Claim(s) 15 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison and in view of Dongre et al. (US 20190080115, Herein “Dongre”).
Regarding claim 15, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, Morrison teaches The method of claim 1, wherein receiving the data stream of text data comprises text data from a document, further comprising tokenizing the text data to form a plurality of tokens (receipt of dialog messages [0229]; abstract – identifying and replacing various content of streaming dialog/conversation).

However, in an effort to advance prosecution, Dongre makes abundantly clear document, as follows: messages as documents, each message/document transferred as a message file such as with respect to anonymization [0024]; further, receive anonymization metadata and corresponding document ([0002] and [0024]) such that the codes and evidence information (e.g., anonymization metadata) are used in rendering the document as anonymized or pseudonymized at the recipient device ([0024] and [0033]); coding output such that the appropriate content is anonymized, such as received anonymized document received from the anonymizer server [0048], further allowing for tokenization to be performed as a method to shield sensitive content from unwanted recipients ([0019] and [0020]), particularly with respect to message documents.  
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the coding of Dongre with the pseudonymization of LaFever in view of Morrison to arrive at the claimed document. The combination would allow for, according to the motivation of Dongre, preparing the anonymization content such as using a content key, therefore allowing to not only preserve format, but also for using the data associated with the data stream to appropriately process with pseudonymization ([0048] to [0050]).


Regarding claim 20, LaFever teaches the limitations of claim 19, as above.
Furthermore, LaFever teaches The system of claim 19, further comprising:
a downstream processor, communicatively coupled to the computer system, and having one or more processors and memories (networked system for receiving from external database such as for performing de-anonymizing [0116]), wherein the downstream processor is configured to:

However, LaFever in view of Morrison fails to specifically teach receive, via a network, a pseudonymized document from the computer system,
generate a coding output that includes codes and evidence information from the pseudonymized document. 
Yet, in a related art, Dongre discloses receive anonymization metadata and corresponding document ([0002] and [0024]) such that the codes and evidence information (e.g., anonymization metadata) are used in rendering the document as anonymized or pseudonymized at the recipient device ([0024] and [0033]); coding output such that the appropriate content is anonymized, such as received anonymized document received from the anonymizer server [0048]. 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the coding of Dongre with the pseudonymization of LaFever in view of Morrison to have receive, via a network, a pseudonymized document from the computer system,
generate a coding output that includes codes and evidence information from the pseudonymized document. The combination would allow for, according to the motivation of Dongre, preparing the anonymization content such as using a content key, therefore allowing to not only preserve format, but also for using the data associated with the data stream to appropriately process with pseudonymization ([0048] to [0050]).


Claim(s) 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison in view of Dongre and in view of Berteau (US 20090285474) in view of Ronnewinkel (US 20050228774).
Regarding claim 16, LaFever in view of Morrison in view of Dongre teaches the limitations of claims 1 and 15, as above.
However, LaFever in view of Morrison in view of Dongre fails to specifically teach The method of claim 15, wherein identifying the piece of sensitive text information comprises: determining, using the token-majority-type-based classifier, a baseline probability of each token from the plurality of tokens; determining, using the at least two other classifiers, type probabilities of each token being the piece of sensitive text information; compiling a weighted combination of numeric scores based on the relative efficacy of the token-majority-type-based classifier using the baseline probability and at least one other classifier using the type probability; identifying, by the natural language processor, a piece of sensitive text information in the received data stream based on the weighted combination of numeric scores. 
Yet, in a related art, Berteau discloses classification of textual tokens and corresponding textual context into respective categories based on a probability and respective category threshold (abstract), such that a probability is used for determining categorization using tokens [0004]. 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the probability determination of classification of Berteau with the pseudonymization of LaFever in view of Morrison in view of Dongre to have The method of claim 15, wherein identifying the piece of sensitive text information comprises: determining, using the token-majority-type-based classifier, a baseline probability of each token from the plurality of tokens; determining, using the at least two other classifiers, type probabilities of each token being the piece of sensitive text information; compiling a weighted combination of numeric scores based on the relative efficacy of the token-majority-type-based classifier using the baseline probability and at least one other classifier using the type probability; identifying, by the natural language processor, a piece of sensitive text information in the received data stream based on the weighted combination of numeric scores. The combination would allow for, according to the motivation of Berteau, enhancing text classification using probabilistic techniques for overcoming the problem of analyzing text and categorizing it into one of a plurality of categories, thus improving the accuracy of the text classification [0002]. 

However, LaFever in view of Morrison in view of Dongre in view of Berteau fails to specifically teach at least two other classifiers, type probabilities of each token being the piece of sensitive text information; compiling a weighted combination of numeric scores based on the relative efficacy of the token-majority-type-based classifier using the baseline probability and at least one other classifier using the type probability; identifying, by the natural language processor, a piece of sensitive text information in the received data stream based on the weighted combination of numeric scores. 
Yet, in a related art, Ronnewinkel discloses class weight measure for probability determination and classification of text for classification [0148]; potential classes [0143].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the class weight measure for classification of potential classes of Ronnewinkel with the probability classification of LaFever in view of Morrison in view of Dongre in view of Berteau to have at least two other classifiers, type probabilities of each token being the piece of sensitive text information; compiling a weighted combination of numeric scores based on the relative efficacy of the token-majority-type-based classifier using the baseline probability and at least one other classifier using the type probability; identifying, by the natural language processor, a piece of sensitive text information in the received data stream based on the weighted combination of numeric scores. The combination would allow for, according to the motivation of Ronnewinkel, increasing the capability to classify data efficiently, particularly with respect to the need for classifying natural language text, such as based on a probability analysis with respect to multiple classes; see also ([0002] and [0003]) thus allowing for the analysis of multiple classes in addition to the classes of LaFever in view of Morrison in view of Berteau, but further based on the ability to perform probability analysis across multiple candidate classes, such as by way of class weighting, thus improving accuracy of the class determination [0145]. 
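For illustration only, the weighted combination of numeric scores recited in claim 16 may be sketched as follows. The sketch does not reproduce the methods of Berteau or Ronnewinkel; the function names, the weights (standing in for the relative efficacy of each classifier), the normalization by total weight, and the 0.5 threshold are all assumptions made for the example.

```python
def combine_scores(baseline_prob, type_probs, baseline_weight, type_weights):
    """Weighted average of a baseline probability (token-majority-type
    classifier) and the type probabilities of the other classifiers."""
    if len(type_probs) != len(type_weights):
        raise ValueError("one weight is required per type probability")
    total_weight = baseline_weight + sum(type_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = baseline_weight * baseline_prob + sum(
        w * p for w, p in zip(type_weights, type_probs)
    )
    return weighted_sum / total_weight


def is_sensitive(baseline_prob, type_probs, baseline_weight, type_weights,
                 threshold=0.5):
    """Flag a token as sensitive when the combined score meets the
    (hypothetical) threshold."""
    score = combine_scores(baseline_prob, type_probs,
                           baseline_weight, type_weights)
    return score >= threshold


# One baseline classifier plus two other classifiers, with assumed weights.
score = combine_scores(0.2, [0.9, 0.7], 1.0, [2.0, 1.0])
```

In this example a low baseline probability is outweighed by two other classifiers that carry more weight, so the token would be identified as sensitive.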


Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison and further in view of Wong et al. (US 9,426,102, Herein “Wong”). 
Regarding claim 17, LaFever in view of Morrison teaches the limitations of claim 1, as above.
However, LaFever in view of Morrison fails to specifically teach The method of claim 1, wherein the natural pseudonym preserves byte offset relative to a beginning of the document.
Yet, in a related art, Wong discloses text substitution with performance by measuring locations relative to bytes of data in the message or data stream, thus allowing for relocating the substituted text at the same location as the previously identified text for substitution, thus enhancing the user experience of reading replacement text within a stream of text of a message or document, such as quoted text within a document/message (col. 4, lines 52-67; col. 5; see also figs. 2 and 3). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the byte offset for replacement text of Wong with the replacement text of LaFever in view of Morrison to have The method of claim 1, wherein the natural pseudonym preserves byte offset relative to a beginning of the document. The combination would allow for, according to the motivation of Wong, in addition to the reason described above, further allowing for more effectively conveying streams of dialog or conversation by organizing substitution text in a manner that is convenient and similar to the original message thread or conversation, thus allowing for the replacement text to conveniently match the replaced text in layout (col. 1 and 2). 
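For illustration only, the preservation of byte offset relative to a beginning of the document recited in claim 17 may be sketched as follows. The sketch is not drawn from Wong; the function name, the UTF-8 encoding, and the requirement that the pseudonym have the same encoded byte length as the replaced text are assumptions made for the example.

```python
def replace_preserving_offsets(text, sensitive, pseudonym, encoding="utf-8"):
    """Replace `sensitive` with `pseudonym` only when both encode to the
    same number of bytes, so the byte offsets of all other text are kept."""
    old = sensitive.encode(encoding)
    new = pseudonym.encode(encoding)
    if len(old) != len(new):
        raise ValueError("pseudonym must have the same byte length")
    return text.encode(encoding).replace(old, new).decode(encoding)


original = "Dear Alice, your visit is set."
modified = replace_preserving_offsets(original, "Alice", "Maria")

# Byte offset of a later word, measured from the start of the document.
offset_before = original.encode("utf-8").find(b"your")
offset_after = modified.encode("utf-8").find(b"your")
```

Under these assumptions, text following the replacement begins at the same byte position in the modified document as in the original.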


Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over LaFever in view of Morrison and in view of Fedtke (US 20090282036).
Regarding claim 18, LaFever in view of Morrison teaches the limitations of claim 1, as above.
Furthermore, Morrison teaches The method of claim 1, wherein modifying the document by replacing the piece of sensitive text information with the natural pseudonym forms a pseudonymized document (replacement using pseudonymization of dialog (abstract; [0200])); and the method further comprises.

However, LaFever in view of Morrison fails to specifically teach transmitting the pseudonymized document to a downstream processor, wherein the downstream processor comprises a coding application configured to generate a coding output that includes codes and evidence information. 
Yet, in a related art, Fedtke discloses document [0071] based on coded information according to metadata [0047].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the document of Fedtke with the pseudonymization of LaFever in view of Morrison to have transmitting the pseudonymized document to a downstream processor, wherein the downstream processor comprises a coding application configured to generate a coding output that includes codes and evidence information. The combination would allow for, according to the motivation of Fedtke, sharing content particularly with respect to documents or files; accordingly, the techniques as described above may be utilized for sharing certain file or document content with other users while hiding certain sensitive information of the document [0071], further allowing for certain coded data to include certain technical details of the pseudonymization process, which may advantageously be utilized for processing the anonymization/pseudonymization information of the original data stream [0047].

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman, can be reached on 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.

	/JASON T EDWARDS/              Examiner, Art Unit 2144