DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 remain pending and are ready for examination.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 

Claims 4, 8, 13 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
	
Regarding claims 4 and 17, the claims recite " wherein the comparison between the first set of attributes and the second set of attributes is based at least in part on a 
Regarding claim 8, the claim recites "randomly sampling from the set of modified attributes according to a distribution of occurrence of the set of modified attributes in the first profile". The claim is indefinite because it refers to a distribution of occurrence of the set of modified attributes in the first profile, whereas claim 1 recites that the modified profiles are generated only from a second profile from a second corpus of a plurality of second user data.
Regarding claim 13, the claim recites "training a machine learning model on the selected modified profile". The claim is indefinite because it is unclear how the system/method would train a machine learning model on the selected modified profile. Examiner note: further clarification is required.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-6, 8-9, 11-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 2A Prong One: the independent claims recite determining a mathematical distance between the first profile and each modified profile of the set of modified profiles based at least in part on a comparison between the first set of attributes and the second set of attributes; and selecting a modified profile having a smallest determined mathematical distance. Thus, the claims recite a mathematical concept, which is an abstract idea.

Step 2A Prong Two: the claim recites the additional elements of a memory and a processor to perform the determining of a mathematical distance between the first profile and each modified profile of the set of modified profiles based at least in part on a comparison between the first set of attributes and the second set of attributes, and the selecting of a modified profile having a smallest determined mathematical distance. Each of these additional elements amounts to no more than mere instructions to apply the exception using a generic computing device. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to the abstract idea.
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any 

Claims 2-6, 8-9, 11-13 and 15-19 merely expand on the abstract concept. Accordingly, claims 2-6, 8-9, 11-13 and 15-19 are directed to the same abstract idea without significantly more.

Claim Rejections - 35 USC § 103


The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-6, 8, 12-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bach et al., U.S. Pub No: US 20110202567 A1 (Hereinafter “Bach”) in view of Bull et al., U.S. Pub No: US 20170177809 A1 (Hereinafter “Bull”).

Regarding claim 1, Bach discloses A method for dataset modifying, comprising: 
generating a first profile from a first corpus of a plurality of first user data (see paragraph [0008, 0010-0013], wherein using the first collection profile or the second collection profile within a matching operation; and transmitting information based on the matching operation or a collection profile or receiving a message based on a matching operation or the collection profile); 
generating a set of profiles from a second profile from a second corpus of a plurality of second user data (see paragraph [0008, 0010-0013], wherein using the first collection profile or the second collection profile within a matching operation; and transmitting information based on the matching operation or a collection profile or receiving a message based on a matching operation or the collection profile), wherein the first profile and the set of profiles comprises respective sets of first and second attributes corresponding to one or both of text or metadata associated with the respective plurality of first and second user data (see paragraph [0008, 0010-0013, 0052], wherein the generation of the output collection profile/user DNA is conducted in an automatic way. To this end, an extraction of suitable DNA tags (metadata) from all songs in the collection is performed. Then, the tags are analyzed for similar features and outliers are excluded. Then, the high-level tags are weighted based ; 
determining a mathematical distance between the first profile and each profile of the set of profiles based at least in part on a comparison between the first set of attributes and the second set of attributes (see paragraph [0008, 0010-0013, 0059], wherein a matching or non-matching decision can be taken when the distance measure D, which is exemplarily calculated at 110, is smaller than a predefined distance. This predefined distance can be set by the user and determines the number of matching hits for a certain search. Additionally, one can also determine a match between the music DNA and a media data item having a feature vector which results in the smallest distance D among all other feature vectors in the set); and 
selecting a profile having a smallest determined mathematical distance (see paragraph [0008, 0010-0013, 0059], wherein a matching or non-matching decision can be taken when the distance measure D, which is exemplarily calculated at 110, is smaller than a predefined distance. This predefined distance can be set by the user and determines the number of matching hits for a certain search. Additionally, one can also determine a match between the music DNA and a media data item having a feature vector which results in the smallest distance D among all other feature vectors in the set).
Although Bach teaches multiple profiles can be generated, Bach fails to explicitly disclose generating a set of modified profiles from a second profile from a second corpus of a plurality of second user data;
determining a mathematical distance between the first profile and each modified profile of the set of modified profiles based at least in part on a comparison between the first set of attributes and the second set of attributes.
Bull discloses generating a set of modified profiles from a second profile from a second corpus of a plurality of second user data (see paragraph [0534, 0597], wherein the set of similar administrator profiles on which the performance metric calculations are based, may change, and thus the value of the similar administrator field 1208 may also change depending on how many similar administrator profiles are identified with respect to the updated parameters. For example, when the account opening fee is increased to $100, some administrator profiles may no longer be sufficiently similar to the administrator profile having the changed parameter, and those dissimilar profiles may no longer be used for purposes of the dynamic report. On the other hand, other previously dissimilar administrator profiles may become sufficiently similar due to the updated parameter, and those newly similar administrator profiles may be used in the dynamic report);
determining a mathematical distance between the first profile and each modified profile of the set of modified profiles based at least in part on a comparison between the first set of attributes and the second set of attributes (see paragraph [0534, 0597, 0601], wherein the clustering technique can include generating vectors using multi-dimensional features associated with each profile, and determining a distance between vectors to identify a set of vectors within a threshold distance from one another. The identified set of vectors within the threshold distance from one another can form a cluster).
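For illustration only, the distance-and-selection limitations of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation and not code from Bach or Bull: a profile is assumed to be a normalized frequency table of attributes, the distance is assumed to be a Manhattan (L1) distance, and the names `build_profile`, `manhattan_distance`, and `select_closest` are hypothetical.

```python
from collections import Counter


def build_profile(records):
    """Assumed profile form: relative frequency of each attribute in a corpus."""
    counts = Counter(attr for record in records for attr in record)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}


def manhattan_distance(p, q):
    """L1 distance between two profiles; a missing attribute counts as 0."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def select_closest(first_profile, modified_profiles):
    """Return (index, distance) of the modified profile nearest the first profile."""
    distances = [manhattan_distance(first_profile, m) for m in modified_profiles]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical data: each record is a list of attribute labels.
first = build_profile([["email", "call"], ["email", "sms"]])
candidates = [
    build_profile([["call", "call"], ["sms", "call"]]),
    build_profile([["email", "sms"], ["email", "call"]]),
]
best_index, best_distance = select_closest(first, candidates)
```

In this toy data the second candidate has the same attribute frequencies as the first profile, so it is the one selected.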

 
Regarding claim 2, the combination of Bach and Bull further disclose wherein the first profile and the set of modified profiles comprise a distribution of occurrences of the respective first set of attributes and the second set of attributes (see Bach paragraph [0070], wherein the determination of the confidence threshold can be performed based on different criteria such as a mix between a distance increase between two adjacent media items and the number of media items having a distance below the distance, in which the distance increase occurs. See also Bull paragraph [0189], wherein The performance metrics can include various resulting characteristics and attributes associated with an electronic benefits account, such as, but not limited to, percentage or number of participants or customers of an electronic benefits account associated with a given administrator profile (e.g., during a defined time interval), amount of money contributed to an electronic benefits account associated with a given administrator profile, number of geographic regions in which their customers are located, demographics data associated with customers, number of transactions associated with tax benefit accounts, size of the transactions, frequency of transactions, frequency of funding the tax benefit account, size of contributions, or statistics based on these parameters, performance metrics, or attributes. Values of the 

Regarding claim 3, the combination of Bach and Bull further disclose wherein the respective first set of attributes and second set of attributes comprises a sequence of state transitions, a distribution of lengths of a plurality of text communications, a frequency of an occurrence of one or more text combinations or metadata, or a combination thereof (see Bach paragraph [0070], wherein the determination of the confidence threshold can be performed based on different criteria such as a mix between a distance increase between two adjacent media items and the number of media items having a distance below the distance, in which the distance increase occurs. See also Bull paragraph [0189], wherein The performance metrics can include various resulting characteristics and attributes associated with an electronic benefits account, such as, but not limited to, percentage or number of participants or customers of an electronic benefits account associated with a given administrator profile (e.g., during a defined time interval), amount of money contributed to an electronic benefits account associated with a given administrator profile, number of geographic regions in which their customers are located, demographics data associated with customers, number of transactions associated with tax benefit accounts, size of the transactions, frequency of transactions, frequency of funding the tax benefit account, size of contributions, or statistics based on these parameters, performance metrics, or attributes. Values of the performance metrics may vary dependent on the parameters associated with the electronic benefits accounts).

Regarding claim 5, the combination of Bach and Bull further disclose modifying the second corpus to generate the plurality of modified profiles (see Bull paragraph [0534, 0597], wherein the set of similar administrator profiles on which the performance metric calculations are based, may change, and thus the value of the similar administrator field 1208 may also change depending on how many similar administrator profiles are identified with respect to the updated parameters. For example, when the account opening fee is increased to $100, some administrator profiles may no longer be sufficiently similar to the administrator profile having the changed parameter, and those dissimilar profiles may no longer be used for purposes of the dynamic report. On the other hand, other previously dissimilar administrator profiles may become sufficiently similar due to the updated parameter, and those newly similar administrator profiles may be used in the dynamic report).

Regarding claim 6, the combination of Bach and Bull further disclose wherein modifying the second corpus comprises one or more of upsampling of the plurality of second user data of the second  corpus, downsampling of the plurality of second user data of the second corpus, dropping elements from the second plurality of second user data of the second corpus (see Bull paragraph [0082], wherein use the filter criterion to identify a subset of the one or more administrator profiles stored in the administrator profile data structure, generate a third value of the first performance metric based on the subset of the one or more administrator profiles, render, via the dynamic report interface, the electronic report to , removing elements from the plurality of second user data of the second corpus, generating a vocabulary from the second plurality of second user data of the second corpus, generating one or more models from the second plurality of second user data of the second corpus, or a combination thereof (see Bach paragraph [0008-0010], wherein a feature extractor for extracting at least two different features describing a content of a media data item from a plurality of media data items of the collection; and a profile creator for creating the collection profile by combining the extracted features or weighted extracted features for the plurality of media data items so that the collection profile represents a quantitative fingerprint of a content of the collection, wherein the apparatus further has an input for receiving information on a music taste of a user of the collection of different audio files, and wherein the profile creator is operative to create a raw collection profile without information on a user behavior logged by the profile creator or information on a music taste, and to weight the raw collection profile using weights derived from the information on the music taste or the user behavior to obtain the collection profile).

Regarding claim 8, the combination of Bach and Bull further disclose generating a set of modified attributes for a remaining subset of attribute patterns by modifying at least one attribute of the plurality of second attributes (Bull, see ; and 
randomly sampling from the set of modified attributes according to a distribution of occurrence of the set of modified attributes in the first profile (Bull, see paragraph [0501], wherein the matching technique can include a correlation technique, which identifies a statistical relationship between two random variables or two sets of data. In some embodiments, measures of correlation to infer a presence or absence of association in a sample of data include one or more of an odds ratio, a risk ratio, an absolute risk reduction, distance correlation, tetrachoric correlation coefficient, mutual information, and the like).
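For illustration only, one possible reading of the sampling limitation of claim 8 is weighted random sampling, where each modified attribute is drawn with probability proportional to how often it occurs in the first profile. The sketch below assumes that reading; the function name `sample_modified_attributes`, the count table, and the attribute labels are hypothetical and are not taken from the application or from Bull.

```python
import random


def sample_modified_attributes(modified_attributes, first_profile_counts, k, seed=None):
    """Draw k attributes, weighted by their occurrence counts in the first profile."""
    rng = random.Random(seed)
    # Attributes that never occur in the first profile get weight 0.
    weights = [first_profile_counts.get(attr, 0) for attr in modified_attributes]
    if sum(weights) == 0:
        raise ValueError("no modified attribute occurs in the first profile")
    return rng.choices(modified_attributes, weights=weights, k=k)


# Hypothetical data: "closing" does not occur in the first profile.
modified = ["greeting", "follow_up", "closing"]
counts = {"greeting": 6, "follow_up": 3}
draws = sample_modified_attributes(modified, counts, k=50, seed=0)
```

Under this reading, an attribute absent from the first profile is never sampled, which is one way the distribution "in the first profile" could constrain attributes derived from the second profile.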

Regarding claim 12, the combination of Bach and Bull further disclose the plurality of first user data and the plurality of second user data comprise: one or more of data logs, customer relationship management data, contact data, customer data, emails, calendar events, service tickets, short message service (SMS) text messages, voice calls, social media messages, or a combination thereof (Bach, see paragraph [0008-0010]).

Regarding claim 13, the combination of Bach and Bull further disclose training a machine learning model on the selected modified profile (Bull, see paragraph [0078]).

Claims 14-16 and 18-19 are apparatus claims and are rejected under the same rationale as claims 1-3 and 5-6.

Claim 20 is a non-transitory computer-readable medium claim and is rejected under the same rationale as claims 1 and/or 14.

Claims 4, 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bach in view of Bull and further in view of Rogynskyy et al., U.S. Pub No: US 20190361861 A1 (Hereinafter “Rogynskyy”).

Regarding claim 4, the combination of Bach and Bull discloses all the features with respect to claim 1, as outlined above. The combination of Bach and Bull fails to explicitly disclose wherein the comparison between the first set of attributes and the second set of attributes is based at least in part on a probability weighted difference between occurrences of the first set of attributes in the first profile and occurrences of the second set of attributes in each modified profile of the set of modified profiles. 
Rogynskyy discloses wherein the comparison between the first set of attributes and the second set of attributes is based at least in part on a probability weighted difference between occurrences of the first set of attributes in the first profile and occurrences of the second set of attributes in each modified profile of the set of modified profiles (see paragraph [0003, 0008-0009, 0191]).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Bach and Bull to include a probability weighted difference between occurrences of the first set of attributes in the first profile and occurrences of the second set of attributes in each modified profile of the set of modified profiles, as taught by Rogynskyy, since doing so would improve the system to be more efficient and less error prone (Rogynskyy; paragraph [0002]).
 
Regarding claim 7, the combination of Bach and Bull discloses all the features with respect to claim 1, as outlined above. The combination of Bach and Bull fails to explicitly disclose identifying a subset of attribute patterns from the plurality of second user data having matching attribute patterns in the first profile; and retaining the subset of attribute patterns unchanged in the set of modified profiles based at least in part on identifying the subset of attribute patterns. 
Rogynskyy discloses identifying a subset of attribute patterns from the plurality of second user data having matching attribute patterns in the first profile; and 
retaining the subset of attribute patterns unchanged in the set of modified profiles based at least in part on identifying the subset of attribute patterns (see paragraph [0108], wherein the node graph generation system 200 can be configured to classify phone numbers as a general company number or a direct office number by performing regex patterns to determine if an "ext." or an "x" followed by some numbers is included in the value. The regex can also be configured to identify phone number prefixes, such as "800." The system can identify the phone numbers as the publicly known phone number of the company. In some embodiments, the node graph generation system 200 can be configured to restrict or otherwise prevent a phone number determined to be a general company number from being inserted as a value of a personal number. In some embodiments, the node graph generation system 200 can be configured to determine the value of phone numbers of other nodes corresponding to the same company and if the system determines that the number to be added to a node matches the number of multiple other nodes belonging to the same entity or company, the system can probabilistically determine, for instance, that the number is a work number and update the number as a value in the work number field (instead of a personal number field). Similar techniques can be applied for determining or inferring other information by comparing the data of a node profile to patterns observed from a plurality of related node profiles. In some embodiments, the system can determine 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Bach and Bull to include identifying a subset of attribute patterns from the plurality of second user data having matching attribute patterns in the first profile and retaining that subset unchanged in the set of modified profiles, as taught by Rogynskyy, since doing so would improve the system to be more efficient and less error prone (Rogynskyy; paragraph [0002]).

Claim 17 is an apparatus claim and is rejected under the same rationale as claim 4.


Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Bach in view of Bull and further in view of Beach et al., U.S. Patent No: US 10395640 B1 (Hereinafter “Beach”).

Regarding claim 9, the combination of Bach and Bull discloses all the features with respect to claim 1, as outlined above. The combination of Bach and Bull further discloses identifying a group of text from each first user data of the plurality of first user data (see Bach paragraph [0008-0010], wherein a feature extractor for extracting at least two different features describing a content of a media data item from a plurality of media data items of the collection; and a profile creator for creating the collection profile by combining the extracted features or weighted extracted features for the plurality of media data items so that the collection profile represents a quantitative fingerprint of a content of the collection, wherein the apparatus further has an input for receiving information on a music taste of a user of the collection of different audio files, and wherein the profile creator is operative to create a raw collection profile without information on a user behavior logged by the profile creator or information on a music taste, and to weight the raw collection profile using weights derived from the information on the music taste or the user behavior to obtain the collection profile).
The combination of Bach and Bull fails to explicitly disclose generating a set of sequences of state transitions associated with the groups of text. 
Beach discloses generating a set of sequences of state transitions associated with the groups of text (see col. 1, lines 46-65).


Regarding claim 10, the combination of Bach, Bull and Beach further discloses generating a set of collapsed sequences by combining duplicate state transitions from each sequence of the set of sequences (see Beach col. 8, lines 48-65).
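For illustration only, the collapsing step of claim 10 can be sketched as follows. The sketch assumes that "combining duplicate state transitions" means merging consecutive repeats of the same state within a sequence; that interpretation, the function name `collapse_sequence`, and the state labels are assumptions and are not drawn from the application or from Beach.

```python
from itertools import groupby


def collapse_sequence(states):
    """Merge runs of identical consecutive states into a single state."""
    return [state for state, _run in groupby(states)]


# Hypothetical state sequences derived from groups of text.
sequences = [
    ["open", "open", "reply", "reply", "reply", "closed"],
    ["open", "reply", "open", "open"],
]
collapsed = [collapse_sequence(seq) for seq in sequences]
```

Only adjacent repeats are merged here, so a state that recurs later in a sequence (the second "open" in the second sequence) is preserved.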

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Bach in view of Bull and further in view of Shan et al., U.S. Pub No: US 20060116920 A1 (Hereinafter “Shan”).

Regarding claim 11, the combination of Bach and Bull fails to explicitly disclose wherein the mathematical distance comprises a Manhattan distance, a Euclidean distance, a harmonic mean of a minimum distance, or a combination thereof.
Shan discloses wherein the mathematical distance comprises a Manhattan distance, a Euclidean distance, a harmonic mean of a minimum distance, or a combination thereof (Shan, see paragraph [0039, 0064], wherein the threshold for a small distance threshold is a Euclidean distance measurement of less than 20% of the 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Bach and Bull to include a Euclidean distance, as taught by Shan, since doing so would improve data modeling for forecasting in the business environment (Shan; paragraph [0004]).
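For illustration only, two of the distance measures named in claim 11 can be compared on a pair of attribute vectors as sketched below. The vectors and the helper names `manhattan` and `euclidean` are hypothetical; the claimed "harmonic mean of a minimum distance" is not shown because the record before this section does not define it.

```python
import math


def manhattan(u, v):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """L2 distance: straight-line distance between the two points."""
    return math.dist(u, v)


# Hypothetical two-attribute vectors.
first_vector = (0.0, 0.0)
second_vector = (3.0, 4.0)
l1 = manhattan(first_vector, second_vector)
l2 = euclidean(first_vector, second_vector)
```

For these vectors the Manhattan distance is 7 and the Euclidean distance is 5, which shows that the choice of measure can change which candidate is closest.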


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHER N ALGIBHAH whose telephone number is (571)272-0718.  The examiner can normally be reached on Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner can be reached on (571) 270-1760.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-1264.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  






/MAHER N ALGIBHAH/Examiner, Art Unit 2165
/TAREK CHBOUKI/Primary Examiner, Art Unit 2165