DETAILED ACTION
This action is in response to the RCE (Request for Continued Examination) filed on February 26, 2021.

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 26, 2021 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/26/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
2.	Applicant argued in the January 15 amendments, pp. 1-3, that: “Regarding claim 1, Applicant argues that Gould, Anderson and Mao, each taken individually or in any combination, are not understood to describe or to render obvious at least the foregoing features of claim 1. However, the Office Action alleges that Anderson teaches Applicant's claimed features of “filtering ...bits representing values in the one or more second fields in the records in the second data set through the filter mask generated from 
Applicant further argues that “nowhere does Anderson, taken in any combination with Gould and Mao, describe or render obvious ‘when one or more values in the one or more second fields are included in the set of values represented in the filter mask, indicating a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set,’ as required by Applicant’s claim. In particular, Anderson, taken in any combination with Gould and Mao, does not describe or render obvious that its mask code’s inclusion or exclusion of fields or field values ‘indicates a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set,’ as required by Applicant’s claim. In fact, Anderson’s mask code inclusion or exclusion of fields or field values provides no indications of relationships among datasets, let alone a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set. Mao and Gould, taken in any combination with Anderson, fail to remedy these deficiencies in Anderson.”
	The Office respectfully disagrees. According to Applicant's specification, the filter mask can be, for example, a Bloom filter. A conventional Bloom filter includes a number of bits and one or more hash functions. Each hash function is used to map an input value into a subset of the bits in the filter key. Filter keys are combined to create a filter mask. A Bloom filter can be created by creating filter keys using a single hash function. Filter masks can be created for a field that is identified as a potential key. The filter 
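For illustration only, the filter-key and filter-mask mechanism summarized above can be sketched in Python. This is not Applicant's code and is not drawn from any cited reference; the mask width M_BITS, the number of bits K set per key, the index-salted SHA-256 hashing, and the sample values are all assumptions. Each value yields a filter key with up to K bits set, the keys are combined with a bitwise OR into the mask, and a second-field value "passes" when every bit of its key is also set in the mask.

```python
import hashlib

M_BITS = 64  # assumed mask width in bits (illustrative)
K = 3        # assumed number of bits set per filter key (illustrative)

def filter_key(value: str) -> int:
    """Return an integer bit vector with up to K bits set for the value."""
    key = 0
    for i in range(K):
        # Salting with the index i gives K different bit positions per value.
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        position = int.from_bytes(digest[:8], "big") % M_BITS
        key |= 1 << position
    return key

def build_mask(values) -> int:
    """Combine the filter keys of all values with a bitwise OR."""
    mask = 0
    for v in values:
        mask |= filter_key(v)
    return mask

def passes(value: str, mask: int) -> bool:
    """A value passes when every bit of its key is also set in the mask."""
    key = filter_key(value)
    return key & mask == key

# Hypothetical values of the potential key field in the first data set.
first_field_values = ["C001", "C002", "C003"]
mask = build_mask(first_field_values)

# Hypothetical values of a candidate second field in the second data set.
second_field_values = ["C002", "C003", "C003"]
hits = [v for v in second_field_values if passes(v, mask)]
print(len(hits))  # 3: every value that was added to the mask passes
```

Because a value that was added to the mask always passes, a non-empty hits list is consistent with a potential key relationship but does not prove one. A value that was never added may still pass as a false positive.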
	The Gould reference teaches identifying subsets that have a functional relationship among fields, e.g., between first and second fields, based on quantities computed from, for example, census data. It also includes a degree of matching to the functional relationship, and it further includes filtering the subsets based on values in the fields of those subsets. In addition, Mao teaches the use of a Bloom filter vector in which each identity is represented by bits. A comparison is performed to determine whether there is a potential match. A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set; it will not produce false negatives, but it may produce false positives. A first peer may provide one or more group identities in a concealed format, which the second peer may use to determine whether access to a data object, keys, etc. may be granted to the first peer. The Anderson reference also teaches a mask code that acts as a filter and can include or exclude fields or field values. Figures 3-5 are also examples of filtering and of inclusion/exclusion of field values of a transaction record. FIG. 4 shows a user interface that includes the list of fields, the bit labels, and buttons to set the mask. FIG. 5 shows the user interface when the mask is set selectively and there are more than two data pattern codes for each field.
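The false-positive behavior noted above can be quantified with the standard textbook approximation for a Bloom filter of m bits, k hash functions and n inserted elements: p ≈ (1 − e^(−kn/m))^k, which is minimized near k = (m/n) ln 2. The short Python sketch below simply evaluates that approximation. The parameter values are arbitrary examples and are not figures taken from Gould, Mao, Anderson or Applicant's specification.

```python
import math

def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Approximate probability that a never-inserted value passes the filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def optimal_k(m_bits: int, n_items: int) -> int:
    """Number of hash functions that approximately minimizes the rate."""
    return max(1, round(m_bits / n_items * math.log(2)))

# Example sizing: 10 bits per inserted element (assumed, illustrative).
m, n = 10_000, 1_000
k = optimal_k(m, n)
print(k, round(false_positive_rate(m, k, n), 4))
```

The false-negative rate needs no formula: an inserted element always sets all of its bits, so it always passes.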
	Regarding independent claims 7 and 13, Applicant has not overcome the rejections. See the arguments regarding the same subject matter above.
	Regarding the dependent claims 2-6, 8-12 and 14-18, Applicant has not overcome the rejections and they remain similarly rejected.
	Applicant is further reminded that the Examiner cites particular paragraphs and line numbers in the references as applied to the claims for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider each reference in its entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. 
Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 19-21 are cancelled. Claims 1 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gould (US 2005/0114369 A1) in view of Anderson et al. (US 2012/0197887 A1) and further in view of Mao (US 2013/0031367 A1).
Regarding claim 1, Gould discloses “A method including: receiving by a data processing system a first field in a first data set, that included a plurality of records; identifying a set of one or more first fields associated with the records as a potential key field of the first data set;” (See Fig. 9 & Fig. 11A) (See also Page 1, [0008], [0016] & [0021]) (A method and corresponding software and a system for processing data with information characterizing values of a first field in records of a first data source.  The information characterizing the values of the first field includes information characterizing a distribution of values of that field. The data can include records with a variable record structure such as conditional fields and/or variable numbers of fields. The information characterizing the distribution of values of the first field can include multiple data records, each associating a different value and a corresponding number of occurrences of that value in the first field in the first data source.)

“receiving a second data set, that includes a plurality of records that have one or more second fields” (See Fig. 11B) (See also Page 1, [0016] & [0021]) (Information characterizing values of a second field includes multiple records of a second data source. The information characterizing the values of the second field includes information characterizing a distribution of values of that field.)

Gould does not, however, explicitly disclose “identifying one or more filter keys that represent one or more values in the set of the one or more first fields: storing, in memory, a filter mask that stores a set of bits that are generated from the one or more filter keys that represent the one or more values in the set of one or more first fields”.

However, Mao teaches “identifying one or more filter keys that represent one or more values in the set of the one or more first fields: storing, in memory, a filter mask that stores a set of bits that are generated from the one or more filter keys that represent the one or more values in the set of one or more first fields” (See Fig. 3, Fig. 9, Fig. 10 and [0030]-[0031], [0041]-[0043]) (FIG. 3 is one example of generating a Bloom filter vector that conceals a plurality of identities. The first peer may provide one or more group identities (in a concealed format) which the second peer may use to ascertain whether access (keys, etc.) may be granted to the first peer. If a particular group identity of the first peer matches a group identity that is allowed access to the digital object, then the second peer may grant such access. In some implementations, in order to verify the first peer's assertion of being a member of a particular group, a subsequent authentication process may be performed to authenticate the first peer's membership of the particular group. For example, this may be done by the first peer presenting some credential (e.g., a user identity) signed by a group administrator of the particular group (e.g., signed by a private key belonging to the particular group and verifiable by a corresponding public key).
Concealment of identities may be achieved by hashing one or more identities and representing the hash values within a binary vector. The data structure may be a binary vector in which each of the one or more identities is represented by a plurality of bits that are uniformly and randomly distributed along the binary vector, which may be implemented as a Bloom filter. One or more of these hash functions may then be used to generate one or more offset or position index values into a composite binary vector that may represent a plurality of identities for the first peer. A requesting peer node's identities (e.g., group identities) may be concealed using a binary vector data structure such as a Bloom filter vector. An identity (e.g., a group identity) may be converted into a sequence of bits by some conversion function (e.g., a hash function).)
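The concealment scheme described in the Mao passage, in which each identity is hashed to several position indices in a shared composite binary vector, can be sketched as follows. This is a hypothetical illustration rather than Mao's disclosed implementation: the vector length, the number of positions per identity, the index-salted SHA-256 hashing, and the group names are all assumptions.

```python
import hashlib

VECTOR_BITS = 128  # assumed length of the composite binary vector
POSITIONS = 4      # assumed number of position indices per identity

def positions(identity: str) -> list[int]:
    """Derive position indices for an identity from index-salted hashes."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}|{identity}".encode()).digest()[:4], "big")
        % VECTOR_BITS
        for i in range(POSITIONS)
    ]

def conceal(identities: list[str]) -> bytearray:
    """Build the composite vector (one byte per bit, for readability)."""
    vector = bytearray(VECTOR_BITS)
    for identity in identities:
        for p in positions(identity):
            vector[p] = 1
    return vector

def may_belong(identity: str, vector: bytearray) -> bool:
    """True if every position for the identity is set (may be a false positive)."""
    return all(vector[p] for p in positions(identity))

# The requesting peer shares only the vector, not its group names (hypothetical).
peer_vector = conceal(["engineering", "release-managers"])
allowed_groups = ["release-managers", "auditors"]
matches = [g for g in allowed_groups if may_belong(g, peer_vector)]
print(matches)
```

A match here is only probabilistic, which is consistent with the passage's statement that a subsequent authentication step may be performed to verify the peer's group membership.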

Gould does not, however, explicitly disclose “filtering by the data processing system bits representing values in the one or more second fields in the records in the second data set through the filter mask generated from the one or more filter keys that represent the one or more values in the set of one or more first fields identified as the potential key field to determine whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask by being represented in the filter mask; when one or more values in the one or more second fields are included in the set of values represented in the filter mask, indicating a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set.” 

However, Anderson teaches “filtering by the data processing system bits representing values in the one or more second fields in the records in the second data set through the filter mask generated from the one or more filter keys that represent the one or more values in the set of one or more first fields identified as the potential key field to determine whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask by being represented in the filter mask; when one or more values in the one or more second fields are included in the set of values represented in the filter mask, indicating a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set” (See Figs. 4-5, [0061], [0082]-[0083]) (A mask code 324 can be implemented corresponding to the mandatory and additional fields 308, 312. The mask code 324 acts as a filter that can selectively include or exclude fields or field values of the record 304. As described above, the first six fields of the example transaction record are mandatory fields 308 and the next three fields are additional fields 312. As such, the mask code 324 can be formulated to indicate the mandatory and additional fields 308, 312 as follows: [0 0 0 1 1 1 1 1 1]. A bitwise AND operation can be computed between the bitmap code 320 and the mask code. If the result of the bitwise AND is anything other than 63, then one of the mandatory fields 308 is unpopulated. In some scenarios, heterogeneous datasets (e.g., datasets in which records may accept values in different data record formats) may have records that include separate fields to identify the record type. As such, data formats of the population for the records can be made conditional on the record type.
In some implementations, the pattern information can be presented to a user through a graphical display within a user interface presented to the user. This way, a user may be able to quickly ascertain a percentage of fields in a record that are populated. FIG. 4 shows an exemplary user interface that includes the list of fields 400, the bit labels 401, and buttons to set the mask 402. FIG. 5 shows a user interface when the mask is set selectively and there are more than two data pattern codes for each field. The fields are listed 500 and labels assigned to each field 501. The mask 502 is set by selecting buttons. The legend 504 lists the data pattern codes that are displayed. Here gray levels are used to distinguish the data pattern codes but other possible display representations are possible, including simply displaying a numerical data pattern code.)
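The bitwise check quoted from Anderson can be reproduced with Python integers. This sketch assumes, consistent with the quoted result of 63, that the first field of the record maps to the least-significant bit. Under that assumption the written mask [0 0 0 1 1 1 1 1 1] equals 0b000111111 = 63 and selects the six mandatory fields. The record contents and helper names are hypothetical.

```python
def bitmap_code(record: list) -> int:
    """Set bit i when field i of the record is populated (field 0 = LSB)."""
    code = 0
    for i, value in enumerate(record):
        if value not in (None, ""):
            code |= 1 << i
    return code

# Six mandatory fields selected; the three additional fields are masked out.
MASK_CODE = 0b000111111

def mandatory_fields_populated(record: list) -> bool:
    """Anderson-style test: the AND with the mask must equal 63."""
    return bitmap_code(record) & MASK_CODE == 63

complete = ["T1", "2012-01-05", "ACME", "42.00", "USD", "NY", "", None, "note"]
missing = ["T2", "2012-01-06", "", "17.50", "USD", "CA", "x", "y", "z"]
print(mandatory_fields_populated(complete), mandatory_fields_populated(missing))
# True False
```

In this sketch the mask tests whether fields of a single record are populated. Any comparison of values across two data sets would have to be supplied by other logic.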

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine Gould (Data profiling) with Mao (Facilitating access controls for digital objects stored) and Anderson (Generating data pattern information) in order to allow for individual records or values within records to be compressed when stored and decompressed when accessed to reduce the storage requirements of the system. Anderson, [002]. 

One having ordinary skill would also be motivated to combine Gould, Mao and Anderson, in view of the suggestions provided by Mao in paragraph [006], which suggests, “a way is needed to preserve the privacy, identities, memberships, etc. of a peer while still being able to perform access control in a peer-to-peer network.”

	Regarding claim 2, Gould in view of Mao and further in view of Anderson discloses “The method of claim 1, further including determining a count of a number of records each having a value associated with the one or more second fields in the records in the second data set that passes the filter mask; storing the count in a profile; and determining the Sorensen-Dice coefficient of the set of values in the filter mask and the records in the second data set having a value associated with the one or more second fields.” (See Page 1, [0014] & [0016]) (Information characterizing values of a second field in records of a second data source is accepted. Quantities characterizing a relationship between the first field and the second field are then computed based on the accepted information. Information relating the first field and the second field is presented. Processing the data can include comparing characteristics of the data to reference characteristics for the data, such as by comparing statistical properties of the data.) Gould further discloses, in [0121]-[0144], that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612.
(See also Page 2, [0031]) (Determining the co-occurrence statistics includes forming data elements each identifying a pair of fields and identifying a pair of values occurring in the pair of fields in one of the data records.) (See also Fig. 3 and [0010]) (Computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field. Each census record includes a count of the number of occurrences of the unique field/value pair for that census record.) 
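The count and Sorensen-Dice computation recited in claim 2 can be sketched as below, using the standard definition 2|A ∩ B| / (|A| + |B|). This is a hypothetical illustration of the claim language, not code from Gould or Applicant. The passes argument stands in for the filter-mask test, and the example uses exact set membership so that the result is deterministic. Because a Bloom filter cannot enumerate its contents, the sketch assumes that the number of distinct values represented in the mask is tracked separately (here, len(first_set)).

```python
def profile_second_field(first_values, second_values, passes):
    """Count passing records and compute a Sorensen-Dice coefficient.

    `passes` is a hypothetical stand-in for the filter-mask membership test.
    """
    # Number of second-data-set records whose value passes the mask.
    passing_records = sum(1 for v in second_values if passes(v))
    first_set, second_set = set(first_values), set(second_values)
    # Estimate |A ∩ B| as the distinct second-field values that pass.
    overlap = sum(1 for v in second_set if passes(v))
    dice = 2 * overlap / (len(first_set) + len(second_set))
    return {"passing_record_count": passing_records, "dice": dice}

first = ["A", "B", "C", "D"]
second = ["B", "C", "C", "E"]
exact = set(first)
profile = profile_second_field(first, second, lambda v: v in exact)
print(profile)
```

With a real Bloom filter as the passes test, false positives can only raise the overlap estimate, so the coefficient computed this way is an upper estimate of the exact value.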

Regarding claim 3, Gould in view of Mao and further in view of Anderson discloses “The method of claim 1, further includes: producing for a given record a filter key that is based on values in the one or more first fields of the given record in the first data set; and generating the filter mask based on filter keys produced for the records in the first data set by combining the filter keys according to a Boolean operation.” (See [0010]) (Computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field.) 
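The claim 3 limitation, in which a filter key is produced per record from the values of one or more first fields and the keys are combined by a Boolean operation, can be sketched with a bitwise OR reduction. This is a hypothetical illustration of the claim language rather than any cited implementation. The filter width, k = 3, the SHA-256 hashing, the unit-separator character used to join multi-field values, and the sample records are all assumptions.

```python
import hashlib
from functools import reduce
from operator import or_

MASK_BITS = 256  # assumed filter width in bits

def record_filter_key(record: dict, key_fields: tuple, k: int = 3) -> int:
    """Hash the values of the key fields of one record into a k-bit key."""
    # A separator that is unlikely to appear in data keeps ("a", "bc")
    # distinct from ("ab", "c") when fields are concatenated.
    composite = "\x1f".join(str(record[f]) for f in key_fields)
    key = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{composite}".encode()).digest()
        key |= 1 << (int.from_bytes(digest[:8], "big") % MASK_BITS)
    return key

def filter_mask(records: list[dict], key_fields: tuple) -> int:
    """Combine the per-record filter keys with a Boolean OR."""
    return reduce(or_, (record_filter_key(r, key_fields) for r in records), 0)

orders = [
    {"region": "EU", "order_no": 1},
    {"region": "EU", "order_no": 2},
    {"region": "US", "order_no": 1},
]
mask = filter_mask(orders, ("region", "order_no"))
probe = record_filter_key({"region": "US", "order_no": 1}, ("region", "order_no"))
print(probe & mask == probe)  # True: an inserted composite value always passes
```

OR is used because it is order-independent and never clears a bit, which is what preserves the absence of false negatives.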

Regarding claim 4, Gould in view of Mao and further in view of Anderson discloses “The method of claim 3, wherein generating a filter key for the corresponding value includes: generating a hash value for the corresponding value; segmenting the hash value into a predetermined number of integers; generating a filter key by setting bits in a bit vector based on the integers.” (See Fig. 5B) (See also Page 8, [0129]) (FIG. 5B is a diagram that illustrates a sub-graph 630 implementing the analyze census component 412 of the profiling graph 400. A partition by field component 632 reads a flow of census elements from the census file component 410 and re-partitions the census elements according to a hash value based on the field, such that census records with the same field but different values are in the same partition. The partition into string, number, date component 634 further partitions the census elements according to the type of the value in the census element. Different statistics are computed using a rollup process for values that are strings (in the rollup string component 636), numbers (in the rollup number component 638), or dates/datetimes (in the rollup date component 640). For example, it may be appropriate to calculate average and standard deviation for a number.) (See Fig. 4 & Page 4, [0077]) (The initial information about records can include the number of bits that represent a distinct value (e.g., 16 bits (= 2 bytes)), the order of values, including values associated with record fields and values associated with tags or delimiters, and the type of value (e.g., string, signed/unsigned integer) represented by the bits.)
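The three steps recited in claim 4 (hash once, segment the hash value into a predetermined number of integers, set one bit per integer) can be sketched as follows. This is an illustration of the claim language under assumed parameters. SHA-256, four segments, and a 1024-bit vector are choices made for the example and are not taken from the cited references.

```python
import hashlib

def filter_key(value: str, segments: int = 4, vector_bits: int = 1024) -> int:
    """Hash once, split the digest into integer segments, set one bit per segment."""
    digest = hashlib.sha256(value.encode()).digest()  # 32-byte hash value
    if len(digest) % segments:
        raise ValueError("digest length must divide evenly into segments")
    width = len(digest) // segments  # bytes per segment
    key = 0
    for s in range(segments):
        integer = int.from_bytes(digest[s * width:(s + 1) * width], "big")
        key |= 1 << (integer % vector_bits)  # bit index derived from the segment
    return key

key = filter_key("C002")
# At most 4 bits are set; fewer only if two segments map to the same bit.
print(bin(key).count("1"))
```

Deriving several bit positions from one digest gives the effect of multiple hash functions while computing only a single hash per value.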

	Regarding claim 5, Gould in view of Mao and further in view of Anderson discloses “The method of claim 3, wherein generating the filter mask further includes performing a binary operation on each of a plurality of generated filter keys.” (See Fig. 11A – 12B) (See Page 10, [0146] – [0150]) (A join operation is performed on two data sets (e.g., files or tables). In another approach, described below in section 6.1, after the make census component 406 generates a census file for a data set, the information in the census file can be used to perform the joint-field analysis between fields in two different profiled data sets, or between fields in two different parts of the same profiled data set (or any other data set for which a census file exists). The result of joint-field analysis includes information about potential relationships between the fields. Three types of relationships that are discovered are: a "common domain" relationship, a "joins well" relationship, and a "foreign key" relationship.) Gould further discloses, in [0121]-[0144], that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612. 
Regarding claim 6, Gould in view of Mao and further in view of Anderson discloses “The method of claim 3, further including determining whether values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask includes: generating one or more second filter keys for the one or more values associated with the one or more second fields in the second data set: and comparing the one or more second filter keys to the filter mask.” (See Figs. 11A-12B) (See Page 11, [0161]) (A census join component 1200 analyzes fields from Table A and Table B and compiles the statistics for an occurrence chart by performing a "census join" operation from census data for the tables. Each census record has a field/value pair and a count of the occurrences of the value in the field. Since each census record has a unique field/value pair, for a given key field, the values in an input flow of the census join component 1200 are unique.) (See also Page 7, [0119] & Page 14, [0196]-[0200]) (Graphs can use a rule within the import component to find a relationship between a foreign key or field in one table to a primary key or field in another table, or to perform functional dependency calculations on parts of the data.)
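The census-join idea quoted from Gould can be approximated in Python as follows. Each census maps a distinct value of one field to its occurrence count, and the two censuses are joined on value to give occurrence-chart rows. This is a simplified sketch, not Gould's graph-based implementation. The outer-join behavior, the function names, and the table contents are assumptions.

```python
from collections import Counter

def census(records: list[dict], field: str) -> dict:
    """Census for one field: each distinct value mapped to its occurrence count."""
    return dict(Counter(r[field] for r in records))

def census_join(census_a: dict, census_b: dict) -> list[tuple]:
    """Outer join two censuses on value: (value, count in A, count in B)."""
    values = sorted(set(census_a) | set(census_b))
    return [(v, census_a.get(v, 0), census_b.get(v, 0)) for v in values]

# Hypothetical tables.
table_a = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 3}]
table_b = [{"cust": 1}, {"cust": 1}, {"cust": 3}, {"cust": 4}]
for row in census_join(census(table_a, "cust_id"), census(table_b, "cust")):
    print(row)
```

In the rows produced, a count of 1 in every Table A row is consistent with cust_id being unique in Table A. A Table B value with a zero count in Table A (here, 4) would be evidence against a strict foreign-key relationship. This is an interpretive aid only and not a statement of Gould's relationship criteria.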

Regarding claim 7, Gould in view of Mao and further in view of Anderson discloses “A non-transitory computer storage medium encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a first data set that includes a plurality of records; identifying a set of one or more first fields as a potential key field of the first data set: receiving a second data set that includes a plurality of records that have one or more second fields; filtering bits representing values in the one or more second fields in the records in the second data set through the filter mask generated from the one or more filter keys that represent the one or more values in the set of one or more first fields identified as the potential key field to determine whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask by being represented in the filter mask; when one or more values in the one or more second fields are included in the set of values represented in the filter mask, indicating a potential key relationship between the potential key field of the first data set and the one more second fields in the second data set.” (See Page 15, [0207] & [0208]) (The approaches described can be implemented using software for execution on a computer. The software forms procedures in one or more computer programs that execute on one or more programmed or programmable computer systems, which may be of various architectures, such as distributed, client/server, or grid, each including at least one processor and at least one data storage system (including, for example, volatile and non-volatile memory and/or storage elements). The software may be provided on a medium or device readable by a general or special purpose programmable computer.) (See Fig. 9 & Fig. 11A) (See also Page 1, [0008], [0016] & [0021]) (A method and corresponding software and a system for processing data with information characterizing values of a first field in records of a first data source. The information characterizing the values of the first field includes information characterizing a distribution of values of that field. The data can include records with a variable record structure such as conditional fields and/or variable numbers of fields. The information characterizing the distribution of values of the first field can include multiple data records, each associating a different value and a corresponding number of occurrences of that value in the first field in the first data source. Information characterizing values of a second field includes multiple records of a second data source. The information characterizing the values of the second field includes information characterizing a distribution of values of that field.)

See Anderson, Fig. 2, [003], [0052], [0061] (A method includes: storing, in a data storage system, at least one dataset including a plurality of records; and processing, in a data processing system coupled to the data storage system, the plurality of records to produce codes representing data patterns in the records, the processing including: for each of multiple records in the plurality of records, associating with the record a code encoding one or more elements, wherein each element represents a state or property of a corresponding field or combination of fields as one of a set of element values, and, for at least one element of at least a first code, the number of element values in the set is smaller than the total number of data values that occur in the corresponding field or combination of fields over all of the plurality of records in the dataset. The initial information about records can include the number of bits that represent a distinct value, the order of fields within a record, and the type of value represented by the bits. A mask code 324 can be implemented corresponding to the mandatory and additional fields 308, 312. The mask code 324 acts as a filter that can selectively include or exclude fields or field values of the record 304.)

See Anderson, Figs. 4-5, [0061], [0082]-[0083] (A mask code 324 can be implemented corresponding to the mandatory and additional fields 308, 312. The mask code 324 acts as a filter that can selectively include or exclude fields or field values of the record 304. As described above, the first six fields of the example transaction record are mandatory fields 308 and the next three fields are additional fields 312. As such, the mask code 324 can be formulated to indicate the mandatory and additional fields 308, 312 as follows: [0 0 0 1 1 1 1 1 1]. A bitwise AND operation can be computed between the bitmap code 320 and the mask code. If the result of the bitwise AND is anything other than 63, then one of the mandatory fields 308 is unpopulated. In some scenarios, heterogeneous datasets (e.g., datasets in which records may accept values in different data record formats) may have records that include separate fields to identify the record type. As such, data formats of the population for the records can be made conditional on the record type. In some implementations, the pattern information can be presented to a user through a graphical display within a user interface presented to the user. This way, a user may be able to quickly ascertain a percentage of fields in a record that are populated. FIG. 4 shows an exemplary user interface that includes the list of fields 400, the bit labels 401, and buttons to set the mask 402. FIG. 5 shows an exemplary user interface when the mask is set selectively and there are more than two data pattern codes for each field. The fields are listed 500 and labels assigned to each field 501. The mask 502 is set by selecting buttons. The legend 504 lists the data pattern codes that are displayed. Here gray levels are used to distinguish the data pattern codes but other possible display representations are possible, including simply displaying a numerical data pattern code.)

“identifying one or more filter keys that represent one or more values in the set of the one or more first fields: storing, in memory, a filter mask that stores a set of bits that are generated from the one or more filter keys that represent the one or more values in the set of one or more first fields” (See Mao: Fig. 3, Fig. 9, Fig. 10 and [0030]-[0031], [0041]-[0043]) (FIG. 3 is one example of generating a Bloom filter vector that conceals a plurality of identities. The first peer may provide one or more group identities (in a concealed format) which the second peer may use to ascertain whether access (keys, etc.) may be granted to the first peer. If a particular group identity of the first peer matches a group identity that is allowed access to the digital object, then the second peer may grant such access. In some implementations, in order to verify the first peer's assertion of being a member of a particular group, a subsequent authentication process may be performed to authenticate the first peer's membership of the particular group. For example, this may be done by the first peer presenting some credential (e.g., a user identity) signed by a group administrator of the particular group (e.g., signed by a private key belonging to the particular group and verifiable by a corresponding public key).

Concealment of identities may be achieved by hashing one or more identities and representing the hash values within a binary vector. The data structure may be a binary vector in which each of the one or more identities is represented by a plurality of bits that are uniformly and randomly distributed along the binary vector, which may be implemented as a Bloom filter. One or more of these hash functions may then be used to generate one or more offset or position index values into a composite binary vector that may represent a plurality of identities for the first peer. A requesting peer node's identities (e.g., group identities) may be concealed using a binary vector data structure such as a Bloom filter vector. An identity (e.g., a group identity) may be converted into a sequence of bits by some conversion function (e.g., a hash function).)

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine Gould (Data profiling) with Mao (Facilitating access controls for digital objects stored) and Anderson (Generating data pattern information) in order to allow for individual records or values within records to be compressed when stored and decompressed when accessed to reduce the storage requirements of the system. Anderson, [002]. 

One having ordinary skill would also be motivated to combine Gould, Mao and Anderson, in view of the suggestions provided by Mao in paragraph [006], which suggests, “a way is needed to preserve the privacy, identities, memberships, etc. of a peer while still being able to perform access control in a peer-to-peer network.”

	Regarding claim 8, Gould in view of Mao and further in view of Anderson discloses “The medium of claim 7, further including determining a count of a number of records each having a value associated with the one or more second fields in the records in the second data set that passes the filter mask; storing the count in a profile; and determining the Sorensen-Dice coefficient of the set of values in the filter mask and the records in the second data set having a value associated with the one or more second fields.” (See Page 1, [0014] & [0016]) (Information characterizing values of a second field in records of a second data source is accepted. Quantities characterizing a relationship between the first field and the second field are then computed based on the accepted information. Information relating the first field and the second field is presented. Processing the data can include comparing characteristics of the data to reference characteristics for the data, such as by comparing statistical properties of the data. Gould further discloses in [0121]-[0144] that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612.)
(See also, Page 2, [0031]) (Determining the co-occurrence statistics includes forming data elements each identifying a pair of fields and identifying a pair of values occurring in the pair of fields in one of the data records. See also Fig. 3 and [0010]: computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field. Each census record includes a count of the number of occurrences of the unique field/value pair for that census record.)
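The counting and similarity steps recited in claim 8 can be illustrated with the short Python sketch below. It is a hypothetical example, not taken from Gould, Mao, or Anderson: a Python `set` stands in for the filter mask (so it has no false positives), and `profile_overlap` is an illustrative name. The Sorensen-Dice coefficient is computed as 2|X ∩ Y| / (|X| + |Y|) over the distinct values of the two fields.

```python
def profile_overlap(first_values: list[str], second_values: list[str]) -> dict:
    """Count second-data-set records whose value passes, and compute Sorensen-Dice."""
    mask_stand_in = set(first_values)   # exact-membership stand-in for the filter mask
    passing_count = sum(1 for v in second_values if v in mask_stand_in)
    second_distinct = set(second_values)
    total = len(mask_stand_in) + len(second_distinct)
    shared = len(mask_stand_in & second_distinct)
    dice = 2 * shared / total if total else 0.0
    # The returned dict plays the role of the "profile" in which the count is stored.
    return {"passing_count": passing_count, "dice": dice}


profile = profile_overlap(["A", "B", "C", "D"], ["A", "A", "B", "E"])
print(profile["passing_count"])  # 3 records pass ("A", "A", "B")
```

Here the two distinct value sets share {"A", "B"}, so the coefficient is 2 × 2 / (4 + 3) = 4/7.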
Regarding claim 9, Gould in view of Mao and further in view of Anderson discloses “The medium of claim 7, wherein the operations further include: producing for a given record a filter key that is based on values in the one or more first fields of the given record in the first data set; and generating the filter mask based on filter keys produced for the records in the first data set by combining the filter keys according to a Boolean operation.” (See [0010]) (Computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field.)
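A Boolean combination of per-record filter keys, as recited in claim 9, can be sketched as a bitwise OR over bit vectors held as Python integers. The function name and the sample keys are hypothetical; OR is assumed here because it yields a mask in which every key's bits are set.

```python
from functools import reduce
from operator import or_


def build_filter_mask(filter_keys: list[int]) -> int:
    """Combine per-record filter keys (bit vectors held as ints) with a Boolean OR."""
    return reduce(or_, filter_keys, 0)


keys = [0b00010010, 0b01000001, 0b00010100]   # hypothetical per-record filter keys
mask = build_filter_mask(keys)
print(bin(mask))  # 0b1010111
```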

Regarding claim 10, Gould in view of Mao and further in view of Anderson discloses “The medium of claim 9, wherein generating a filter key for a corresponding value includes: generating a hash value for the corresponding value; segmenting the hash value into a predetermined number of integers; and generating the filter key by setting bits in a bit vector based on the integers.” (See Fig. 4 & See Page 4, [0077]) (The initial information about records can include the number of bits that represent a distinct value (e.g., 16 bits (= 2 bytes)) and the order of values, including values associated with record fields and values associated with tags or delimiters, and the type of value (e.g., string, signed/unsigned integer) represented by the bits.)

(See Fig. 5B) (See also, Page 8, [0129]) (FIG. 5B is a diagram that illustrates a sub-graph 630 implementing the analyze census component 412 of the profiling graph 400. A partition by field component 632 reads a flow of census elements from the census file component 410 and re-partitions the census elements according to a hash value based on the field such that census records with the same field but different values are in the same partition. The partition into string, number, date component 634 further partitions the census elements according to the type of the value in the census element. Different statistics are computed using a rollup process for values that are strings (in the rollup string component 636), numbers (in the rollup number component 638), or dates/datetimes (in the rollup date component 640). For example, it may be appropriate to calculate average and standard deviation for a number.)
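The hash repartitioning that Gould describes for FIG. 5B (routing census elements by a hash of the field so that all values of one field land in the same partition, then rolling up type-specific statistics) can be approximated as below. The census element layout `(field, value, count)`, the CRC-32 hash, and the function names are assumptions made for this sketch.

```python
import statistics
import zlib

CensusElement = tuple[str, object, int]   # assumed layout: (field, value, count)


def partition_by_field(census: list[CensusElement], num_partitions: int) -> list[list[CensusElement]]:
    """Route each census element by a hash of its field only, so a field's values stay together."""
    partitions: list[list[CensusElement]] = [[] for _ in range(num_partitions)]
    for element in census:
        field = element[0]
        partitions[zlib.crc32(field.encode("utf-8")) % num_partitions].append(element)
    return partitions


def rollup_numbers(elements: list[CensusElement]) -> tuple[float, float]:
    """Count-weighted mean and population standard deviation of numeric census values."""
    values = [value for _, value, count in elements for _ in range(count)]
    return statistics.mean(values), statistics.pstdev(values)


census = [("amount", 10, 2), ("amount", 20, 2), ("title", "Mr", 5)]
parts = partition_by_field(census, 4)
amount = [e for e in census if e[0] == "amount"]
print(rollup_numbers(amount))  # mean 15, population standard deviation 5.0
```

Hashing only the field name (not the value) is what keeps every value of a field in one partition, so each partition can compute per-field statistics without further exchange.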

	Regarding claim 11, Gould in view of Mao and further in view of Anderson discloses “The medium of claim 9, wherein generating the filter mask further includes performing a binary operation on each of a plurality of generated filter keys.” (See Figs. 11A-12B) (See Page 10, [0146]-[0150]) (A join operation is performed on two data sets (e.g., files or tables). In another approach, described below in section 6.1, after the make census component 406 generates a census file for a data set, the information in the census file can be used to perform the joint-field analysis between fields in two different profiled data sets, or between fields in two different parts of the same profiled data set (or any other data set for which a census file exists). The result of joint-field analysis includes information about potential relationships between the fields. Three types of relationships that are discovered are: a "common domain" relationship, a "joins well" relationship, and a "foreign key" relationship. Gould further discloses in [0121]-[0144] that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612.)
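The census-based joint-field analysis cited above can be loosely illustrated by an outer join of two censuses on value. The sketch assumes each census is a mapping from value to occurrence count for one field; the function names are hypothetical, and the overlap counts it returns are only a simplified stand-in for Gould's relationship categories, whose exact criteria are not reproduced here.

```python
def census_join(census_a: dict[str, int], census_b: dict[str, int]) -> list[tuple[str, int, int]]:
    """Outer-join two single-field censuses (value -> occurrence count) on value."""
    values = sorted(set(census_a) | set(census_b))
    return [(v, census_a.get(v, 0), census_b.get(v, 0)) for v in values]


def overlap_summary(joined: list[tuple[str, int, int]]) -> dict[str, int]:
    """Count distinct values seen in both fields, only in A, and only in B."""
    return {
        "both": sum(1 for _, a, b in joined if a and b),
        "only_a": sum(1 for _, a, b in joined if a and not b),
        "only_b": sum(1 for _, a, b in joined if b and not a),
    }


table_a_id = {"1": 1, "2": 1, "3": 1}    # hypothetical census of a Table A key field
table_b_ref = {"1": 4, "2": 2, "9": 1}   # hypothetical census of a Table B reference field
joined = census_join(table_a_id, table_b_ref)
print(overlap_summary(joined))  # {'both': 2, 'only_a': 1, 'only_b': 1}
```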
Regarding claim 12, Gould in view of Mao and further in view of Anderson discloses “The medium of claim 9, wherein the operations further include: determining whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask including: generating one or more second filter keys for the one or more values associated with the one or more second fields in the second data set; and comparing the one or more second filter keys to the filter mask.” (See Figs. 11A-12B) (See Page 11, [0161]) (A census join component 1200 analyzes fields from Table A and Table B and compiles the statistics for an occurrence chart by performing a "census join" operation from census data for the tables. Each census record has a field/value pair and a count of the occurrences of the value in the field. Since each census record has a unique field/value pair, for a given key field, the values in an input flow of the census join component 1200 are unique.) (See also Page 7, [0119] & Page 14, [0196]-[0200]) (Graphs can use a rule within the import component to find a relationship between a foreign key or field in one table and a primary key or field in another table, or to perform functional dependency calculations on parts of the data.)
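The comparison recited in claim 12 can be read as a bit-subset test: a second filter key passes when every bit it sets is also set in the filter mask. The following Python sketch assumes keys and masks are held as integers; the name `passes_mask` and the sample values are hypothetical.

```python
def passes_mask(second_key: int, filter_mask: int) -> bool:
    """A second filter key passes only if all of its set bits are also set in the mask."""
    return (second_key & filter_mask) == second_key


filter_mask = 0b01010111                 # hypothetical mask built from first-field keys
second_keys = [0b00010010, 0b10000001]   # hypothetical keys for second-field values
print([passes_mask(k, filter_mask) for k in second_keys])  # [True, False]
```

Because a mask built by combining keys can accept a key that was never inserted, a passing result supports only a potential key relationship rather than a confirmed one.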

Regarding claim 13, Gould in view of Mao and further in view of Anderson discloses “A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a first data set that includes a plurality of records; identifying a set of one or more first fields as a potential key field of the first data set; receiving a second data set that includes a plurality of records that have one or more second fields; filtering bits representing values in the one or more second fields in the records in the second data set through the filter mask generated from the one or more filter keys that represent the one or more values in the set of one or more first fields identified as the potential key field to determine whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask by being represented in the filter mask; when one or more values in the one or more second fields are included in the set of values represented in the filter mask, indicating a potential key relationship between the potential key field of the first data set and the one or more second fields in the second data set.” (See Fig. 1, Fig. 9 & Fig. 11A) (See also Page 1, [0008], [0016] & [0021]) (A method and corresponding software and a system for processing data with information characterizing values of a first field in records of a first data source. The information characterizing the values of the first field includes information characterizing a distribution of values of that field. The data can include records with a variable record structure such as conditional fields and/or variable numbers of fields.
The information characterizing the distribution of values of the first field can include multiple data records, each associating a different value and a corresponding number of occurrences of that value in the first field in the first data source. Information characterizing values of a second field in records of a second data source is also accepted. The information characterizing the values of the second field includes information characterizing a distribution of values of that field.)

See Anderson, Fig. 2, [0003], [0052], [0061] (A method includes: storing, in a data storage system, at least one dataset including a plurality of records; and processing, in a data processing system coupled to the data storage system, the plurality of records to produce codes representing data patterns in the records, the processing including: for each of multiple records in the plurality of records, associating with the record a code encoding one or more elements, wherein each element represents a state or property of a corresponding field or combination of fields as one of a set of element values, and, for at least one element of at least a first code, the number of element values in the set is smaller than the total number of data values that occur in the corresponding field or combination of fields over all of the plurality of records in the dataset. The initial information about records can include the number of bits that represent a distinct value, the order of fields within a record, and the type of value represented by the bits. A mask code 324 can be implemented corresponding to the mandatory and additional fields 308, 312. The mask code 324 acts as a filter that can selectively include or exclude fields or field values of the record 304.)
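Anderson's per-record code, in its simplest two-state form (populated or not), can be sketched as a bitmap in which bit i records whether field i is populated. Treating `None` and the empty string as unpopulated, mapping the first field to the least significant bit, and the name `bitmap_code` are assumptions made for this example.

```python
def bitmap_code(record: list[object]) -> int:
    """Set bit i when field i is populated (assumed: not None and not an empty string)."""
    code = 0
    for i, value in enumerate(record):
        if value is not None and value != "":
            code |= 1 << i
    return code


# Hypothetical nine-field record: six mandatory fields, then three additional fields.
record = ["2021-02-26", "T100", "ACME", 42.0, "USD", "Y", None, "", "note"]
print(bin(bitmap_code(record)))  # 0b100111111
```

Each element here takes one of two values regardless of how many distinct data values the field holds, which matches Anderson's point that the element value set can be smaller than the set of data values.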

See also Anderson, Figs. 4-5, [0061], [0082]-[0083] (A mask code 324 can be implemented corresponding to the mandatory and additional fields 308, 312. The mask code 324 acts as a filter that can selectively include or exclude fields or field values of the record 304. As described above, the first six fields of the example transaction record are mandatory fields 308 and the next three fields are additional fields 312. As such, the mask code 324 can be formulated to indicate the mandatory and additional fields 308, 312 as follows: [0 0 0 1 1 1 1 1 1]. A bitwise AND operation can be computed between the bitmap code 320 and the mask code. If the result of the bitwise AND is anything other than 63, then one of the mandatory fields 308 is unpopulated. In some scenarios, heterogeneous datasets (e.g., datasets in which records may accept values in different data record formats) may have records that include separate fields to identify the record type. As such, data formats of the population for the records can be made conditional on the record type. In some implementations, the pattern information can be presented to a user through a graphical display within a user interface presented to the user. This way a user may be able to quickly ascertain a percentage of fields in a record that are populated. FIG. 4 shows an exemplary user interface that includes the list of fields 400, the bit labels 401, and buttons to set the mask 402. FIG. 5 shows an exemplary user interface when the mask is set selectively and there are more than two data pattern codes for each field. The fields are listed 500 and labels assigned to each field 501. The mask 502 is set by selecting buttons. The legend 504 lists the data pattern codes that are displayed. Here gray levels are used to distinguish the data pattern codes, but other display representations are possible, including simply displaying a numerical data pattern code.)
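The mask check Anderson describes reduces to a bitwise AND against the mask [0 0 0 1 1 1 1 1 1], whose six low-order bits mark the mandatory fields and whose value is 63. A minimal sketch, assuming the bitmap code is held as a Python integer with the first field in the least significant bit; the function name is hypothetical.

```python
MANDATORY_MASK = 0b000111111   # [0 0 0 1 1 1 1 1 1]; equals 63


def mandatory_fields_populated(bitmap_code: int, mask: int = MANDATORY_MASK) -> bool:
    """True when the bitwise AND with the mask equals the mask (63 for this mask)."""
    return (bitmap_code & mask) == mask


print(mandatory_fields_populated(0b100111111))  # True: all six mandatory bits are set
print(mandatory_fields_populated(0b111011111))  # False: the sixth mandatory field is empty
```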

“identifying one or more filter keys that represent one or more values in the set of the one or more first fields; storing, in memory, a filter mask that stores a set of bits that are generated from the one or more filter keys that represent the one or more values in the set of one or more first fields” (See Mao: Fig. 3, Fig. 9, Fig. 10 and [0030]-[0031], [0041]-[0043]) (FIG. 3 is one example of generating a Bloom filter vector that conceals a plurality of identities. The first peer may provide one or more group identities (in a concealed format) which the second peer may use to ascertain whether access (keys, etc.) may be granted to the first peer. If a particular group identity of the first peer matches a group identity that is allowed access to the digital object, then the second peer may grant such access. In some implementations, in order to verify the first peer's assertion of being a member of a particular group, a subsequent authentication process may be performed to authenticate the first peer's membership of the particular group. For example, this may be done by the first peer presenting some credential (e.g., a user identity) signed by a group administrator of the particular group (e.g., signed by a private key belonging to the particular group and verifiable by a corresponding public key).

Concealment of identities may be achieved by hashing one or more identities and representing the hash values within a binary vector. The data structure may be a binary vector in which each of the one or more identities is represented by a plurality of bits that are uniformly and randomly distributed along the binary vector, which may be implemented as a Bloom filter. One or more hash functions may then be used to generate one or more offset or position index values into a composite binary vector that may represent a plurality of identities for the first peer. A requesting peer node's identities (e.g., group identities) may be concealed using a binary vector data structure such as a Bloom filter vector. An identity (e.g., a group identity) may be converted into a sequence of bits by some conversion function (e.g., a hash function).)

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine Gould (Data profiling) with Mao (Facilitating access controls for digital objects stored) and Anderson (Generating data pattern information) in order to allow for individual records or values within records to be compressed when stored and decompressed when accessed to reduce the storage requirements of the system. Anderson, [0002].

One having ordinary skill would also be motivated to combine Gould, Mao and Anderson in view of the suggestion provided by Mao in paragraph [0006], which states, “a way is needed to preserve the privacy, identities, memberships, etc. of a peer while still being able to perform access control in a peer-to-peer network.”

	Regarding claim 14, Gould in view of Mao and further in view of Anderson discloses “The system of claim 13, further including determining a count of a number of records each having a value associated with the one or more second fields in the records in the second data set that passes the filter mask; storing the count in a profile; and determining the Sorensen-Dice coefficient of the set of values in the filter mask and the records in the second data set having a value associated with the one or more second fields.” (See Page 1, [0014] & [0016]) (Information characterizing values of a second field in records of a second data source is accepted. Quantities characterizing a relationship between the first field and the second field are then computed based on the accepted information. Information relating the first field and the second field is presented. Processing the data can include comparing characteristics of the data to reference characteristics for the data, such as by comparing statistical properties of the data. Gould further discloses in [0121]-[0144] that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612.)
(See also, Page 2, [0031]) (Determining the co-occurrence statistics includes forming data elements each identifying a pair of fields and identifying a pair of values occurring in the pair of fields in one of the data records. See also Fig. 3 and [0010]: computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field. Each census record includes a count of the number of occurrences of the unique field/value pair for that census record.)
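The census records and co-occurrence data elements referenced above might be sketched with `collections.Counter`. The record layout (a dict of field name to value) and the function names are assumptions for illustration, not Gould's implementation.

```python
from collections import Counter
from itertools import combinations


def make_census(records: list[dict[str, str]]) -> Counter:
    """Count occurrences of each unique (field, value) pair across all records."""
    return Counter((field, value) for record in records for field, value in record.items())


def co_occurrence(records: list[dict[str, str]]) -> Counter:
    """Count data elements ((field1, field2), (value1, value2)) for each field pair in a record."""
    counts: Counter = Counter()
    for record in records:
        for (f1, v1), (f2, v2) in combinations(sorted(record.items()), 2):
            counts[((f1, f2), (v1, v2))] += 1
    return counts


records = [
    {"state": "MA", "city": "Boston"},
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": "Albany"},
]
print(make_census(records)[("state", "MA")])                          # 2
print(co_occurrence(records)[(("city", "state"), ("Boston", "MA"))])  # 2
```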
	Regarding claim 15, Gould in view of Mao and further in view of Anderson discloses “The system of claim 13, wherein the operations further include: producing for a given record a filter key that is based on values in the one or more first fields of the given record in the first data set; and generating the filter mask based on filter keys produced for the records in the first data set by combining the filter keys according to a Boolean operation.” (See [0010]) (Computing the summary data includes counting a number of occurrences for each of a set of distinct values for a field. The profile information can include statistics for the field based on the counted number of occurrences for said field.)

Regarding claim 16, Gould in view of Mao and further in view of Anderson discloses “The system of claim 15, wherein generating a filter key for a corresponding value includes: generating a hash value for the corresponding value; segmenting the hash value into a predetermined number of integers; and generating the filter key by setting bits in a bit vector based on the integers.” (See Fig. 5B) (See also, Page 8, [0129]) (FIG. 5B is a diagram that illustrates a sub-graph 630 implementing the analyze census component 412 of the profiling graph 400. A partition by field component 632 reads a flow of census elements from the census file component 410 and re-partitions the census elements according to a hash value based on the field such that census records with the same field but different values are in the same partition. The partition into string, number, date component 634 further partitions the census elements according to the type of the value in the census element. Different statistics are computed using a rollup process for values that are strings (in the rollup string component 636), numbers (in the rollup number component 638), or dates/datetimes (in the rollup date component 640). For example, it may be appropriate to calculate average and standard deviation for a number.) (See Fig. 4 & See Page 4, [0077]) (The initial information about records can include the number of bits that represent a distinct value (e.g., 16 bits (= 2 bytes)) and the order of values, including values associated with record fields and values associated with tags or delimiters, and the type of value (e.g., string, signed/unsigned integer) represented by the bits.)
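The hash-segmentation steps recited in claims 10 and 16 might be sketched as follows. This is an illustrative assumption, not the applicant's, Gould's, or Anderson's implementation: SHA-256 is used as the hash, the digest is split into four 16-bit integers, and each integer (modulo a hypothetical 1,024-bit vector length) selects one bit of the filter key.

```python
import hashlib

NUM_SEGMENTS = 4     # hypothetical "predetermined number of integers"
SEGMENT_BITS = 16    # hypothetical width of each integer
VECTOR_BITS = 1024   # hypothetical filter-key length in bits


def filter_key(value: str) -> int:
    """Hash a value, split the hash into integers, and set one bit per integer."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()   # 256-bit hash value
    hash_value = int.from_bytes(digest, "big")
    segment_mask = (1 << SEGMENT_BITS) - 1
    key = 0
    for i in range(NUM_SEGMENTS):
        segment = (hash_value >> (i * SEGMENT_BITS)) & segment_mask   # i-th 16-bit integer
        key |= 1 << (segment % VECTOR_BITS)                          # set the bit it selects
    return key


key = filter_key("ACME Corp")
print(bin(key).count("1"))  # at most 4 bits (fewer if two integers select the same bit)
```

Deriving several integers from one hash, rather than computing several independent hashes, keeps key generation to a single hash call per value.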

	Regarding claim 17, Gould in view of Mao and further in view of Anderson discloses “The system of claim 15, wherein generating the filter mask further includes performing a binary operation on each of a plurality of generated filter keys.” (See Figs. 11A-12B) (See Page 10, [0146]-[0150]) (A join operation is performed on two data sets (e.g., files or tables). In another approach, described below in section 6.1, after the make census component 406 generates a census file for a data set, the information in the census file can be used to perform the joint-field analysis between fields in two different profiled data sets, or between fields in two different parts of the same profiled data set (or any other data set for which a census file exists). The result of joint-field analysis includes information about potential relationships between the fields. Three types of relationships that are discovered are: a "common domain" relationship, a "joins well" relationship, and a "foreign key" relationship. Gould further discloses in [0121]-[0144] that, referring to FIG. 5A, a sub-graph 600 implementing one embodiment of the make census component 406 includes a filter component 602 that passes a portion of incoming records based on a filter expression stored in the profile setup object 200. The filter expression may limit the fields or number of values profiled. An example of a filter expression is one that limits profiling to a single field of each incoming record (e.g., "title"). Another optional function of the filter component 602 is to implement the cleaning option described above, sending a sample of invalid records to the invalid records component 408. Records flowing out of the filter component 602 flow into a local rollup sequence stats component 604 and a partition by round-robin component 612.)

Regarding claim 18, Gould in view of Mao and further in view of Anderson discloses “The system of claim 15, wherein the operations further include: determining whether the values in the one or more second fields in the records in the second data set have corresponding one or more values that pass the filter mask including: generating one or more second filter keys for the one or more values associated with the one or more second fields in the second data set; and comparing the one or more second filter keys to the filter mask.” (See Figs. 11A-12B) (See Page 11, [0161]) (A census join component 1200 analyzes fields from Table A and Table B and compiles the statistics for an occurrence chart by performing a "census join" operation from census data for the tables. Each census record has a field/value pair and a count of the occurrences of the value in the field. Since each census record has a unique field/value pair, for a given key field, the values in an input flow of the census join component 1200 are unique.) (See also Page 7, [0119] & Page 14, [0196]-[0200]) (Graphs can use a rule within the import component to find a relationship between a foreign key or field in one table and a primary key or field in another table, or to perform functional dependency calculations on parts of the data.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY M MCGHEE, whose telephone number is (313) 446-6581. The examiner can normally be reached 9am-5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TRACY M MCGHEE/Examiner, Art Unit 2154        

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154