DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/22/2022 has been entered.
 
Response to Arguments
Applicant's arguments filed 2/22/2022 have been fully considered but they are not persuasive.
Applicant states (p. 9) that Ardila does not teach determining the variance in data formats after identifying matches and mismatches in schema elements, since the variance present in the generated error log is not the same as the corresponding pre-defined data formats. Examiner respectfully disagrees.
Ardila compares every schema element (i.e., data format) in the input schema with an element in the output schema on features of the element such as name and type, to compute a confidence measure representing the strength of match. A mismatch (i.e., variance) is detected if the strength of match is below a predetermined threshold (Ardila: [0035]). Detected mismatches (i.e., errors) are displayed to the user in a table (i.e., error log) (Ardila: fig. 7, #730; [0036]).
Applicant further states (pp. 9-10) that Crook does not teach the amended limitation of an attribute comprising a number of characters, a type, a structure and presence of keywords. Examiner respectfully disagrees.
Data mapping in Crook is the process of performing transformations to resolve data representation differences between document formats. Example transforms include name, structural (i.e., type), and value transformations (i.e., presence of keywords) (Crook: [0047]).
In summary, Baid combined with Crook and Ardila teaches the argued limitations of independent claims 1, 12 and 19.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 9-12, 15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Baid et al. US patent application 2013/0226944 [herein “Baid”], in view of Crook et al. US patent application 2009/0171720 [herein “Crook”], and further in view of Ardila et al. US patent publication 2016/0147796 [herein “Ardila”].
Claim 1 recites “A system for regularizing data between a data source and a data destination, the data corresponding to a given data category of a plurality of data categories, wherein the given data category includes specific data fields, wherein the system comprises: a data processing arrangement comprising: a data fetching module operable to fetch data from the data source, wherein the fetched data includes one or more data fields having values in corresponding data formats;”
Baid teaches a system to transform data from one structure to another, where a user authors a transform of input data (i.e., data source) to output data (i.e., data destination) in terms of structure that is independent of format. Data transformation is performed as a function of the transform and input data. Access (i.e., fetch) of input data and population of output data in specific formats are implemented by common interfaces [0004]. Input and output data is described as structures in a common object system [0029], whose attributes (i.e., data fields) of abstract data types (i.e., categories) [0089] are expressed as named values [0031].
Claim 1 further recites “a data transformation module operable to receive the fetched data from the data fetching module, wherein the data transformation module is operable to: receive pre-defined data formats for the values of data fields for a specific data category;”
Baid accesses input data and populates output data in specific formats (i.e., pre-defined data formats) using common interfaces [0004]. Input and output data is described as structures in a common object system [0029], whose attributes (i.e., data fields) of abstract data types (i.e., categories) [0089] are described as named values [0031].
Claim 1 further recites “identify data fields for the values of the fetched data based on at least one attribute of the values, wherein the at least one attribute comprises: a number of characters, a type, a structure and presence of keywords;”
Baid does not disclose this limitation; however, data mapping in Crook is the process of performing transformations to resolve data representation differences between document formats. Example transforms include name, structural (i.e., type), and value transformations (i.e., presence of keywords) (Crook: [0047]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Crook. One having ordinary skill in the art would have found motivation to incorporate Crook’s transformations in Baid to resolve data representation differences between input and output formats.
Claim 1 further recites “compare data formats of values of data fields of the fetched data with received pre-defined data formats for the values; determine, based on the comparison, a deviation between a data format of at least one value and a corresponding pre-defined data format for the at least one value; and transform the data format of the at least one value to the corresponding pre-defined data format, if the deviation is determined;”
In Baid, input and output structures participating in a transform are expressed as named values in path expressions [0031], and transforms are expressed as map statements from source to target named values [0032]. Transforms are executed to transform input data in one structure (i.e., source data formats) to output data in another structure (i.e., pre-defined data format) [0020]. Existence of transforms indicates deviation between input and output structures.
Claim 1 further recites “a data validation module operable to: receive from the data transformation module, the pre-defined data formats, and the transformed data if the deviation is determined, or the fetched data if the deviation is not determined; confirm if data formats of values of all data fields of a received data are same as corresponding pre-defined data formats, wherein the received data comprises the transformed data or the fetched data;”
Baid does not disclose this limitation; however, Crook maintains a metadata library of transforms (Crook: [0091]). Given input signature P (i.e., source data format) and output signature Q (i.e., pre-defined data format), the match function determines if two signatures are compatible (i.e., deviation not determined) (Crook: [0208]).
Claim 1 further recites “generate an error log for the transformed data when data formats of values of one or more data fields are not same as the corresponding pre-defined data formats;”
Although Baid’s executor of transforms outputs error messages if a transformation cannot be completed [0020], Baid does not disclose this limitation; however, Ardila teaches automatic schema (i.e., data format) mismatch detection and resolution, where a transformation pipeline comprises a set of one or more related transform jobs with output of a first job optionally providing input to a second job (Ardila: [0015]). Detected mismatches (i.e., errors) are displayed to the user in a table (i.e., error log) (Ardila: fig. 7, #730; [0036]).
Claim 1 further recites “a data regularization module operable to: receive a generated error log for the transformed data from the data validation module, wherein the generated error log comprises transformed data having data formats of values of one or more data fields that are not same as the corresponding pre-defined data formats;”
Baid does not disclose this limitation; however, Ardila compares the data set schema (i.e., input schema) and the job schema (i.e., output schema) of a transform job (i.e., transformed data) to determine (i.e., generate) a list of mismatches (i.e., not the same) to be displayed to the user in a table (Ardila: fig. 7, #730; [0035]-[0036]).
Claim 1 further recites “determine a variance in data formats of values of the one or more data fields present in the generated error log, that are not same as the corresponding pre-defined data formats, and the corresponding pre-defined data formats;”
Baid does not disclose this limitation; however, Ardila compares every schema element (i.e., data format) in the input schema with an element in the output schema on features of the element such as name and type, to compute a confidence measure representing the strength of match. A mismatch (i.e., variance) is detected if the strength of match is below a predetermined threshold (Ardila: [0035]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Ardila. One having ordinary skill in the art would have found motivation to incorporate Ardila’s confidence measure in Baid to detect and quantify matches between data formats, in order to guide Baid’s transformation system to efficiently resolve mismatches.
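For illustration only, the threshold-based mismatch detection attributed to Ardila above can be sketched as follows. This is a minimal sketch under stated assumptions, not Ardila's implementation: the names `SchemaElement`, `match_confidence` and `detect_mismatches`, the equal weighting of name and type, and the threshold value of 0.75 are all hypothetical and do not appear in the cited reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    """Hypothetical schema element described by a name and a type."""
    name: str
    type: str


def match_confidence(a: SchemaElement, b: SchemaElement) -> float:
    """Assumed scoring: 0.5 for a case-insensitive name match, 0.5 for a type match."""
    score = 0.0
    if a.name.lower() == b.name.lower():
        score += 0.5
    if a.type == b.type:
        score += 0.5
    return score


def detect_mismatches(input_schema, output_schema, threshold=0.75):
    """Return (input element, best confidence) pairs whose best match falls below the threshold."""
    mismatches = []
    for in_el in input_schema:
        # Strength of the best match between this input element and any output element.
        best = max(
            (match_confidence(in_el, out_el) for out_el in output_schema),
            default=0.0,
        )
        if best < threshold:
            mismatches.append((in_el, best))
    return mismatches


input_schema = [SchemaElement("id", "int"), SchemaElement("date", "string")]
output_schema = [SchemaElement("id", "int"), SchemaElement("date", "date")]
mismatch_table = detect_mismatches(input_schema, output_schema)
```

In this sketch, `mismatch_table` plays the role of the table of detected mismatches: the `date` element matches on name but not on type, so its confidence falls below the assumed threshold and it is listed.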
Claim 1 further recites “identify a resolution for the determined variance, wherein the resolution comprises changing the data formats of values of the one or more data fields, that are not same as the corresponding pre-defined data formats, to the corresponding pre-defined data formats; and”.
Baid does not disclose this limitation; however, every transform in Crook’s metadata library transforms one signature to another. When the pair of input and output signatures P and Q are not compatible (i.e., determined variance), Crook identifies iteratively a sequence of one or more transforms, such that their composition (i.e., resolution) transforms (i.e., changes) P to Q (Crook: [0215]). Every iteration starts with the match function determining if the (previously transformed) signature P is compatible with Q, and if so, iteration stops, otherwise a next transform is identified and applied. In other words, if signatures are represented as nodes in a graph, then a transform is an arc connecting one signature node to another. Given a pair of nodes representing input and output signatures P and Q respectively, Crook searches iteratively for a path from node P to node Q.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Crook. One having ordinary skill in the art would have found motivation to incorporate Crook’s algorithm in Baid, such that primitive transforms can be composed to form composite transforms, which significantly reduces the number of pre-computed transforms needed in the metadata library.
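For illustration only, the graph reading of Crook given above (signatures as nodes, transforms as arcs, and a search for a path from P to Q) can be sketched as follows. This is a minimal sketch under stated assumptions: the breadth-first search strategy, the function name `find_transform_sequence`, the equality-based compatibility test and the example transform names are hypothetical and are not taken from Crook.

```python
from collections import deque


def find_transform_sequence(library, source, target, compatible=lambda a, b: a == b):
    """Search for a sequence of transforms whose composition maps source to target.

    library maps a transform name to a (input signature, output signature) pair.
    Returns the list of transform names, [] if no transform is needed,
    or None if no path exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        signature, path = queue.popleft()
        # Each iteration starts by checking compatibility with the target signature.
        if compatible(signature, target):
            return path
        # Otherwise, try every transform that accepts the current signature.
        for name, (src, dst) in library.items():
            if src == signature and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None


# Hypothetical library of primitive transforms.
library = {
    "csv_to_rows": ("csv", "rows"),
    "rows_to_json": ("rows", "json"),
    "xml_to_rows": ("xml", "rows"),
}
sequence = find_transform_sequence(library, "csv", "json")
```

Here two primitive transforms compose into a `csv`-to-`json` transform without a dedicated entry in the library, which is the reduction in pre-computed transforms that the motivation statement relies on.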
Claim 1 further recites “transmit the resolved data to the data transformation module, wherein the data transformation module is further operable to process the resolved data along with the fetched data and identify from the received data, regularized data having data formats of values of all data fields same as the corresponding predefined data formats and transmit the regularized data to the data destination; and a database arrangement for implementing the data destination, the database arrangement being communicatively coupled to the data processing arrangement, wherein the database arrangement is operable to store the received regularized data.”
Once input data is transformed (i.e., resolved) to output data (i.e., regularized data), Baid populates output data (i.e., transmits to and stores at data destination) in specific format (i.e., pre-defined data format) by wrapping access to objects and attributes being operated on by the transformation via a common interface [0059]-[0068].

Claim 4 recites “The system of claim 1, wherein the data source is implemented using at least one database.” Baid’s transformation system can be practiced in a general-purpose computing device, containing a server and a mass storage (i.e., database) (fig. 9, [0090]).

Claim 5 recites “The system of claim 1, wherein the data processing arrangement is implemented within a server arrangement.” Baid’s transformation system can be practiced in a general-purpose computing device, containing a server and a mass storage (i.e., database) (fig. 9, [0090]).

Claim 6 recites “The system of claim 1, wherein at least one of: the data fetching module, the data transformation module, the data validation module, and the data regularization module, is implemented using a machine-learning algorithm.” In Baid, authoring of transforms can employ machine learning to infer functions/methods or maplets to suggest to a user based on context [0076].

Claim 9 recites “The system of claim 1, wherein the data validation module is further operable to generate a notification comprising data formats of values of the one or more data fields not being same as the corresponding pre-defined data formats.”
Baid teaches claim 1, but does not disclose this claim; however, Ardila compares the data set schema (i.e., input schema) and the job schema (i.e., output schema) of a transform job (i.e., transformed data) to determine a list of mismatches (i.e., not the same) to be displayed to the user (i.e., notifies the user) (Ardila: [0035]-[0036]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Ardila. One having ordinary skill in the art would have found motivation to incorporate Ardila’s confidence measure in Baid to detect and quantify matches between data formats, in order to guide Baid’s transformation system to efficiently resolve mismatches.
Claim 18 is analogous to claim 9, and is similarly rejected.

Claim 10 recites “The system of claim 1, wherein the system further comprises a database driver module, wherein the database driver module allows retrieval of the regularized data stored in the database arrangement.”
Baid’s transformation system can be practiced in a general-purpose computing device, containing a server and a mass storage (i.e., database), where transformed output data (i.e., regularized data) is stored (fig. 9, [0090]). The mass storage can include one or more program modules (i.e., driver) to control resources [0097].

Claim 11 recites “The system of claim 1, wherein the system simultaneously regularizes, in operation, data corresponding to more than one data category of the plurality of data categories.”
In Baid, input and output data is described as structures in a common object system [0029], whose attributes (i.e., data fields) of abstract data types (i.e., plurality of data categories) [0089] are expressed as named values in path expressions [0031]. Input data can be transformed (i.e., regularized) concurrently (i.e., simultaneously) [0077].

Claim 12 recites “A method for regularizing data between a data source and a data destination, the data corresponding to a given data category of a plurality of data categories, wherein the given data category includes specific data fields, wherein the method comprises: fetching from the data source, a data including one or more data fields having values in corresponding data formats;”
Baid teaches a method to transform data from one structure to another, where a user authors a transform of input data (i.e., data source) to output data (i.e., data destination) in terms of structure that is independent of format. Data transformation is performed as a function of the transform and input data. Access (i.e., fetch) of input data and population of output data in specific formats are implemented by common interfaces [0004]. Input and output data is described as structures in a common object system [0029], whose attributes (i.e., data fields) of abstract data types (i.e., categories) [0089] are expressed as named values [0031].
Claim 12 further recites “receiving pre-defined data formats for the values of data fields for a specific data category;”
Baid accesses input data and populates output data in specific formats (i.e., pre-defined data formats) using common interfaces [0004]. Input and output data is described as structures in a common object system [0029], whose attributes (i.e., data fields) of abstract data types (i.e., categories) [0089] are described as named values [0031].
Claim 12 further recites “identifying data fields for the values of the fetched data based on at least one attribute of the values, wherein the at least one attribute comprises: a number of characters, a type, a structure and presence of keywords;”
Baid does not disclose this limitation; however, data mapping in Crook is the process of performing transformations to resolve data representation differences between document formats. Example transforms include name, structural (i.e., type), and value transformations (i.e., presence of keywords) (Crook: [0047]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Crook. One having ordinary skill in the art would have found motivation to incorporate Crook’s transformations in Baid to resolve data representation differences between input and output formats.
Claim 12 further recites “comparing data formats of values of data fields of the fetched data with pre-defined data formats for the values; determining, based on the comparison, a deviation between a data format of at least one value and a corresponding pre-defined data format for the at least one value; and transforming the data format of the at least one value to the corresponding pre-defined data format, if the deviation is determined;”
In Baid, input and output structures participating in a transform are expressed as named values in path expressions [0031], and transforms are expressed as map statements from source to target named values [0032]. Transforms are executed to transform input data in one structure (i.e., source data formats) to output data in another structure (i.e., pre-defined data format) [0020]. Existence of transforms indicates deviation between input and output structures.
Claim 12 further recites “confirming if data formats of values of all data fields of a received data are same as corresponding pre-defined data formats, wherein the received data comprises the transformed data or the fetched data;”
Baid does not disclose this limitation; however, Crook maintains a metadata library of transforms (Crook: [0091]). Given input signature P (i.e., source data format) and output signature Q (i.e., pre-defined data format), the match function determines if two signatures are compatible (i.e., deviation not determined) (Crook: [0208]).
Claim 12 further recites “generate an error log for the transformed data when data formats of values of one or more data fields are not same as the corresponding pre-defined data formats,”
Although Baid’s executor of transforms outputs error messages if a transformation cannot be completed [0020], Baid does not disclose this limitation; however, Ardila teaches automatic schema (i.e., data format) mismatch detection and resolution, where a transformation pipeline comprises a set of one or more related transform jobs with output of a first job optionally providing input to a second job (Ardila: [0015]). Detected mismatches (i.e., errors) are displayed to the user in a table (i.e., error log) (Ardila: fig. 7, #730; [0036]).
Claim 12 further recites “wherein the generated error log comprises transformed data having data formats of values of one or more data fields that are not same as the corresponding pre-defined data formats;”
Baid does not disclose this limitation; however, Ardila compares the data set schema (i.e., input schema) and the job schema (i.e., output schema) of a transform job (i.e., transformed data) to determine (i.e., generate) a list of mismatches (i.e., not the same) to be displayed to the user in a table (Ardila: fig. 7, #730; [0035]-[0036]).
Claim 12 further recites “determine a variance in data formats of values of the one or more data fields present in the generated error log, that are not same as the corresponding pre-defined data formats, and the corresponding pre-defined data formats;”
Baid does not disclose this limitation; however, Ardila compares every schema element (i.e., data format) in the input schema with an element in the output schema on features of the element such as name and type, to compute a confidence measure representing the strength of match. A mismatch (i.e., variance) is detected if the strength of match is below a predetermined threshold (Ardila: [0035]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Ardila. One having ordinary skill in the art would have found motivation to incorporate Ardila’s confidence measure in Baid to detect and quantify matches between data formats, in order to guide Baid’s transformation system to efficiently resolve mismatches.
Claim 12 further recites “identifying a resolution for the determined variance, wherein the resolution comprises changing the data formats of values of the one or more data fields, that are not same as the corresponding pre-defined data formats, to the corresponding pre-defined data formats; and”.
Baid does not disclose this limitation; however, every transform in Crook’s metadata library transforms one signature to another. When the pair of input and output signatures P and Q are not compatible (i.e., determined variance), Crook identifies iteratively a sequence of one or more transforms, such that their composition (i.e., resolution) transforms (i.e., changes) P to Q (Crook: [0215]). Every iteration starts with the match function determining if the (previously transformed) signature P is compatible with Q, and if so, iteration stops, otherwise a next transform is identified and applied. In other words, if signatures are represented as nodes in a graph, then a transform is an arc connecting one signature node to another. Given a pair of nodes representing input and output signatures P and Q respectively, Crook searches iteratively for a path from node P to node Q.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Crook. One having ordinary skill in the art would have found motivation to incorporate Crook’s algorithm in Baid, such that primitive transforms can be composed to form composite transforms, which significantly reduces the number of pre-computed transforms needed in the metadata library.
Claim 12 further recites “processing the resolved data along with the fetched data and identifying from the received data, regularized data having data formats of values of all data fields same as the corresponding predefined data formats and transmit the regularized data to the data destination.”
Once input data is transformed (i.e., resolved) to output data (i.e., regularized data), Baid populates output data (i.e., transmits to data destination) in specific format (i.e., pre-defined data format) by wrapping access to objects and attributes being operated on by the transformation via a common interface [0059]-[0068].
Claim 19 is analogous to claim 12, and is similarly rejected.

Claim 15 recites “The method of claim 14, wherein the method employs at least one machine learning algorithm.” In Baid, authoring of transforms can employ machine learning to infer functions/methods or maplets to suggest to a user based on context [0076].

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Baid, as applied to claims 1 and 12 above respectively, in view of Crook, and further in view of Najork et al. US patent 6,351,755 [herein “Najork”].
Claim 7 recites “The system of claim 1, wherein the data fetching module is implemented as a web-crawler.”
Baid teaches claim 1, but does not disclose this claim; however, Najork teaches a web crawler where pages downloaded (i.e., fetched) by the crawler are processed by a sequence of processing modules, to extract and store information about the downloaded pages (Najork: 1:48-54).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Baid with Najork. One having ordinary skill in the art would have found motivation to integrate Baid’s transformation system with Najork’s crawler such that content downloaded by the crawler can be processed by data transformations to extract and store information for downstream consumption.
Claim 16 is analogous to claim 7, and is similarly rejected.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Dingman et al. US patent 9,430,114.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHELLY X. QIAN whose telephone number is (408)918-7599. The examiner can normally be reached Monday - Friday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached at (571)272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHELLY X QIAN/Examiner, Art Unit 2163



/ALEX GOFMAN/Primary Examiner, Art Unit 2163