DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 3/9/2021 have been fully considered but they are not persuasive.
Applicant states (p. 10) that Najork does not disclose identifying a relevant data element by analyzing an identified attribute using predefined qualifier conditions on an attribute-based relevancy score. According to the instant specification [0042], the relevance factor (i.e., relevancy score) refers to the predefined qualifier condition, and is either relevant when the condition is met or irrelevant when the condition is not met.
In Najork, a processing module might look for information (i.e., data elements) of a specific type (i.e., attribute) in the downloaded page, e.g., a URL. The crawler determines if the URL has not been seen before, or the downloaded page has changed by more than a threshold amount (i.e., predefined qualifier conditions). If so, the URL is considered relevant and enqueued to a to-be-crawled queue (1:48-64).
Applicant argues (p. 11) that Najork’s prioritization of URLs to be crawled is different from the importance factor in claim 1, which is a numerical score assigned to relevant data elements based on their associated web content.
Najork’s crawler prioritizes the URLs to be crawled (i.e., relevant data elements) to maximize the perceived accuracy or quality (i.e., importance factor) of downloaded pages (i.e., web content), by preferring pages from web servers with known high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites (3:13-15). Najork maintains a set of parallel priority queues, each associated with a distinct priority level (i.e., numerical score). Every URL to be crawled is assigned a priority level, and stored in the corresponding priority queue (4:2-11).
Applicant also argues (p. 11) that Najork does not teach identifying a relevant data element based on features associated with that data element.
Najork looks for information (i.e., data elements) of a specific type in the downloaded page, e.g., a URL. The crawler determines if the URL has not been seen before, or the downloaded page has changed by more than a threshold amount (i.e., features associated with data elements). If so, the URL is considered relevant and enqueued to a to-be-crawled queue (1:48-64).
Applicant further states (p. 11) that Najork’s assigning of priority to URLs is not based on Applicant’s assigning of a chronological score to relevant data elements.
Najork’s crawler maintains a parallel set of priority queues of to-be-crawled URLs, each associated with a distinct priority level (4:3-11). A newly found URL is assigned a priority level (i.e., chronological score) based on properties (i.e., importance factor) of the URL or the web page on which the URL was found (4:18-22).
Applicant further argues that Najork does not disclose a crawler that first determines relevant data elements and then determines the importance factor for each of the identified data elements.
Najork performs four steps for every downloaded web page:
 (i) Identifies a URL on the page.
 (ii) Determines if a URL from step (i) is relevant (i.e., importance factor).
 (iii) Determines the priority level (i.e., chronological score) of a relevant URL from step (ii).
 (iv) Adds the URL from step (iii) to the corresponding priority queue.
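For illustration only, the four steps above can be sketched as follows. The sketch is not code from Najork; the function names, the regular expression, the number of priority levels, and the rule that favors a hypothetical news host are all assumptions made for the example.

```python
import re
from collections import deque

# Hypothetical illustration; not code from Najork.
NUM_PRIORITY_LEVELS = 3  # assumed number of parallel priority queues

# Assumed, simplified way of finding links in a page.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def make_queues():
    """Create one FIFO queue per priority level (index 0 = highest)."""
    return [deque() for _ in range(NUM_PRIORITY_LEVELS)]


def extract_urls(page_html):
    """Step (i): identify URLs on the downloaded page."""
    return HREF_PATTERN.findall(page_html)


def is_relevant(url, seen_urls):
    """Step (ii): treat a URL as relevant if it has not been seen before."""
    return url not in seen_urls


def priority_level(url):
    """Step (iii): derive a priority level from a property of the URL.

    Assumed rule: URLs on a host containing "news." get the highest level.
    """
    if "news." in url:
        return 0
    return NUM_PRIORITY_LEVELS - 1


def process_page(page_html, seen_urls, queues):
    """Run steps (i) through (iv) for one downloaded page."""
    for url in extract_urls(page_html):
        if not is_relevant(url, seen_urls):
            continue
        seen_urls.add(url)
        level = priority_level(url)
        queues[level].append(url)  # step (iv): enqueue at its priority level
```

In this sketch a URL reaches a priority queue only after passing the relevance check, which mirrors the ordering of steps (ii) through (iv).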
Finally, Applicant argues (pp. 11-12) that Najork does not disclose the database arrangement of claim 1, because Najork stores to-be-crawled URLs in Frontier aggregated by priority levels, which is not a database; and stores processed URLs in a database without assigned priority levels.
Najork’s Frontier is a data store used internally by the crawler for deciding which URLs to crawl next. Separately, Najork’s crawler includes a set of tools for storing an extensible set of data with each URL. These tools enable downstream processing modules to store a record of information associated with each download, each record being a set of name/value pairs (fig. 1, #139; 3:61-67). These records of information are added to a database of processed URLs, from which the download history can be processed (i.e., aggregated) offline (9:31-36), based on download time, last modify time, priority level (i.e., chronological score), etc.
	In summary, Najork does teach all limitations of claim 1.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-5, 7-10, 12 and 14-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Najork et al. US patent 6,351,755 [herein “Najork”].
Claim 1 recites “A system for crawling, wherein the system includes a computer system for executing data processing tasks, wherein the system comprises: a data processing arrangement comprising a communication interface for accessing a wide area computer network and a crawler, wherein the crawler is configured to: receive at least one Uniform Resource Identifier;”
Najork teaches a web crawler system. When a new URL (i.e., Uniform Resource Identifier) is discovered (i.e., received), it is added to a to-be-crawled queue based on a predetermined policy (6:43-46).
Claim 1 further recites “retrieve source information associated with the at least one Uniform Resource Identifier, wherein the source information includes a pool of data elements;”
Najork downloads (i.e., retrieve) the page (i.e., source information) corresponding to the URL at the head of a selected queue (8:66-9:2). The downloaded page is processed by various modules to extract information (i.e., data elements), including URLs (1:48-64).
Claim 1 further recites “determine at least one relevant data element from the pool of data elements, wherein determining the at least one relevant data element includes: identifying at least one attribute associated with each data element in the pool of the data elements, wherein the at least one attribute associated with each data element refers to inherent properties of each data element,”
According to the instant specification [0042], the relevance factor refers to the predefined qualifier condition, and is either relevant when the condition is met or irrelevant when the condition is not met.
In Najork, a processing module might look for information (i.e., data elements) of a specific type (i.e., attribute) in the downloaded page, e.g., a URL (1:48-52).
Claim 1 further recites “analyzing the at least one identified attribute, based on predefined qualifier conditions, for detecting a relevance factor for each data element, wherein predefined qualifier conditions signify a state of the at least one attribute and wherein the relevance factor for each data element refers to a condition that determines a relation of the data element, wherein the relation means either relevant or irrelevant, and using the relevance factor to determine the at least one relevant data element from the pool of data elements;”
Najork’s crawler determines if the URL has not been seen before (i.e., state), or the downloaded page has changed by more than a threshold amount (i.e., predefined qualifier conditions). If so, the URL is considered relevant (i.e., relevance factor) and enqueued to a to-be-crawled queue (1:50-64).
Claim 1 further recites “analyze the at least one relevant data element to determine an importance factor associated therewith, wherein the importance factor relates to an importance of each relevant data element of the at least one Uniform Resource Identifier;”
Najork’s crawler prioritizes the URLs (i.e., relevant data elements) to be crawled to maximize the perceived accuracy or quality (i.e., importance factor) of downloaded pages, by preferring pages from web servers with known high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites (3:13-15).
Claim 1 further recites “assign a chronological score to each of the at least one relevant data element based on the determined importance factor thereof, wherein the chronological score refers to a numerical value that is used to arrange the at least one relevant data element; and”.
Najork’s crawler maintains (i.e., arranges) a parallel set of priority queues of to-be-crawled URLs, each associated with a distinct priority level (i.e., numerical value) (4:3-11). When the crawler downloads a page, it extracts relevant URLs (i.e., relevant data elements) from the page and adds them to the queues based on priority (1:48-64). A newly found URL is assigned a priority level (i.e., chronological score) based on properties (i.e., importance factor) of the URL or the web page on which the URL was found (4:18-22).
Claim 1 further recites “crawl each of the at least one relevant data element based on the assigned chronological score thereof; and”.
Najork’s crawler repeatedly selects a URL from the queues in the order of priority from high to low, downloads and processes the page (2:57-62).
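For illustration only, selecting URLs in order of priority from high to low can be sketched as below. This is a simplified assumption, not Najork's actual selection logic; the names `next_url` and `crawl_order` are hypothetical, and index 0 is assumed to be the highest priority level.

```python
from collections import deque

# Hypothetical illustration; not code from Najork.


def next_url(queues):
    """Return the head of the highest-priority non-empty queue, or None.

    `queues` is a list of deques; queues[0] is assumed to be the
    highest priority level.
    """
    for queue in queues:
        if queue:
            return queue.popleft()
    return None


def crawl_order(queues):
    """Drain all queues and return the URLs in the order they were selected."""
    order = []
    url = next_url(queues)
    while url is not None:
        order.append(url)  # a real crawler would download and process here
        url = next_url(queues)
    return order
```

Under these assumptions, every URL at a higher priority level is selected before any URL at a lower level.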
Claim 1 further recites “a database arrangement communicably coupled to the data processing arrangement, wherein the database arrangement is configured to aggregate the at least one relevant data element based on the assigned chronological score.”
Najork’s crawler includes a set of tools for storing an extensible set of data with each URL. These tools enable the processing modules to store a record of information associated with each download, each record being a set of name/value pairs (fig. 1, #139; 3:61-67). These records of information are added to a database of processed URLs, from which the download history can be processed (i.e., aggregated) offline (9:31-36), based on download time, last modify time, priority level (i.e., chronological score), etc.
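For illustration only, aggregating stored download records by one of their name/value pairs can be sketched as below. The record layout and the field names `url` and `priority_level` are assumptions for the example and are not taken from Najork.

```python
from collections import defaultdict

# Hypothetical illustration; not code from Najork.


def aggregate_by(records, name):
    """Group the URLs of stored records by the value stored under `name`.

    Each record is assumed to be a dict of name/value pairs that
    includes a "url" entry.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[name]].append(record["url"])
    return dict(groups)


# Assumed sample of a processed-URL database.
processed = [
    {"url": "http://example.com/a", "priority_level": 0},
    {"url": "http://example.com/b", "priority_level": 2},
    {"url": "http://example.com/c", "priority_level": 0},
]
by_priority = aggregate_by(processed, "priority_level")
```

The same helper could group by any other stored name, such as a download time, provided each record carries that name.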
Claims 10 and 16 are analogous to claim 1, and are similarly rejected.

Claim 3 recites “The system of claim 1, wherein the data processing arrangement is configured to generate an agent application.”
Najork uses multiple concurrent threads (i.e., agent applications) to process URLs in a set of priority queues respecting the priority order (4:3-11).

Claim 4 recites “The system of claim 3, wherein the agent application receives the at least one Uniform Resource Identifier.”
Najork uses multiple concurrent threads (i.e., agent applications) to process URLs in a set of priority queues respecting the priority order (4:3-11). Each thread takes the URL at the head of a selected queue and downloads (i.e., retrieves) the corresponding page (8:66-9:2).

Claim 5 recites “The system of claim 1, wherein the data element includes any one of: hyperlinks, documents, text, metadata associated with the one or more elements.”
In Najork, a processing module of the crawler might look for information (i.e., data elements) of a specific type in the downloaded page, e.g., a URL (i.e., hyperlink) (1:48-64).
Claim 12 is analogous to claim 5, and is similarly rejected.

Claim 7 recites “The system of claim 1, wherein the predefined qualifier conditions include any one of: a relevant type associate with each data element; and at least one relevant feature associate with each data element.”
In Najork, a processing module might look for information (i.e., data elements) of a specific type (i.e., relevant type) in the downloaded page, e.g., a URL. The crawler determines if the URL has not been seen before, or the downloaded page has changed by more than a threshold amount (i.e., predefined qualifier conditions). If so, the URL is considered relevant and enqueued for later processing (1:48-64).
Claim 14 is analogous to claim 7, and is similarly rejected.

Claim 8 recites “The system of claim 1, wherein the importance factor is determined based on web content associated with the at least one relevant data element.”
Najork’s crawler prioritizes the URLs to be downloaded to maximize the perceived accuracy or quality (i.e., importance factor) of downloaded pages (i.e., data elements), by preferring pages (i.e., web content) from web servers with known high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites (3:13-15).
Claim 15 is analogous to claim 8, and is similarly rejected.

Claim 9 recites “The system of claim 1, wherein the database arrangement includes a data storage unit, wherein the data storage unit is configured to aggregate the at least one relevant data element based on the assigned chronological score.”
Najork’s crawler includes a set of tools for storing an extensible set of data with each URL. These tools enable the processing modules to store a record of information associated with each download, each record being a set of name/value pairs (fig. 1, #139; 3:61-67). These records of information are added to a database (i.e., data storage unit) of processed URLs, from which the download history can be processed (i.e., aggregated) offline (9:31-36), based on download time, last modify time, priority level (i.e., chronological score), etc.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Najork as applied to claim 1 above, and further in view of Najork, US patent 7,139,747 [herein “Najork2”].
Claim 2 recites “The system of claim 1, wherein the crawler is implemented in a distributed architecture.”
Najork teaches claim 1, but does not disclose this claim; however, Najork2 achieves efficient crawling by distributing the URLs to be downloaded among a plurality of crawlers interconnected via a network (Najork2: 1:59-66).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Najork with Najork2. One having ordinary skill in the art would have found motivation to incorporate the distributed architecture of Najork2 into Najork’s crawler system to greatly improve downloading and processing performance.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Najork as applied to claims 1 and 10 above respectively, and further in view of Sun et al. US patent application 2013/0024441 [herein “Sun”].
Claim 6 recites “The system of claim 1, wherein the at least one attribute associated with each data element includes any one of: a type associate with each data element; and a feature associate with each data element.”
In Najork, a processing module of the crawler might look for information (i.e., data elements) of a specific type in the downloaded page, e.g., a URL (i.e., hyperlink) (1:48-64). Najork teaches claim 1, but does not disclose this claim; however, Sun represents a downloaded page as a DOM tree (Sun: [0020]), and uses the type of a node (i.e., data element) in the tree to determine how to extract information, e.g., text, image, link, etc.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Najork with Sun. One having ordinary skill in the art would have found motivation to utilize Sun’s DOM representation when processing downloaded pages in Najork, to facilitate easy extraction of URLs from pages for further processing.
Claim 13 is analogous to claim 6, and is similarly rejected.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHELLY X. QIAN whose telephone number is (408)918-7599.  The examiner can normally be reached on Monday - Friday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached on (571)272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SHELLY X QIAN/Examiner, Art Unit 2163



/ALEX GOFMAN/Primary Examiner, Art Unit 2163