DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see p. 11, filed 10/13/2022, with respect to the rejection of claim 1 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Rahm.
Applicant states (p. 11) that Najork does not teach the amended limitation “wherein the source information automatically terminates an infinite loop of the Uniform Resource Identifier and identifies dynamic resources and dummy resources”. This limitation is instead taught by Rahm.
The instant specification [0021] teaches that the source information associated with the Uniform Resource Identifier enables the system to identify and crawl dynamic websites and dummy websites, and to terminate an infinite loop of Uniform Resource Identifiers.
Rahm detects a spam poison (i.e., dynamic and dummy) web site that traps (i.e., infinite loop) a web crawler by dynamically generating an unlimited number of URLs within a particular domain. Rahm determines that a currently visited page has the same or very similar content as one or more already-visited pages, and takes corrective actions such as browsing to a different domain (Rahm: [0018]).
Therefore, one having ordinary skill in the art would be motivated to utilize Rahm in Najork to avoid the crawler being bogged down in spam poison web sites.
Applicant further states (p. 12) that Najork does not teach the claim element “agent application”. Examiner respectfully disagrees.
According to the instant specification, agent application [0031] is any set of instructions executable by a computer to configure the computer to perform a task. Source information [0036] describes the placement and operations (i.e., features and functioning) of the data element in a user-viewable hypertext document, such as CSS and type of content (e.g., text or video).
Najork’s crawler (i.e., agent application) downloads (i.e., retrieves) the page (i.e., user-viewable hypertext document) together with metadata (i.e., source information) (11:1-10) associated with the top URL of the priority queues (8:66-9:2).
In summary, Najork combined with Rahm teaches the argued limitations of independent claims 1, 10 and 16.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 8-10 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Najork et al., US patent 6,351,755 [herein “Najork”], and further in view of Rahm, US patent application 2009/0287641 [herein “Rahm”].
Claim 1 recites “A system for aggregating data elements from a pool of data elements associated with at least one Uniform Resource Identifier using a crawler, wherein the system includes a computer system for executing data processing tasks, wherein the system comprises: a data processing arrangement comprising a communication interface for accessing a wide area computer network and the crawler, wherein the crawler is configured to: (a) receive the at least one Uniform Resource Identifier, wherein the at least one Uniform Resource Identifier enables locating and extracting a resource stored in the wide area computer network;”
Najork teaches a web crawler system that locates and downloads web pages (i.e., resources) over networks (1:31-39). When a new URL (i.e., Uniform Resource Identifier or URI) is discovered (i.e., received), it is added to one of the priority queues based on a predetermined policy (6:43-46).
Claim 1 further recites “(b) retrieve a source information included in an agent application, that receives the at least one Uniform Resource Identifier and provides the resource, wherein the source information are instructions to define features and functioning associated with the resource,”
According to the instant specification, agent application [0031] is any set of instructions executable by a computer to configure the computer to perform a task. Source information [0036] describes the placement and operations (i.e., features and functioning) of the data element in a user-viewable hypertext document, such as CSS and type of content (e.g., text or video).
Najork’s crawler (i.e., agent application) downloads (i.e., retrieves) the page (i.e., user-viewable hypertext document) together with metadata (i.e., source information) (11:1-10) associated with the top URL of the priority queues (8:66-9:2).
Claim 1 further recites “wherein the source information automatically terminates an infinite loop of the Uniform Resource Identifier and identifies dynamic resources and dummy resources,”
The instant specification [0021] teaches that the source information associated with the Uniform Resource Identifier enables the system to identify and crawl dynamic and dummy websites, and to terminate an infinite loop of Uniform Resource Identifiers.
Najork does not disclose this limitation; however, Rahm detects a spam poison (i.e., dynamic and dummy) web site that traps a web crawler (i.e., infinite loop) by dynamically generating an unlimited number of URLs within a particular domain. Rahm determines that a currently visited page has the same or very similar content as one or more already-visited pages, and takes corrective actions (Rahm: [0018]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Rahm to Najork. One having ordinary skill in the art would have found motivation to utilize Rahm in Najork to avoid the crawler being bogged down in spam poison web sites.
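For illustration only, the trap-detection behavior attributed to Rahm above (comparing a currently visited page against already-visited pages and leaving the domain when the content is the same or very similar) might be sketched as follows. The class and function names, the word-shingle Jaccard similarity measure, and the 0.9 threshold are assumptions made for this sketch and are not taken from Rahm or from the instant specification.

```python
from urllib.parse import urlparse


def shingles(text, k=3):
    """Return the set of k-word shingles of a page's text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class TrapDetector:
    """Flags a domain once a fetched page closely matches an earlier page from it."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.seen = {}              # domain -> shingle sets of visited pages
        self.blocked = set()        # domains the crawler should leave

    def observe(self, url, text):
        """Record a page; return True if the crawler should leave the domain."""
        domain = urlparse(url).netloc
        if domain in self.blocked:
            return True
        current = shingles(text)
        for earlier in self.seen.get(domain, []):
            if jaccard(current, earlier) >= self.threshold:
                # Same or very similar content under a new URL: treat as a trap.
                self.blocked.add(domain)
                return True
        self.seen.setdefault(domain, []).append(current)
        return False
```

In this sketch the corrective action is simply a True return value, which a crawler could use to skip the remaining URLs of that domain and move to a different one.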
Claim 1 further recites “wherein the source information includes the pool of data elements that constitute the resource, wherein the data elements include any one of: hyperlinks, documents, text, or metadata associated with the data elements;”
Najork processes the page by various modules to extract information (i.e., pool of data elements), including the type of content in the page, and URLs (i.e., hyperlinks) contained in the page that are to be crawled (1:48-64).
Claim 1 further recites “(c) determine at least one relevant data element from the pool of data elements, wherein determining the at least one relevant data element includes: identifying at least one attribute associated with each of the data elements in the pool of the data elements, wherein the at least one attribute associated with each of the data elements includes any one of: a type associated with each of the data elements and/or a feature associated with each of the data elements, wherein the type describes a category to which each of the data elements belongs, and the feature describes a characteristic of each of the data elements;”
Najork extracts (i.e., identifies) and stores information (i.e., attributes) about the page. Examples of various data collected about the page include its MIME type (i.e., category) and size (i.e., characteristic), date/time and duration of the download, date/time of last modification and expiration, etc. (1:65-2:4).
Claim 1 further recites “evaluating any one of: the type associated with each of the data elements and/or the feature associated with each of the data elements, based on predefined qualifier conditions to generate at least one evaluated attribute associated with each of the data elements, wherein the predefined qualifier conditions include at least one relevant type and at least one relevant feature associated with each of the data elements;”
In Najork, a processing module might look for information of a specific type (i.e., relevant type) in the page. A processing module might also determine if the page has been indexed (i.e., evaluated attribute), or if the page has changed by more than a threshold amount (i.e., relevant feature) (1:48-64), or if a URL contained in the page has been visited (8:10-15).
Claim 1 further recites “determining a relevance factor for each of the data elements based on the generated at least one evaluated attribute associated with each of the data elements, wherein the relevance factor for each of the data elements refers to a condition that determines a relation of each of the data elements, wherein the relation means either relevant or irrelevant; and using the relevance factor to determine the at least one relevant data element from the pool of data elements;”
In Najork, a processing module might look for information of a specific type in the page. A processing module might also determine if the page has been indexed, or if the page has changed by more than a threshold amount, or if a URL contained in the page has been visited. If such a qualifier condition is satisfied, the corresponding data element is considered relevant (i.e., relevance factor) (1:48-64). In particular, when a relevant data element is a URL, it should be crawled further.
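As a non-authoritative sketch of the mapping above, the evaluation of a type and a feature against predefined qualifier conditions to reach a relevant/irrelevant determination could look like the following. The `DataElement` fields, the `change_ratio` measure, and the choice to require all three conditions together are assumptions made for this sketch; neither Najork nor the claim specifies how the conditions are combined.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    url: str
    mime_type: str       # "type": the category the element belongs to
    change_ratio: float  # "feature": fraction of content changed since last visit


def is_relevant(element, relevant_types, change_threshold, visited):
    """Return True when the element satisfies every qualifier condition."""
    type_ok = element.mime_type in relevant_types
    changed_enough = element.change_ratio > change_threshold
    not_visited = element.url not in visited
    # Assumption: all conditions must hold for the element to be relevant.
    return type_ok and changed_enough and not_visited


def select_relevant(pool, relevant_types, change_threshold, visited):
    """Keep only the elements of the pool judged relevant."""
    return [e for e in pool
            if is_relevant(e, relevant_types, change_threshold, visited)]
```

Here the boolean result plays the role of the relevance factor, and the filtered list corresponds to the relevant data elements that would be crawled further.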
Claim 1 further recites “(d) analyze the determined at least one relevant data element to determine an importance factor associated therewith, wherein the importance factor relates to an importance of each of the at least one relevant data element of the at least one Uniform Resource Identifier, wherein the importance factor assigned to each of the at least one relevant data element is a numerical value;”
Najork maintains a parallel set of priority queues of to-be-crawled URLs, each associated (i.e., assigned) with a distinct priority level (i.e., numerical value) (4:3-11). When the identified relevant data element is a URL, Najork adds it to one of the priority queues (1:50-53). Najork prioritizes the to-be-crawled URLs to maximize the perceived accuracy or quality (i.e., importance factor) of the pages from crawling, by preferring (i.e., more important) pages from web servers known for high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites (3:13-15). In other words, priority level is a numerical value combining one or more properties of the URL or page on accuracy and quality (4:18-22).
Claim 1 further recites “(e) assign a chronological score to each of the at least one relevant data element based on the determined importance factor thereof, wherein the chronological score refers to a numerical value that is used to arrange the at least one relevant data element; and”.
Najork maintains a parallel set of priority queues of to-be-crawled URLs, each associated (i.e., assigned) with a distinct priority level (i.e., numerical value) (4:3-11). When the identified relevant data element is a URL, Najork adds it to one of the priority queues (1:50-53). Najork prioritizes the to-be-crawled URLs to maximize the perceived accuracy or quality of the pages from crawling, by preferring pages from web servers known for high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites – freshness or frequency of change (i.e., chronological score) (3:13-22).
Claim 1 further recites “(f) crawl each of the at least one relevant data element based on the assigned chronological score thereof;”
Najork repeatedly selects the top URL from the priority queues in the order of priority from high to low (i.e., chronological score), and downloads (i.e., crawls) and processes the associated page (2:57-62).
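The selection order described above, in which the top URL is repeatedly taken from a set of priority queues from high priority to low, can be sketched as below. The `Frontier` class, its method names, and the convention that a larger index means a higher priority are assumptions for illustration and are not Najork's implementation.

```python
from collections import deque


class Frontier:
    """A set of FIFO queues, one per priority level."""

    def __init__(self, levels):
        # Assumption: index 0 is the lowest priority, levels - 1 the highest.
        self.queues = [deque() for _ in range(levels)]

    def add(self, url, priority):
        """Append a discovered URL to the queue for its priority level."""
        self.queues[priority].append(url)

    def next_url(self):
        """Return the next URL to download, highest priority first, or None."""
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

For example, after adding "a" at level 0, "b" at level 2, "c" at level 1, and "d" at level 2, successive calls to `next_url()` return "b", "d", "c", "a", and then None.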
Claim 1 further recites “a database arrangement communicably coupled to the data processing arrangement, wherein the database arrangement is configured to aggregate each of the at least one relevant data element based on the assigned chronological score associated with each of the at least one relevant data element.”
Najork provides a set of tools for storing an extensible set of data with each URL. These tools enable the processing modules to store a record of information associated with each download, each record being a set (i.e., pool) of name/value pairs (fig. 1, #139; 3:61-67). These records of information are added to a database of processed URLs, from which the download history can be processed (i.e., aggregated) offline (9:31-36), based on download time, last modify time, priority level (i.e., chronological score), etc.
Claims 10 and 16 are analogous to claim 1, and are similarly rejected.

Claim 3 recites “The system of claim 1, wherein the data processing arrangement is configured to generate the agent application.”
Najork uses multiple concurrent threads (i.e., agent applications) to process URLs in a set of priority queues respecting the priority order (4:3-11).

Claim 8 recites “The system of claim 1, wherein the importance factor is determined based on web content associated with the at least one relevant data element.”
Najork’s crawler prioritizes the to-be-crawled URLs to maximize the perceived accuracy or quality (i.e., importance factor) of downloaded pages (i.e., data elements), by preferring pages (i.e., web content) from web servers with known high quality content (3:1-9), or pages whose content is known to change rapidly such as news sites (3:13-15).
Claim 15 is analogous to claim 8, and is similarly rejected.

Claim 9 recites “The system of claim 1, wherein the database arrangement includes a data storage unit, wherein the data storage unit is configured to aggregate the at least one relevant data element based on the assigned chronological score.”
Najork’s crawler includes a set of tools for storing an extensible set of data with each URL. These tools enable the processing modules to store a record of information associated with each download, each record being a set of name/value pairs (fig. 1, #139; 3:61-67). These records of information are added to a database (i.e., data storage unit) of processed URLs, from which the download history can be processed (i.e., aggregated) offline (9:31-36), based on download time, last modify time, priority level (i.e., chronological score), etc.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Najork as applied to claim 1 above, in view of Rahm, and further in view of Najork, US patent 7,139,747 [herein “Najork2”].
Claim 2 recites “The system of claim 1, wherein the crawler is implemented in a distributed architecture.”
Najork combined with Rahm teaches claim 1, but does not disclose this limitation; however, Najork2 achieves efficient crawling by distributing the URLs to be downloaded among a plurality of crawlers interconnected via a network (Najork2: 1:59-66).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Najork2 to Najork. One having ordinary skill in the art would have found motivation to incorporate the distributed architecture of Najork2 into Najork’s crawler system to greatly improve downloading and processing performance.
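For illustration only, distributing the URLs to be downloaded among a plurality of crawlers might be sketched as follows. Assigning each URL by a hash of its host name is an assumption chosen for this sketch, as are the function names; the passage cited above states only that the URLs are distributed among crawlers interconnected via a network.

```python
import hashlib
from urllib.parse import urlparse


def assign_crawler(url, num_crawlers):
    """Map a URL to a crawler index using a stable hash of its host name."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def partition(urls, num_crawlers):
    """Split a list of URLs into one bucket per crawler."""
    buckets = [[] for _ in range(num_crawlers)]
    for url in urls:
        buckets[assign_crawler(url, num_crawlers)].append(url)
    return buckets
```

Under this assumed scheme all URLs of one host land on the same crawler, so each crawler can process its share independently of the others.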

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHELLY X. QIAN whose telephone number is (408)918-7599. The examiner can normally be reached Monday - Friday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached at (571)272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHELLY X QIAN/Examiner, Art Unit 2163



/ALEX GOFMAN/Primary Examiner, Art Unit 2163