Skip to content

Quarke/Project4

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CONFIGURATION FILE PARAMETERS

PARTNERS: tflanag2 and dmattia

Parameter       Description                                                 Default
PERIOD_FETCH    The time (in seconds) between fetches of the various sites  180
NUM_FETCH       Number of fetch threads (1 to 8)                            1
NUM_PARSE       Number of parsing threads (1 to 8)                          1
SEARCH_FILE     File containing the search strings                          Search.txt
SITE_FILE       File containing the sites to query                          Sites.txt

The program operates in 5 main stages. 

1) Setup
    The main function executes and spawn threads, parses the configuration file, and sets up various variables to prepare the following sections to run. It exits on appropriate errors such as invalid files or parameters. 

2) Fetch
    The setup phase added websites to a threadsafe queue protected by a mutex and cond_var. The threads initailized and spun off for the fetching function grab websites from this queue, initializa CURL objects and make requests. It packs the return from this libcurl requests into a website struct and places that into a threadsafe parse queue again protected by a mutex and cond_var.

3) Parse
    The parse threads wake up when an item is placed in the parse_queue. Each parse thread grabs a website object as they are available. It takes a time signature and parse the website for each term in the termlist. It takes counts of these terms and places them into a OUTPUT struct. Each output struct is them placed onto the output_queue. waking the output thread. 

4) Output
    A single thread wakes when the output queue recieves and OUTPUT struct. It parses the struct for the appropriate data and writes to a html file along with other html text. A sigalarm switches these files every time a batch intiates.

5) Cleanup
    A SIGINT or SIGTERM triggers cleanup. This cancels threads and closes all relevent objects, and exits gracefullly. 

A sigalarm function controls when a batch is added to the fetch_queue and controls the output file. 

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •