Module backend.sources.sources_controller_start

Controller to start and manage the Sources Scraper.

This script manages the execution of the scraping processes by spawning threads to run the specified jobs. It identifies the correct working directory and starts two jobs in separate threads:

  • source(): Executes the scraping process.
  • reset(): Resets any failed jobs.

The script determines the working directory based on the location of the job_sources.py script.
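
A minimal sketch of how that resolution can work, assuming the path is derived from the controller's own source file via inspect and os.path (the exact resolution code is not part of this module's shown source):

import inspect
import os

def resolve_workingdir():
    """Illustrative only: resolve the directory of the current source file."""
    # inspect.getfile returns the path of the file that defines this frame
    current_file = inspect.getfile(inspect.currentframe())
    return os.path.dirname(os.path.abspath(current_file))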

Dependencies

  • threading
  • subprocess
  • psutil
  • time
  • os
  • sys
  • inspect
  • Custom libraries: lib_logger, lib_helper, lib_db

Classes

class SourcesController

A controller class for managing the Sources Scraper.

Attributes

args : list
List of arguments for the stop method (not used in start).
db : object
Database object (not used in start method).

Methods

  • __init__(): Initializes the SourcesController object.
  • __del__(): Destructor for the SourcesController object.
  • start(workingdir): Starts the scraper by launching two jobs in separate threads.

Source code
import os
import threading


class SourcesController:
    """
    A controller class for managing the Sources Scraper.

    Attributes:
        args (list): List of arguments for the `stop` method (not used in `start`).
        db (object): Database object (not used in `start` method).

    Methods:
        __init__(): Initializes the SourcesController object.
        __del__(): Destructor for the SourcesController object.
        start(workingdir): Starts the scraper by launching two jobs in separate threads.
    """

    def __init__(self):
        """
        Initializes the SourcesController object.
        """
        # Initialization logic here (if any)
        pass

    def __del__(self):
        """
        Destructor for the SourcesController object.

        Prints a message when the SourcesController object is destroyed.
        """
        print('Sources Controller object destroyed')

    def start(self, workingdir):
        """
        Starts the Sources Scraper by opening two jobs in separate threads:
        
        - `source()`: Calls `job_sources.py` to start the scraping process.
        - `reset()`: Calls `job_reset_sources.py` to reset failed jobs.

        Args:
            workingdir (str): The directory containing the job scripts.
        """

        def source():
            """
            Executes the job_sources.py script to start the scraping process.
            """
            job = 'python ' + os.path.join(workingdir, "jobs", 'job_sources.py')
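            # os.system blocks this worker thread until the job script exits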
            os.system(job)   

        def reset():
            """
            Executes the job_reset_sources.py script to reset failed jobs.
            """
            job = 'python ' + os.path.join(workingdir, "jobs", 'job_reset_sources.py')
            os.system(job)              

        # Start threads for the defined job functions
        process1 = threading.Thread(target=source)
        process1.start()

        process2 = threading.Thread(target=reset)
        process2.start()
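
A minimal usage sketch (illustrative; this module's actual entry point is not shown here):

import os

controller = SourcesController()
# workingdir must contain the jobs/ subdirectory with the job scripts
controller.start(workingdir=os.path.dirname(os.path.abspath(__file__)))

Note that start() does not join the two threads, so the caller returns immediately while the jobs run in the background.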

Methods

def start(self, workingdir)

Starts the Sources Scraper by opening two jobs in separate threads:

  • source(): Calls job_sources.py to start the scraping process.
  • reset(): Calls job_reset_sources.py to reset failed jobs.

Args

workingdir : str
The directory containing the job scripts.
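
Note that os.system passes each command through the shell and blocks its thread until the job exits. A hedged sketch of an equivalent launch using subprocess, which the module already lists as a dependency; the helper name run_job is hypothetical:

import os
import subprocess
import sys

def run_job(workingdir, script_name):
    """Hypothetical helper: run a job script with the current interpreter."""
    # Like os.system, subprocess.run waits for completion, but it avoids the
    # shell and uses the same Python interpreter that launched the controller.
    subprocess.run([sys.executable, os.path.join(workingdir, 'jobs', script_name)])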