Module backend.sources.sources_controller_stop
Controller to start and manage the Sources Scraper.
This script is responsible for managing the lifecycle of the Sources Scraper, including stopping processes and resetting the database. It interacts with processes and web browsers to terminate them gracefully and performs necessary cleanup actions.
Dependencies
- threading
- subprocess
- psutil
- time
- os
- sys
- inspect
- Custom libraries: lib_logger, lib_helper, lib_db
Classes
class SourcesController-
A controller class for managing the Sources Scraper processes.
Attributes
args:list- Arguments for the
stopmethod.
- args[0] (list): List of browser process names to be terminated.
- db (object): Database object for interacting with the database.
Methods
init(): Initializes the SourcesController object. del(): Destructor for the SourcesController object. stop(args: list): Stops specified processes and resets the database.
Initializes the SourcesController object.
Expand source code
class SourcesController: """ A controller class for managing the Sources Scraper processes. Attributes: args (list): Arguments for the `stop` method. - args[0] (list): List of browser process names to be terminated. - db (object): Database object for interacting with the database. Methods: __init__(): Initializes the SourcesController object. __del__(): Destructor for the SourcesController object. stop(args: list): Stops specified processes and resets the database. """ def __init__(self): """ Initializes the SourcesController object. """ # Initialization logic (if any) should be added here pass def __del__(self): """ Destructor for the SourcesController object. Prints a message when the SourcesController object is destroyed. """ print('Sources Controller object destroyed') def stop(self, args): """ Stops the scraper processes and resets the database. This method identifies and terminates Python processes and specific browser instances. It also performs a cleanup operation to reset the database entries. Args: args (list): A list where: - args[0] (list): List of browser process names (e.g., "chrome", "chromium"). - db (object): Database object for resetting the database. Raises: ValueError: If the database object is not provided in `args`. """ # List of processes related to the scraper to be killed processes_to_kill = ["job_sources.py", "sources_scraper.py", "sources_reset.py", "job_reset_sources.py", "sources_controller_start.py"] kill_browser = False # Iterate over all running processes for proc in psutil.process_iter(attrs=['pid', 'name', 'cmdline']): try: if "python" in proc.info['name']: # Check if the process name or command line matches any in the kill list if proc.info['cmdline']: if any(name in proc.info['name'] or name in proc.info['cmdline'] for name in processes_to_kill): proc.kill() # Kill the process kill_browser = True # Kill browser processes specified in the arguments if kill_browser: for browser in args[0]: if browser in proc.info['name'] or browser in proc.info['cmdline']: proc.kill() except Exception as e: pass # Wait for processes to terminate before resetting the database time.sleep(60) # Reset the database db.reset(job_server)Methods
def stop(self, args)-
Stops the scraper processes and resets the database.
This method identifies and terminates Python processes and specific browser instances. It also performs a cleanup operation to reset the database entries.
Args
args:list- A list where: - args[0] (list): List of browser process names (e.g., "chrome", "chromium"). - db (object): Database object for resetting the database.
Raises
ValueError- If the database object is not provided in
args.