Module backend.backend_controller_stop

Script to manage and clean up processes and database entries related to the scraping application.

This script performs the following tasks:

  1. Terminates specific running processes based on their command-line arguments.
  2. Updates the database by removing obsolete classifier results and resetting pending sources.

Dependencies

  • psutil: For managing and killing processes.
  • time: For time-related operations (though not used directly in this script).
  • os: For interacting with the operating system, including file path operations.
  • sys: For system-specific parameters and functions (though not used directly in this script).
  • inspect: For inspecting live objects and obtaining the current file's directory.
  • json: For parsing the database configuration file.
  • psycopg2: For PostgreSQL database connection and operations.

Functions

def delete_source_pending(source_id, db_conn, job_server)

Delete a source entry from the database based on its ID.

Args

source_id : int
Identifier of the source to delete.
db_conn : dict
Database connection parameters.

def get_sources_pending(db_conn, job_server)

Retrieve all sources with pending status (progress = 2) from the database.

Args

db_conn : dict
Database connection parameters.

Returns

list
List of tuples containing source ID and creation timestamp.
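
The query behind this function might look like the sketch below. It uses the standard-library sqlite3 module as a stand-in so the example runs without a PostgreSQL server. The table name `source`, the columns `id`, `created_at`, and `progress`, and the helper name `fetch_pending_sources` are assumptions, and the helper takes an open connection rather than the `db_conn` dict documented above. With psycopg2 the placeholder would be `%s` instead of `?`.

```python
import sqlite3

PENDING = 2  # progress value that marks a source as pending


def fetch_pending_sources(conn):
    """Return (id, created_at) tuples for all sources whose progress is PENDING."""
    cur = conn.cursor()
    cur.execute(
        "SELECT id, created_at FROM source WHERE progress = ? ORDER BY id",
        (PENDING,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


# Demo on an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source (id INTEGER PRIMARY KEY, created_at TEXT, progress INTEGER)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 10:00:00", 1),
        (2, "2024-01-01 10:05:00", 2),
        (3, "2024-01-01 10:07:00", 2),
    ],
)
pending = fetch_pending_sources(conn)
print(pending)
```

Passing the status as a bound parameter rather than formatting it into the SQL string keeps the query safe if the value ever comes from outside the script.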

def load_db_config(config_path)

Load the database configuration from a JSON file.

Args

config_path : str
Path to the JSON configuration file.

Returns

dict
Database configuration parameters.
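
A minimal sketch of how such a loader might look, assuming the file holds a single JSON object of connection parameters. The check for a top-level object is an addition for illustration and may not exist in the actual module.

```python
import json


def load_db_config(config_path):
    """Read a JSON file and return its contents as a dict of connection parameters."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Fail early if the file does not contain a JSON object.
    if not isinstance(config, dict):
        raise ValueError(f"Expected a JSON object in {config_path}")
    return config
```

The returned dict could then be unpacked into `psycopg2.connect(**config)`, provided its keys match the parameters that function accepts.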

def reset_classifiers(result, db_conn, job_server)

Reset classifier-related entries for a specific result in the database.

Args

result : str
Identifier of the result to reset.
db_conn : dict
Database connection parameters.

def reset_result_source(source_id, db_conn, job_server)

Remove entries from the result_source table related to a specific source ID.

Args

source_id : int
Identifier of the source to reset.
db_conn : dict
Database connection parameters.
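
The removal described here amounts to a parameterized DELETE. The sketch below again uses sqlite3 as a stand-in for PostgreSQL so it can run on its own. The `result_source` table name comes from the description above, while its columns (`result`, `source`) and the helper name `remove_result_source_rows` are assumptions.

```python
import sqlite3


def remove_result_source_rows(conn, source_id):
    """Delete all result_source rows for source_id and return how many were removed."""
    cur = conn.cursor()
    cur.execute("DELETE FROM result_source WHERE source = ?", (source_id,))
    removed = cur.rowcount
    conn.commit()
    cur.close()
    return removed


# Demo on an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_source (result INTEGER, source INTEGER)")
conn.executemany(
    "INSERT INTO result_source VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 2)],
)
removed = remove_result_source_rows(conn, 1)
remaining = conn.execute("SELECT result, source FROM result_source").fetchall()
print(removed, remaining)
```

Committing inside the helper keeps each cleanup step durable on its own; a caller that wants several resets to succeed or fail together would instead commit once at the end.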

def terminate_processes(args: list)

Terminate specific processes based on their command-line arguments.

Processes are terminated if their command-line arguments contain specific keywords.
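
A minimal sketch of keyword-based termination with psutil. The actual keywords are not listed here, so the caller supplies them. The names `cmdline_matches` and `terminate_matching` are introduced only for illustration. psutil is imported inside the function so that the matching helper can be used without it installed.

```python
import os


def cmdline_matches(cmdline, keywords):
    """Return True if any keyword occurs in any command-line argument."""
    return any(kw in arg for arg in (cmdline or []) for kw in keywords)


def terminate_matching(keywords):
    """Terminate every matching process except this one and return their PIDs."""
    import psutil  # third-party; imported lazily so cmdline_matches works without it

    terminated = []
    for proc in psutil.process_iter(attrs=["pid", "cmdline"]):
        if proc.info["pid"] == os.getpid():
            continue  # never terminate the cleanup script itself
        try:
            if cmdline_matches(proc.info["cmdline"], keywords):
                proc.terminate()
                terminated.append(proc.info["pid"])
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # The process already exited or belongs to another user.
            continue
    return terminated
```

A call such as `terminate_matching(["example_worker.py"])` (a hypothetical keyword) would stop every process whose command line mentions that script. On POSIX systems `terminate()` sends SIGTERM, which lets the target shut down cleanly; `proc.kill()` is the forceful alternative.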

def update_pending_jobs(db_conn, job_server)

Update pending jobs in the database by resetting progress status and removing obsolete entries.

Args

db_conn : dict
Database connection parameters.