Documentation

Basic usage

from pySmartDL import SmartDL

url = "https://github.com/iTaybb/pySmartDL/raw/master/test/7za920.zip"
dest = "C:\\Downloads\\" # or '~/Downloads/' on linux

obj = SmartDL(url, dest)
obj.start()
# [*] 0.23 Mb / 0.37 Mb @ 88.00Kb/s [##########--------] [60%, 2s left]

path = obj.get_dest()

For more examples please refer to the Code Examples page.

pySmartDL.SmartDL (main class)

class pySmartDL.SmartDL(urls, dest=None, progress_bar=True, fix_urls=True, threads=5, timeout=5, logger=None, connect_default_logger=False, request_args=None, verify=True)

The main SmartDL class

Parameters:
  • urls (string or list of strings) – Download url. It is possible to pass unsafe and unicode characters. You can also pass a list of urls, and those will be used as mirrors.
  • dest (string) – Destination path. Default is %TEMP%/pySmartDL/.
  • progress_bar (bool) – If True, prints a progress bar to the stdout stream. Default is True.
  • fix_urls (bool) – If true, attempts to fix urls with unsafe characters.
  • threads (int) – Number of threads to use.
  • timeout (int) – Timeout for network operations, in seconds. Default is 5.
  • logger (logging.Logger instance) – An optional logger.
  • connect_default_logger (bool) – If true, connects a default logger to the class.
  • request_args (dict) – Arguments to be passed to a new urllib.request.Request instance in dictionary form. See urllib.request docs for options.
  • verify (bool) – Whether SSL certificates should be validated.
Return type:

SmartDL instance

Note

The provided dest may be a folder or a full path name (including filename). The workflow is:

  • If the path exists, and it’s an existing folder, the file will be downloaded to there with the original filename.
  • If the path does not exist, the folders will be created as needed, and the last section of the path will be used as the filename.
  • If you want to download to a folder that does not exist yet, and want the module to fill in the filename, make sure the path ends with os.sep.
  • If no path is provided, %TEMP%/pySmartDL/ will be used.
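The rules above can be sketched with the standard library alone. This is only an illustration of the documented behaviour, not pySmartDL's own code: resolve_dest and its url_filename argument are hypothetical names, and unlike the library this sketch never creates any folders.

```python
import os
import tempfile


def resolve_dest(dest, url_filename):
    """Hypothetical helper mirroring the documented rules (not library code)."""
    if dest is None:
        # No path given: fall back to <temp dir>/pySmartDL/<original filename>.
        return os.path.join(tempfile.gettempdir(), "pySmartDL", url_filename)
    if dest.endswith(os.sep) or os.path.isdir(dest):
        # An existing folder, or a path marked as a folder by a trailing os.sep.
        return os.path.join(dest, url_filename)
    # Anything else: the last section of the path is taken as the filename.
    return dest
```

The trailing-separator check is what lets a not-yet-existing folder be told apart from a full file path.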
add_basic_authentication(username, password)

Uses HTTP Basic Access authentication for the connection.

Parameters:
  • username (string) – Username.
  • password (string) – Password.
add_hash_verification(algorithm, hash)

Adds hash verification to the download.

If the hash is incorrect, other mirrors will be tried. If no mirror passes hash verification, HashFailedException will be raised.

Note

If the downloaded file already exists at the destination and its hash matches, pySmartDL will not download it again.

Warning

The hashing algorithm must be supported on your system, as listed on the hashlib documentation page.

Parameters:
  • algorithm (string) – Hashing algorithm.
  • hash (string) – Hash code.
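A minimal sketch of a verified download follows. The names is_supported and download_verified are hypothetical; only SmartDL, add_hash_verification, start, get_dest and HashFailedException come from this page. It assumes the blocking mode, where exceptions are raised to the caller.

```python
import hashlib


def is_supported(algorithm):
    """Return True if hashlib on this system can compute `algorithm`."""
    available = {name.lower() for name in hashlib.algorithms_available}
    return algorithm.lower() in available


def download_verified(url, dest, algorithm, expected_hash):
    """Hypothetical wrapper: download `url` and verify it against `expected_hash`."""
    # Imported lazily so the helper above can be used without pySmartDL installed.
    from pySmartDL import SmartDL, HashFailedException

    if not is_supported(algorithm):
        raise ValueError("unsupported hash algorithm: %s" % algorithm)
    obj = SmartDL(url, dest, progress_bar=False)
    obj.add_hash_verification(algorithm, expected_hash)
    try:
        obj.start()  # blocking by default, so failures are raised here
    except HashFailedException:
        return None  # no mirror produced a file with the expected hash
    return obj.get_dest()
```

Checking the algorithm name first turns an unsupported algorithm into an early, explicit error rather than a failure during the download.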
fetch_hash_sums()

Will attempt to fetch UNIX hash sums files (SHA256SUMS, SHA1SUMS or MD5SUMS files in the same url directory).

Calls self.add_hash_verification if successful. Returns whether a matching hash was found.

Return type:bool

New in 1.2.1
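For orientation, hash sums files of this kind usually hold one "digest, whitespace, filename" entry per line, as written by tools such as sha256sum. The sketch below shows how such a file could be searched; find_hash is a hypothetical name and this is not the parser pySmartDL uses.

```python
def find_hash(sums_text, filename):
    """Return the hex digest listed for `filename`, or None if it is absent."""
    for line in sums_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        digest, name = parts
        # coreutils marks binary-mode entries with a leading '*'.
        if name.lstrip("*") == filename:
            return digest.lower()
    return None
```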

start(blocking=None)

Starts the download task. Will raise RuntimeError if the object is already downloading.

Warning

If you’re using the non-blocking mode, exceptions won’t be raised. In that case, call isSuccessful() after the task is finished to make sure the download succeeded. Call get_errors() to get the exceptions.

Parameters:blocking (bool) – If true, calling this function will block the thread until the download has finished. Default is True.
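The non-blocking workflow described in the warning can be sketched as a polling loop. run_in_background, poll_interval and report are hypothetical names; the methods called on obj are the ones documented on this page, and obj is assumed to be a SmartDL instance created with progress_bar=False.

```python
import time


def run_in_background(obj, poll_interval=0.2, report=print):
    """Start `obj` without blocking, report progress, return (dest, errors)."""
    obj.start(blocking=False)
    while not obj.isFinished():
        report("%s %s, ETA %s" % (
            obj.get_progress_bar(),
            obj.get_speed(human=True),
            obj.get_eta(human=True),
        ))
        time.sleep(poll_interval)
    # Exceptions are not raised in this mode, so check the outcome explicitly.
    if obj.isSuccessful():
        return obj.get_dest(), []
    return None, obj.get_errors()
```

Returning the error list instead of raising keeps the decision about how to handle a failed download with the caller.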
get_eta(human=False)

Gets the estimated time until the download completes, in seconds. Returns 0 if there is not enough data to calculate the estimate (this happens during approximately the first 5 seconds of each download).

Parameters:human (bool) – If true, returns a human-readable formatted string. Else, returns an int type number
Return type:int/string
get_speed(human=False)

Get current transfer speed in bytes per second.

Parameters:human (bool) – If true, returns a human-readable formatted string. Else, returns an int type number
Return type:int/string
get_progress()

Returns the current progress of the download, as a float between 0 and 1.

Return type:float
get_progress_bar(length=20)

Returns the current progress of the download as a string containing a progress bar.

Note

This is an alias for pySmartDL.utils.progress_bar(obj.get_progress()).

Parameters:length (int) – The length of the progress bar in chars. Default is 20.
Return type:string
isFinished()

Returns whether the task is finished.

Return type:bool
isSuccessful()

Returns whether the download was successful. It may fail in the following scenarios:

  • Hash check is enabled and fails.
  • All mirrors are down.
  • Any local I/O problems (such as no disk space available).

Note

Call get_errors() to get the exceptions, if any.

Will raise RuntimeError if it’s called when the download task is not finished yet.

Return type:bool
get_errors()

Gets the errors that occurred while downloading.

Return type:list of Exception instances
get_status()

Returns the current status of the task. Possible values: ready, downloading, paused, combining, finished.

Return type:string
wait(raise_exceptions=False)

Blocks until the download is finished.

Parameters:raise_exceptions (bool) – If true, this function will raise exceptions. Default is False.
stop()

Stops the download.

pause()

Pauses the download.

resume()

Continues the download. Same as unpause().

unpause()

Continues the download. Same as resume().

limit_speed(speed)

Limits the download transfer speed.

Parameters:speed (int) – Speed in bytes per download per second. Negative values will not limit the speed. Default is -1.
get_dest()

Gets the destination path of the downloaded file. This is needed when no destination was passed to the class and the file was saved in a temp folder.

Return type:string
get_dl_time(human=False)

Returns how long the download took, in seconds. Returns -1 if the download task is not finished yet.

Parameters:human (bool) – If true, returns a human-readable formatted string. Else, returns an int type number
Return type:int/string
get_dl_size(human=False)

Get downloaded bytes counter in bytes.

Parameters:human (bool) – If true, returns a human-readable formatted string. Else, returns an int type number
Return type:int/string
get_final_filesize(human=False)

Get total download size in bytes.

Parameters:human (bool) – If true, returns a human-readable formatted string. Else, returns an int type number
Return type:int/string
get_data(binary=False, bytes=-1)

Returns the downloaded data. Will raise RuntimeError if it’s called when the download task is not finished yet.

Parameters:
  • binary (bool) – If true, will read the data as binary. Else, will read it as text.
  • bytes (int) – Number of bytes to read. Negative values will read until EOF. Default is -1.
Return type:

string

get_data_hash(algorithm)

Returns the downloaded data’s hash. Will raise RuntimeError if it’s called when the download task is not finished yet.

Parameters:algorithm (string) – Hashing algorithm.
Return type:string

Warning

The hashing algorithm must be supported on your system, as listed on the hashlib documentation page.

get_json()

Returns the JSON in the downloaded data. Will raise RuntimeError if it’s called when the download task is not finished yet. Will raise json.decoder.JSONDecodeError if the downloaded data is not valid JSON.

Return type:dict

Exceptions

The following exceptions may be raised:

exception pySmartDL.HashFailedException

May be raised when hash check fails.

exception pySmartDL.CanceledException

Raised when user cancels the task with SmartDL.stop().

exception urllib.error.HTTPError

May be raised due to problems with the servers. Read more on the official documentation.

exception urllib.error.URLError

May be raised due to problems while reaching the servers. Read more on the official documentation.

exception IOError

May be raised due to any local I/O problems (such as no disk space available). Read more on the official documentation.

Warning

If you’re using the non-blocking mode, exceptions won’t be raised. In that case, call isSuccessful() after the task is finished to make sure the download succeeded. Call get_errors() to get the exceptions.
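When inspecting the list returned by get_errors(), or catching exceptions in blocking mode, the order of the checks matters. In Python 3 the urllib exceptions live in urllib.error, HTTPError is a subclass of URLError, and URLError is a subclass of OSError (of which IOError is an alias). A small sketch, with describe_error as a hypothetical helper name:

```python
import urllib.error


def describe_error(exc):
    """Map an exception to a short message, testing the most specific type first."""
    if isinstance(exc, urllib.error.HTTPError):
        return "server responded with HTTP %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "could not reach server: %s" % exc.reason
    if isinstance(exc, OSError):
        return "local I/O problem: %s" % exc
    return "unexpected error: %r" % exc
```

The pySmartDL exceptions listed above (HashFailedException, CanceledException) could be handled the same way with additional isinstance checks.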

pySmartDL.utils (helper class)

The Utils class contains many functions for project-wide use.

pySmartDL.utils.combine_files(parts, dest, chunkSize=4194304)

Combines files.

Parameters:
  • parts (list of strings) – Source files.
  • dest (string) – Destination file.
  • chunkSize (int) – Fetching chunk size.
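As a rough illustration of what combining parts in fixed-size chunks involves, here is a standard-library sketch. concat_files is a hypothetical name, and the library function may differ in detail (for example in what happens to the part files afterwards).

```python
import shutil


def concat_files(parts, dest, chunk_size=4 * 1024 * 1024):
    """Append each file in `parts`, in order, to a new file at `dest`."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in chunks so large parts never sit fully in memory.
                shutil.copyfileobj(src, out, chunk_size)
```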
pySmartDL.utils.url_fix(s, charset='utf-8')

Sometimes you get a URL from a user that just isn’t a real URL because it contains unsafe characters such as spaces. This function can fix some of these problems in a similar way to how browsers handle data entered by the user:

>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'
Parameters:
  • s (string) – Url address.
  • charset (string) – The target charset for the URL if the url was given as unicode string. Default is ‘utf-8’.
Return type:

string

(taken from werkzeug.utils)
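The same idea can be sketched with urllib.parse. fix_url is a hypothetical name, and the choice of safe characters is an assumption modelled on the werkzeug approach; non-ASCII host names are not handled here.

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit


def fix_url(url):
    """Percent-encode unsafe characters in the path and query of `url`."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep '/' and already-escaped sequences ('%') untouched in the path.
    path = quote(path, safe="/%")
    # Keep the query delimiters, encode spaces as '+'.
    query = quote_plus(query, safe=":&=")
    return urlunsplit((scheme, netloc, path, query, fragment))
```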

pySmartDL.utils.progress_bar(progress, length=20)

Returns a textual progress bar.

>>> progress_bar(0.6)
'[##########--------]'
Parameters:
  • progress (float) – A number between 0 and 1 that describes the progress.
  • length (int) – The length of the progress bar in chars. Default is 20.
Return type:

string
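A bar like the one in the example can be produced in a few lines. render_bar is a hypothetical name; the rounding and the decision to count the brackets toward length are assumptions, chosen so that the sketch reproduces the documented output for 0.6.

```python
def render_bar(progress, length=20):
    """Return a '[###---]' style bar that is `length` characters wide."""
    progress = min(max(progress, 0.0), 1.0)  # clamp to the valid range
    inner = length - 2  # the two brackets count toward the total length
    filled = int(progress * inner)
    return "[" + "#" * filled + "-" * (inner - filled) + "]"
```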

pySmartDL.utils.is_HTTPRange_supported(url, timeout=15)

Checks if a server allows Byte serving, using the Range HTTP request header and the Accept-Ranges and Content-Range HTTP response headers.

Parameters:
  • url (string) – Url address.
  • timeout (int) – Timeout in seconds. Default is 15.
Return type:

bool
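The header exchange described above can be sketched as follows. Both function names are hypothetical and the acceptance rule is an assumption; the library's actual check may be stricter or looser. The request is only built here, not sent.

```python
import urllib.request


def build_range_probe(url):
    """Build (but do not send) a request for the first four bytes of `url`."""
    return urllib.request.Request(url, headers={"Range": "bytes=0-3"})


def looks_range_capable(status, headers):
    """Decide from a response status and header mapping whether ranges work."""
    lowered = {key.lower(): value for key, value in headers.items()}
    # A 206 Partial Content reply with Content-Range means the range was honoured.
    if status == 206 and "content-range" in lowered:
        return True
    # Otherwise rely on the server advertising byte ranges.
    return lowered.get("accept-ranges", "").strip().lower() == "bytes"
```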

pySmartDL.utils.get_filesize(url, timeout=15)

Fetches the size of a file over HTTP.

Parameters:
  • url (string) – Url address.
  • timeout (int) – Timeout in seconds. Default is 15.
Returns:

Size in bytes.

Return type:

int

pySmartDL.utils.get_random_useragent()

Returns a random popular user-agent. Taken from here, last updated on 2020/09/19.

Returns:user-agent
Return type:string
pySmartDL.utils.sizeof_human(num)

Human-readable formatting for filesizes. Taken from here.

>>> sizeof_human(175799789)
'167.7 MB'
Parameters:num (int) – Size in bytes.
Return type:string
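The formatting shown in the example corresponds to repeated division by 1024. A sketch under that assumption, with human_size as a hypothetical name and a unit list that may differ from the library's:

```python
def human_size(num):
    """Format a byte count using 1024-based units and one decimal place."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0
    return "%3.1f %s" % (num, "PB")
```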
pySmartDL.utils.time_human(duration, fmt_short=False, show_ms=False)

Human-readable formatting for timing. Based on code from here.

>>> time_human(175799789)
'6 years, 2 weeks, 4 days, 17 hours, 16 minutes, 29 seconds'
>>> time_human(589, fmt_short=True)
'9m49s'
Parameters:
  • duration (int/float) – Duration in seconds.
  • fmt_short (bool) – Format as a short string (47s instead of 47 seconds)
  • show_ms (bool) – Specify milliseconds in the string.
Return type:

string

pySmartDL.utils.get_file_hash(algorithm, path)

Calculates a file’s hash.

Warning

The hashing algorithm must be supported on your system, as listed on the hashlib documentation page.

Parameters:
  • algorithm (string) – Hashing algorithm.
  • path (string) – The file path
Return type:

string
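Hashing a file without loading it into memory is typically done by feeding hashlib in chunks. A minimal sketch; file_digest and chunk_size are hypothetical names, not the library's implementation:

```python
import hashlib


def file_digest(algorithm, path, chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path` using `algorithm`."""
    hasher = hashlib.new(algorithm)  # raises ValueError for unknown algorithms
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```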

pySmartDL.utils.calc_chunk_size(filesize, threads, minChunkFile)

Calculates the byte chunks to download.

Parameters:
  • filesize (int) – File size in bytes.
  • threads (int) – Number of threads.
  • minChunkFile (int) – Minimum chunk size.
Return type:

Array of (startByte,endByte) tuples
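One plausible way to produce such tuples is sketched below. split_byte_ranges is a hypothetical name, and how the minimum chunk size is applied is an assumption (fewer parts are used when the file is too small); the library may split differently.

```python
def split_byte_ranges(filesize, threads, min_chunk):
    """Split [0, filesize) into inclusive (start_byte, end_byte) tuples."""
    if filesize <= 0:
        return []
    # Use fewer parts when each one would fall below the minimum chunk size.
    parts = max(1, min(threads, filesize // min_chunk))
    chunk = filesize // parts
    ranges = []
    start = 0
    for index in range(parts):
        # The last part absorbs any remainder left by the integer division.
        end = filesize - 1 if index == parts - 1 else start + chunk - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

The end offsets are inclusive, matching the form used by the HTTP Range header.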

pySmartDL.utils.create_debugging_logger()

Creates a debugging logger that prints to console.

Return type:logging.Logger instance
class pySmartDL.utils.DummyLogger

A dummy logger. You can call debug(), warning(), etc. on this object, and nothing will happen.

class pySmartDL.utils.ManagedThreadPoolExecutor(max_workers)

Managed Thread Pool Executor. A subclass of ThreadPoolExecutor.

submit(fn, *args, **kwargs)

Submits a callable to be executed with the given arguments.

Schedules the callable to be executed as fn(*args, **kwargs) and returns a Future instance representing the execution of the callable.

Returns:
A Future representing the given call.
get_exceptions()

Return all the exceptions raised.

Return type:List of Exception instances
get_exception()

Returns only the first exception. Returns None if no exception was raised.

Return type:Exception instance
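The idea of an executor that remembers its futures so that exceptions can be collected later can be sketched with concurrent.futures. TrackingExecutor is a hypothetical stand-in written for illustration, not the library class.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackingExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor that keeps its futures so errors can be inspected."""

    def __init__(self, max_workers):
        super().__init__(max_workers=max_workers)
        self._futures = []

    def submit(self, fn, *args, **kwargs):
        future = super().submit(fn, *args, **kwargs)
        self._futures.append(future)
        return future

    def get_exceptions(self):
        # Only look at finished, non-cancelled futures so this never blocks.
        return [
            f.exception()
            for f in self._futures
            if f.done() and not f.cancelled() and f.exception() is not None
        ]

    def get_exception(self):
        errors = self.get_exceptions()
        return errors[0] if errors else None
```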