fat_llama

fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques. It utilizes CUDA-accelerated calculations to enhance audio quality by upsampling and adding missing frequencies through FFT (Fast Fourier Transform), resulting in richer and more detailed audio.

Features

Requirements

(Note: For the CPU version, please see https://pypi.org/project/fat-llama-fftw/)

Installation

Install via pip:

pip install fat-llama

Note: This version works with CUDA 12.

CUDA and CuPy must also be properly installed: https://docs.cupy.dev/en/stable/install.html

ffmpeg is required as well: https://support.audacityteam.org/basics/installing-ffmpeg
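Both prerequisites can be checked from Python before calling fat_llama. The sketch below is not part of the package: `cuda_available` and `ffmpeg_available` are hypothetical helper names, and it assumes ffmpeg is expected to be found on the PATH.

```python
import shutil


def cuda_available() -> bool:
    # Hypothetical helper: True if CuPy imports and sees at least one CUDA device.
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:  # ImportError, or a CUDA runtime/driver error
        return False


def ffmpeg_available() -> bool:
    # Hypothetical helper: True if an `ffmpeg` executable is on the PATH.
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("CUDA via CuPy:", cuda_available())
    print("ffmpeg on PATH:", ffmpeg_available())
```

If either check prints False, revisit the CuPy or ffmpeg installation links above before running the upscaler.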

Note: To install on older versions of CUDA and CuPy, you will need to download the specific versions and install locally.

To install locally:

git clone <target_url>
cd fat_llama
pip install .

Usage

Example Usage

You can run the example provided in example.py:

from fat_llama.audio_fattener.feed import upscale

# Example call to the method
upscale(
    input_file_path='input_test.mp3',
    output_file_path='output_test.flac',
    source_format='mp3',
    target_format='flac',
    max_iterations=300,
    threshold_value=0.6,
    target_bitrate_kbps=1400,
    toggle_normalize=True,
    toggle_autoscale=True,
    toggle_adaptive_filter=True
)

Function Parameters

Running the Example

To run the example, execute the following command:

python example.py

This will upscale the MP3 file specified in the example and produce a FLAC file with full processing.

Spectrogram Results

How it works

Algorithm Explanation

The upscaling process involves several steps:

  1. Reading Audio File: The audio file is read, and the audio samples are extracted along with the sample rate and bitrate.
  2. Calculating Upscale Factor: The upscale factor is calculated to achieve the target bitrate.
  3. Upscaling Channels: The audio channels are upscaled using an interpolation algorithm. Each sample is repeated multiple times to increase the resolution.
  4. Iterative Soft Thresholding (IST): IST is applied to enhance the audio by adding missing frequencies. This process uses FFT to transform the signal into the frequency domain, apply a threshold to keep significant frequencies, and then inverse transform back to the time domain.
  5. Scaling Amplitude: The amplitude of the upscaled audio is scaled to match the original.
  6. Normalizing Audio: The audio is normalized to the range -1 to 1.
  7. Writing Output File: The processed audio is written in the target format (FLAC or WAV).
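The non-IST steps (2, 3, 5 and 6) can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the package's code: the upscale factor is assumed to be the rounded ratio of target to source bitrate, amplitude scaling is assumed to match peak levels, and all function names are hypothetical.

```python
import numpy as np


def upscale_factor(source_bitrate_kbps: float, target_bitrate_kbps: float) -> int:
    # Step 2 (assumption): integer ratio of target to source bitrate, at least 1.
    # The package's real formula may differ.
    return max(1, int(round(target_bitrate_kbps / source_bitrate_kbps)))


def repeat_upscale(channel: np.ndarray, factor: int) -> np.ndarray:
    # Step 3: repeat each sample `factor` times; the sample rate grows by `factor`.
    return np.repeat(channel, factor)


def scale_amplitude(upscaled: np.ndarray, original: np.ndarray) -> np.ndarray:
    # Step 5 (assumption): match the peak amplitude of the original channel.
    peak = np.max(np.abs(upscaled))
    if peak == 0:
        return upscaled
    return upscaled * (np.max(np.abs(original)) / peak)


def normalize(audio: np.ndarray) -> np.ndarray:
    # Step 6: peak-normalize into the range [-1, 1].
    peak = np.max(np.abs(audio))
    return audio if peak == 0 else audio / peak


# Tiny mono channel: a 320 kbps source upscaled toward 1400 kbps
channel = np.array([0.5, -0.25, 0.1])
factor = upscale_factor(320, 1400)          # 1400 / 320 = 4.375, rounds to 4
upscaled = repeat_upscale(channel, factor)  # 12 samples
result = normalize(scale_amplitude(upscaled, channel))
```

In the real pipeline, step 4 (IST) would run between `repeat_upscale` and `scale_amplitude` to fill in detail that plain sample repetition cannot add.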

Why FFT and IST?

FFT (Fast Fourier Transform) is used to transform the audio signal into the frequency domain, where specific frequency components can be identified and manipulated. Applying a threshold in the frequency domain keeps the significant frequencies and discards noise; the result is then added to the upscaled data to add detail to the upscaled frequencies.
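A single frequency-domain thresholding pass looks roughly like this. It is a sketch, not fat_llama's implementation: it assumes the threshold is a fraction of the strongest spectral component (by analogy with `threshold_value=0.6` in the usage example) and uses a simple hard threshold; `keep_significant_frequencies` is a hypothetical name.

```python
import numpy as np


def keep_significant_frequencies(signal: np.ndarray, threshold_value: float) -> np.ndarray:
    # Move to the frequency domain; rfft suits real-valued audio.
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    # Assumption: the threshold is a fraction of the strongest component.
    cutoff = threshold_value * magnitudes.max()
    # Hard threshold: zero every bin weaker than the cutoff (treated as noise).
    spectrum[magnitudes < cutoff] = 0
    # Back to the time domain, preserving the original length.
    return np.fft.irfft(spectrum, n=len(signal))


# A 4-cycle tone plus a weak 20-cycle component standing in for noise
t = np.arange(64) / 64
tone = np.sin(2 * np.pi * 4 * t)
noisy = tone + 0.05 * np.sin(2 * np.pi * 20 * t)
cleaned = keep_significant_frequencies(noisy, threshold_value=0.6)
```

Here the weak component falls below 60% of the peak magnitude and is removed, so `cleaned` matches the pure tone to floating-point precision.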

The report titled “Fast Sparse Fourier Transformations for NMR Spectroscopy” by Badruddin Kamal, supervised by Thomas Huber and Alastair Rendall, 2015, provides a comprehensive understanding of sparse representations and their applications in signal processing. IST leverages the concepts from this report to add missing frequencies and enhance the audio quality by making it more detailed and rich. This is particularly useful in upscaling audio where some frequencies might be missing or congested.
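For illustration, a generic textbook form of iterative soft thresholding is sketched below: it fills unknown samples by repeatedly enforcing the known samples and soft-thresholding the spectrum with a decaying threshold. This is not taken from the package or the cited report; the linear threshold schedule, the idea of treating some samples as "unknown", and the names `soft_threshold` and `ist_fill` are all assumptions.

```python
import numpy as np


def soft_threshold(coeffs: np.ndarray, lam: float) -> np.ndarray:
    # Shrink each complex coefficient's magnitude by lam, keeping its phase;
    # coefficients with magnitude <= lam become zero.
    mags = np.abs(coeffs)
    return coeffs * (np.maximum(mags - lam, 0.0) / np.maximum(mags, 1e-12))


def ist_fill(observed: np.ndarray, known: np.ndarray,
             max_iterations: int = 300, threshold_value: float = 0.6) -> np.ndarray:
    # observed: samples; values at unknown positions are ignored
    # known: boolean mask, True where the sample is trusted
    x = np.where(known, observed, 0.0)
    for k in range(max_iterations):
        # Data consistency: reinsert the known samples.
        x = np.where(known, observed, x)
        spectrum = np.fft.fft(x)
        # Assumption: threshold decays linearly from threshold_value * peak toward zero.
        lam = threshold_value * np.abs(spectrum).max() * (1 - k / max_iterations)
        x = np.real(np.fft.ifft(soft_threshold(spectrum, lam)))
    return np.where(known, observed, x)


# Recover a sinusoid with roughly a quarter of its samples missing
rng = np.random.default_rng(0)
n = 128
clean = np.sin(2 * np.pi * 5 * np.arange(n) / n)
known = rng.random(n) < 0.75
observed = np.where(known, clean, 0.0)
filled = ist_fill(observed, known)
```

Because the signal is sparse in frequency (one sinusoid), the early high thresholds keep only its dominant bins, and the estimates at the missing positions converge toward the true waveform instead of staying at zero.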

Test Audio Source

ericzo - beyond (https://soundcloud.com/ericzomusic/free-electro-trap-anthem-beyond)

Changelog

All notable changes to this project will be documented in this file.

[1.1.0] - 2024-08-01

Changed

[1.0.2] - 2024-07-26

Changed

[1.0.1] - 2024-07-26

Changed

[1.0.0] - 2024-07-25

Added

Changed

Removed

[0.1.8] - 2024-07-24

Added

Changed

Fixed

[0.1.7] - 2024-07-22

Added

[0.1.0] to [0.1.6] - 2024-07-20

Added