One of the best python audio libraries: pydub


pydub provides a simple, high-level interface that greatly expands Python's ability to process audio files. pydub may not be the most powerful Python audio processing library, but it is certainly the most concise and easy to use. Its main drawback is probably its heavy dependence on ffmpeg, which makes installation on Linux less straightforward. Its features are enough for most audio processing needs; if it really cannot meet yours, nothing stops you from choosing a more powerful library for your professional application.


0x01 Quick start

open an audio file

open a WAV file:

from pydub import AudioSegment

song = AudioSegment.from_wav("never_gonna_give_you_up.wav")

open an MP3 file:

song = AudioSegment.from_mp3("never_gonna_give_you_up.mp3")

Open an OGG, FLV, or any other file format supported by FFmpeg:

ogg_version = AudioSegment.from_ogg("never_gonna_give_you_up.ogg")
flv_version = AudioSegment.from_flv("never_gonna_give_you_up.flv")
mp4_version = AudioSegment.from_file("never_gonna_give_you_up.mp4", "mp4")
wma_version = AudioSegment.from_file("never_gonna_give_you_up.wma", "wma")
aac_version = AudioSegment.from_file("never_gonna_give_you_up.aiff", "aac")
Slice the audio (extract a fragment from a segment):

 The time scale for any operation performed by pydub is milliseconds 
ten_seconds = 10 * 1000

first_10_seconds = song[:ten_seconds]
last_5_seconds = song[-5000:]

makes the beginning louder and the end weaker

The sound of the first ten seconds becomes louder and the sound of the last five seconds becomes weaker:

 sound gain 6dB
beginning = first_10_seconds + 6

sound reduction 3dB
end = last_5_seconds - 3

connect audio segment

connect two audio segments (connect one file after the other)

without_the_middle = beginning + end

audio segment length

How long is the audio segment?

without_the_middle.duration_seconds == 15.0 

audio segments are immutable

 the original song is not modified 
backwards = song.reverse()

cross-fade

Crossfade (again, the beginning and end segments are not modified)

 1.5 seconds fade in and fade out 
with_style = beginning.append(end, crossfade=1500)


repeat

 repeat the fragment twice 
do_it_over = with_style * 2

fade

fade (note that you can chain operations, because each operation returns an AudioSegment object)

 2 seconds to fade in, 3 seconds to fade out 
awesome = do_it_over.fade_in(2000).fade_out(3000)

save the result

save the edited result (again, in any format supported by ffmpeg)

save the result with tags (metadata)

awesome.export("mashup.mp3", format="mp3", tags={'artist': 'Various artists', 'album': 'Best of 2011', 'comments': 'This album is awesome!'})

You can also export your results by specifying any bitrate supported by ffmpeg

awesome.export("mashup.mp3", format="mp3", bitrate="192k")

Other ffmpeg options can be used by passing a list to the 'parameters' argument: the first item in the list should be the option and the next item its value. Note that these arguments are not validated, so which of them actually work depends on the particular ffmpeg/avlib build you are using.

 use the default MP3 quality 0 (equivalent to lame -V0) 
(Translator's note: lame is an MP3 encoder; its -V setting is the VBR compression level, where quality decreases from 0 to 9.)
awesome.export("mashup.mp3", format="mp3", parameters=["-q:a", "0"])

mix to two channels and set the output volume percentage (amplified to 150% of the original)
awesome.export("mashup.mp3", format="mp3", parameters=["-ac", "2" , "-vol", "150"])

0x02 Debugging

Most of the problems people run into at runtime are related to converting formats with ffmpeg/avlib. pydub provides a logger that prints the subprocess calls to help you track problems down:

>>> import logging

>>> l = logging.getLogger("pydub.converter")
>>> l.setLevel(logging.DEBUG)
>>> l.addHandler(logging.StreamHandler())

>>> AudioSegment.from_file("./test/data/test1.mp3")
subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/71/42k8g72x4pq09tfp920d033r0000gn/T/tmpeZTgMy', '-vn', '-f', 'wav', '/var/folders/71/42k8g72x4pq09tfp920d033r0000gn/T/tmpK5aLcZ'])

Don't worry about the temporary files generated during the conversion; they are cleaned up automatically.

0x03 Install

Installing pydub is easy, but don't forget to install ffmpeg/avlib as well (covered later in this document):

pip install pydub

Or you can install the latest development (dev) version from GitHub; you can also replace @master with a release tag such as @v0.12.0:

pip install git+https://github.com/jiaaro/pydub.git@master

-or-

git clone https://github.com/jiaaro/pydub.git

0x04 Dependencies

You can open and save WAV files with pure Python alone. To open or save non-WAV files (such as MP3) you need ffmpeg or libav.

playback

Once you have one of the playback libraries installed, you can play audio (simpleaudio is strongly recommended, even if you have already installed ffmpeg/libav):

from pydub.playback import play
play(sound)

0x05 Installing ffmpeg/libav

Mac (install libav or ffmpeg using homebrew):

 libav
brew install libav --with-libvorbis --with-sdl --with-theora

OR

ffmpeg
brew install ffmpeg --with-libvorbis --with-sdl2 --with-theora
Linux (using apt):

 libav
apt-get install libav-tools libavcodec-extra

OR

ffmpeg
apt-get install ffmpeg libavcodec-extra

Windows:

  1. Download and extract libav from the Windows binaries provided here;
  2. add the libav /bin folder to your PATH environment variable;
  3. pip install pydub

0x06 Notes

AudioSegment objects are immutable

Ogg export and the default codec

The Ogg specification does not mandate a particular codec; that choice is left to the user. Vorbis and Theora are just two of a larger set of codecs (see page 3 of the RFC) that may be used for the encapsulated data. For convenience, when no codec is specified for ogg output, vorbis is used by default, like this:

from pydub import AudioSegment
song = AudioSegment.from_mp3("test/data/test1.mp3")
song.export(" out.ogg", format="ogg")
Is the same as:
song.export("out.ogg", format="ogg", codec="libvorbis")

0x08 Usage example

Suppose you have a folder full of mp4 and flv videos, and you want to convert them to mp3 so you can listen to them on your MP3 player.

import os
import glob
from pydub import AudioSegment

video_dir = &39;/home/johndoe/downloaded_videos/&39; The path to the folder where you have saved the video
extension_list = (&39;*.mp4&39;, &39;*.flvvideo_39;)z16dirzforzos. extension in extension_list:
for video in glob.glob(extension):
mp3_filename = os.path.splitext(os.path.basename(video))[0] + &39;.mp3&39;
AudioSegment.from_file(video). export(mp3_filename, format=&39;mp3&39;)

How about another example?

from glob import glob
from pydub import AudioSegment

playlist_songs = [AudioSegment.from_mp3(mp3_file) for mp3_file in glob("*.mp3")]

first_song = playlist_songs.pop(0)

let's just include the first 30 seconds of the first song (slicing is done in milliseconds)
beginning_of_song = first_song[:30*1000]

playlist = beginning_of_song
for song in playlist_songs:
    # we don't want the end to sound like a sudden stop, so crossfade over 10 seconds
    playlist = playlist.append(song, crossfade=(10 * 1000))

Let us add a fade out to the end of the last song
playlist = playlist.fade_out(30)

Hmm... I also want to know how long it is (len(audio_segment) also returns milliseconds)
playlist_length = len(playlist) / (1000*60)

Now save it!
out_f = open("%s_minute_playlist.mp3" % playlist_length, 'wb')

playlist.export(out_f, format='mp3')

pydub API documentation

If you are looking for some specific functionality, taking a look at the source code is probably a good idea. Most of the core functionality lives in pydub/audio_segment.py; a number of AudioSegment methods are defined in the pydub/effects.py module and added to AudioSegment through the effect registration process (the register_pydub_effect() decorator).

The following is not (yet) covered by the official documentation:

    from pydub import AudioSegment
    sound1 = AudioSegment.from_file("/path/to/sound.wav", format="wav")
    sound2 = AudioSegment.from_file("/path/to/another_sound.wav", format="wav")

    sound1 gains 6 dB, and then attenuates 3.5 dB
    louder = sound1 + 6
    quieter = sound1 - 3.5

    Add sound2 to sound1
    combined = sound1 + sound2

    sound1 repeats three times
    repeated = sound1 * 3

    the first 5 seconds of sound1
    beginning = sound1[:5000]

    the last five seconds of sound1
    end = sound1[-5000:]

    split sound1 into 5-second slices
    slices = sound1[::5000]

    advanced operations, if you have raw audio data:
    sound = AudioSegment(
    original audio data (bytes type)
    data=b'...',

    2 byte (16 bit) sampling
    sample_width=2,

    44.1 kHz frame rate
    frame_rate=44100,

    Stereo
    channels=2
    )

    Any operation that combines multiple AudioSegment objects first ensures that the objects have the same number of channels, frame rate, sample width, and so on. When these attributes do not match, the lower-quality segment is upgraded to match the higher-quality one so that no quality is lost: mono is converted to stereo, and a low sample/frame rate is raised to the required value. If you do not want this behaviour, you can explicitly reduce the channel count, bit depth, etc. with the appropriate class methods.
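    If you would rather control this conversion yourself instead of relying on the automatic upgrade, the per-segment setter methods can be used before combining. A minimal sketch (the file names and the target rate/width are just assumptions for illustration):

    from pydub import AudioSegment

    music = AudioSegment.from_wav("music.wav")   # e.g. stereo, 48 kHz
    voice = AudioSegment.from_wav("voice.wav")   # e.g. mono, 22.05 kHz

    # explicitly bring the higher-quality segment down to the lower-quality
    # parameters instead of letting pydub upgrade the other segment
    music = music.set_channels(1).set_frame_rate(22050).set_sample_width(2)

    combined = music + voice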

    AudioSegment(...).from_file()

    opens an audio file as an AudioSegment instance and returns it. A number of wrappers are also provided for convenience, but you should probably just use this method directly.

    from pydub import AudioSegment

    wave and raw do not need to use ffmpeg
    wav_audio = AudioSegment.from_file("/path/to/sound.wav", format="wav")
    raw_audio = AudioSegment.from_file("/path/to/sound.raw", format="raw",
    frame_rate=44100, channels=2, sample_width=2)

    All other formats need to use ffmpeg
    mp3_audio = AudioSegment.from_file("/path/to/sound.mp3", format="mp3")

    Use a file you have already opened (advanced …ish)
    with open("/path/to/sound.wav", "rb") as wav_file:
    audio_segment = AudioSegment.from_file(wav_file, format="wav")

    it also works with os.PathLike objects on Python 3.6 and above
    from pathlib import Path
    wav_path = Path("path/to/sound.wav")
    wav_audio = AudioSegment.from_file(wav_path)

    The first argument is the path of the file to read (passed as a string), or a file handle to read from.

    Supported keyword arguments:

  • format | Example: "aif" — the format of the audio file. Raw audio additionally needs the keyword arguments below, because a raw audio file has no header carrying this information the way a WAV file does.
  • sample_width | Example: 2 | raw only — use 1 for 8-bit audio, 2 for 16-bit (CD quality) and 4 for 32-bit. This is the number of bytes per sample.
  • channels | Example: 1 | raw only — 1 for mono, 2 for stereo.
  • frame_rate | Example: 44100 | raw only — also known as the sample rate; common values are 44100 (44.1 kHz, CD audio) and 48000 (48 kHz, DVD audio).

AudioSegment(...).export()

writes the AudioSegment object to a file and returns a file handle to the output file (you don't have to do anything with it, though).

from pydub import AudioSegment
sound = AudioSegment.from_file("/path/to/sound.wav", format="wav")

simple export
file_handle = sound.export("/path/to/output.mp3", format="mp3 ")

More complex export
file_handle = sound.export("/path/to/output.mp3",
format="mp3",
bitrate="192k",
tags={"album": "The Bends" , "artist": "Radiohead"},
cover="/path/to/albumcovers/radioheadthebends.jpg")

divide the sound into 5-second chunks and export each one
for i, chunk in enumerate(sound[::5000]):
    with open("sound-%s.mp3" % i, "wb") as f:
        chunk.export(f, format="mp3")

The first argument is the output location (as a string), or a file handle to write to. If you do not provide an output file or handle, a temporary file is created.

Supported keyword arguments:

  • codec | Example: "libvorbis" — for formats that can be encoded with different codecs, you can specify which codec to use. For example, the "ogg" format is usually encoded with the "libvorbis" codec (requires ffmpeg).
  • bitrate | Example: "128k" — for compressed formats, sets the bitrate used by the encoder (requires ffmpeg). Each codec accepts different bitrate arguments; see the ffmpeg documentation for details (bitrate arguments are usually written -b, -ba or -a:b).
  • tags | Example: {"album": "1989", "artist": "Taylor Swift"} — lets you pass metadata tags to the encoder (requires ffmpeg). Not all formats support tags (mp3 does).
  • parameters | Example: ["-ac", "2"] — passes additional command line arguments to the ffmpeg call. They are appended to the end of the call (in the output file section).
  • id3v2_version | Example: "3" | Default: "4" — sets the ID3v2 version ffmpeg uses when adding tags to the output file. If you want Windows Explorer to display the tags, use "3" here (source); see the sketch after this list.
  • cover | Example: "/path/to/imgfile.png" — lets you add a cover image to the audio file (the path to the cover image). Currently only MP3 files can use this keyword argument, and the cover must be a jpeg, png, bmp, or tiff file.
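As a sketch of the id3v2_version note above, an export that combines tags with id3v2_version so that Windows Explorer can show them might look like this (the tag values and output path are placeholders):

file_handle = sound.export(
    "/path/to/output.mp3",
    format="mp3",
    tags={"title": "Example title", "artist": "Example artist"},
    id3v2_version="3",
)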
    AudioSegment.empty()

    creates an AudioSegment object with a duration of zero.

    from pydub import AudioSegment
    empty = AudioSegment.empty()

    len(empty) == 0

    This is useful for loops that combine many sounds:

    from pydub import AudioSegment

    sounds = [
        AudioSegment.from_wav("sound1.wav"),
        AudioSegment.from_wav("sound2.wav"),
        AudioSegment.from_wav("sound3.wav"),
    ]

    playlist = AudioSegment.empty()
    for sound in sounds:
        playlist += sound

    AudioSegment.silent()

    creates a silent audio segment, which can be used as a placeholder, as a spacer, or as a canvas onto which other sounds are overlaid.

    from pydub import AudioSegment

    ten_second_silence = AudioSegment.silent(duration=10000)
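    As an illustration of the "spacer" use mentioned above, a minimal sketch that puts one second of silence between two clips (the file names are placeholders):

    from pydub import AudioSegment

    clip_a = AudioSegment.from_wav("sound1.wav")
    clip_b = AudioSegment.from_wav("sound2.wav")

    # one second of silence between the two clips
    with_gap = clip_a + AudioSegment.silent(duration=1000) + clip_b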

    Supported keyword arguments include duration (in milliseconds, as used above).

    AudioSegment.from_mono_audiosegments()

    creates a stereo AudioSegment from two mono AudioSegments.

    from pydub import AudioSegment

    left_channel = AudioSegment.from_wav("sound1.wav")
    right_channel = AudioSegment.from_wav("sound1.wav")

    stereo_sound = AudioSegment.from_mono_audiosegments(left_channel, right_channel)

    AudioSegment(...).dBFS

    returns the loudness of the AudioSegment in dBFS (dB relative to the maximum possible loudness). A square wave at maximum amplitude will be roughly 0 dBFS (maximum loudness), whereas a sine wave at maximum amplitude will be roughly -3 dBFS.

     from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    loudness = sound.dBFS
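    To see the square-wave/sine-wave claim in practice, pydub's built-in tone generators can be used; a small sketch (the exact readings depend on the generators' default amplitude):

    from pydub.generators import Sine, Square

    sine_tone = Sine(440).to_audio_segment(duration=1000)
    square_tone = Square(440).to_audio_segment(duration=1000)

    print(sine_tone.dBFS)    # roughly -3 dBFS for a full-amplitude sine wave
    print(square_tone.dBFS)  # roughly 0 dBFS for a full-amplitude square wave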

    AudioSegment(...).channels

    The number of channels in the audio segment (1 means mono, 2 means stereo)

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    channel_count = sound.channels

    AudioSegment(...).sample_width

    Number of bytes in each sample (1 means 8-bit, 2 means 16-bit, etc.). CD audio is 16-bit (a sample width of 2 bytes).

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    bytes_per_sample = sound.sample_width

    AudioSegment(...).frame_rate

    CD audio has a 44.1 kHz sample rate, which means frame_rate will be 44100 (the same as the sample rate; see frame_width). Common values are 44100 (CD), 48000 (DVD), 22050, 24000, 12000 and 11025.

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    frames_per_second = sound.frame_rate

    zz" each frame_rate
    z18_z

    zz"11z" Frame (frame)" the number of bytes. Each frame contains one sample for each channel (so for dual channels, each frame is sampled twice when it is played). frame_width and channe ls * sample_width is the same. For CD audio this value will be 4 (2 dual channels, each sample is 2 bytes).

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    bytes_per_frame = sound.frame_width
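    The relationship described above can be checked directly; a tiny sketch:

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    # frame_width is simply channels * sample_width
    assert sound.frame_width == sound.channels * sound.sample_width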
    AudioSegment(...).rms

    A measure of loudness. It is used to compute dBFS, which is what you should use in most cases. Loudness is logarithmic (rms is not), which makes dB a much more natural scale for loudness.

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    loudness = sound.rms
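    The dBFS value discussed above is derived from rms; a minimal sketch of that relationship, using the max_possible_amplitude property and guarding against a fully silent segment:

    import math

    from pydub import AudioSegment

    sound = AudioSegment.from_file("sound1.wav")

    # dBFS is the rms level expressed relative to the maximum possible amplitude
    if sound.rms > 0:
        dbfs_from_rms = 20 * math.log10(sound.rms / sound.max_possible_amplitude)
        print(dbfs_from_rms, sound.dBFS)  # the two values should agree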

    AudioSegment(...).max

    The maximum amplitude of any sample in the AudioSegment. Useful for things like volume normalization (which is provided in pydub.effects.normalize).

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    peak_amplitude = sound.max

    AudioSegment(...).max_dBFS

    The highest amplitude of any sample in the AudioSegment, in dBFS (relative to the maximum possible amplitude value). Useful for things like volume normalization (which is provided in pydub.effects.normalize).

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    normalized_sound = sound.apply_gain(-sound.max_dBFS)

    AudioSegment(...).duration_seconds

    returns the duration of AudioSegment in seconds (len(sound) returns milliseconds). This method is provided for convenience; it calls len() internally.

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    assert sound.duration_seconds == (len(sound) / 1000.0)

    AudioSegment(...).raw_data

    The raw audio data of the AudioSegment. Useful for interacting with other audio libraries, or with unusual APIs that want audio data as a byte string. Also handy if you are implementing effects or other direct digital signal processing (DSP).

    You probably don't need this, but if you do... you'll know.

    from pydub import AudioSegment
    sound = AudioSegment.from_file("soun d1.wav")

    raw_audio_data = sound.raw_data

    AudioSegment(...).frame_count()

    returns the number of frames in the AudioSegment. You can optionally pass the keyword argument ms to get the number of frames in that many milliseconds of the AudioSegment (useful for slicing, etc.).

     from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    number_of_frames_in_sound = sound.frame_count()

    number_of_frames_in_200ms_of_sound = sound.frame_count(ms=200)

    Supported keyword arguments:

  • ms | Default: None (the entire duration of the AudioSegment) — when specified, the method returns the number of frames in that many milliseconds of the AudioSegment.

    AudioSegment(...).append()

    returns a new AudioSegment created by appending another AudioSegment to this one (i.e., adding it to the end), optionally using a crossfade. AudioSegment(...).append() is used internally when AudioSegment objects are added together with the + operator.

    By default a 100 ms (0.1 second) crossfade is used to eliminate pops and crackles.

    from pydub import AudioSegment
    sound1 = AudioSegment.from_file("sound1.wav")
    sound2 = AudioSegment.from_file("sound2.wa16")
    z_file("sound2.wa16") Default 100 ms desalination
    combined = sound1.append(sound2)

    5000 ms desalination
    combined_with_5_sec_crossfade = sound1.append(sound2, crossfade=5000)

    no desalination
    no_crossfade1=0 sound1.appendzno_crossfade1=0 soundzzzade2 = soundzade16 sound1 + sound2

    Supported keyword arguments:

  • crossfade | Example: 5000 | Default: 100 — the crossfade duration in milliseconds, as used above.

    AudioSegment(...).overlay()

    overlays an AudioSegment onto this one, so that the two play at the same time.

    from pydub import AudioSegment
    sound1 = AudioSegment.from_file("sound1.wav")
    sound2 = AudioSegment.from_file("sound2.wav")

    played_togther = sound1.overlay(sound2)

    sound2_starts_after_delay = sound1.overlay(sound2, position=5000)

    volume_of_sound1_reduced_during_overlay = sound1.overlay(sound2, gain_during_overlay=-8)

    sound2_plays_twice = sound1.overlay(sound2, times=2)

    Assuming that sound1 is 30 seconds long, and sound2 is 5 seconds long:
    sound2_plays_a_lot = sound1.overlay(sound2, times=10000)
    len(sound1) == len(sound2_plays_a_lot)

    Supported keyword arguments include position, times, and gain_during_overlay (as used above).


    AudioSegment(...).apply_gain(gain)

    changes the amplitude (generally, the loudness) of the AudioSegment. Gain is specified in dB. This method is used internally when a number is added to an AudioSegment with the + operator.

    give sound1 a 3.5 dB gain
    louder_via_method = sound1.apply_gain(+3.5)
    louder_via_operator = sound1 + 3.5

    give sound1 a 5.7 dB attenuation
    quieter_via_method = sound1.apply_gain(-5.7)
    quieter_via_operator = sound1 - 5.7

    AudioSegment(...).fade()

    A more general (more flexible) fade method. You may specify start and end, or one of the two together with duration (e.g., start and duration).

    from pydub import AudioSegment
    sound1 = AudioSegment.from_file("sound1.wav")

    fade_louder_for_3_seconds_in_middle = sound1.fade(to_gain=+6.0, start=7500, duration=3000)
    fade_quieter_between_2_and_3_seconds = sound1.fade(to_gain=-3.5, start=2000, end=3000)

    the easier way is to use the .fade_in() convenience method. Note: -120 dB is basically silent.
    fade_in_the_hard_way = sound1.fade(from_gain=-120.0, start=0, duration=5000)
    fade_out_the_hard_way = sound1.fade(to_gain=-120.0, end=0, duration=5000)

    Supported keyword arguments:

  • duration | Example: 5000 | No default — the duration of the fade, in milliseconds. It is passed directly to .fade().
    AudioSegment(...).fade_in()

    fades in (from silence) at the beginning of the AudioSegment. It uses .fade() internally and accepts the same duration keyword argument.


    AudioSegment(...).apply_gain_stereo()

    attenuate the left channel by 6 dB and boost the right channel by 2 dB:
    stereo_balance_adjusted = sound1.apply_gain_stereo(-6, +2)

    applies the gains to the left and right channels of a stereo AudioSegment. If the AudioSegment is mono, it is converted to stereo before the gains are applied. Both gains are specified in dB.

    AudioSegment(...).pan()

    from pydub import AudioSegment
    sound1 = AudioSegment.from_file("sound1.wav")

    pan the sound 15% to the right
    panned_right = sound1.pan(+0.15)

    pan the sound 50% to the left
    panned_left = sound1.pan(-0.50)

    accepts one positional argument, the pan amount, which should be between -1.0 (100% left) and +1.0 (100% right). When pan_amount == 0.0, the left/right balance (i.e., the position of the sound image) is unchanged.

    Panning does not change the perceived loudness: because the volume on one side is reduced, the other side is boosted to compensate. When panned hard left, the left channel is boosted by 3 dB and the right channel is silenced (and vice versa).

    AudioSegment(...).get_array_of_samples()

    returns the raw audio data as an array of samples. Note: if the audio has multiple channels, the samples for each channel are interleaved, so stereo audio looks like this: [sample_1_L, sample_1_R, sample_2_L, sample_2_R, ...].

    This method is mainly useful for implementing effects and other signal processing.

    from pydub import AudioSegment
    sound = AudioSegment.from_file("sound1.wav")

    samples = sound.get_array_of_samples()

    Then modify the samples...

    new_sound = sound._spawn(samples)
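    Because stereo samples are interleaved as described above, one way to look at each channel on its own is plain slicing (split_to_mono() is the higher-level alternative); a small sketch:

    from pydub import AudioSegment

    sound = AudioSegment.from_file("sound1.wav")
    samples = sound.get_array_of_samples()

    if sound.channels == 2:
        left_samples = samples[0::2]   # sample_1_L, sample_2_L, ...
        right_samples = samples[1::2]  # sample_1_R, sample_2_R, ...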

    Note that when you use numpy or scipy, you need to convert back to an array before spawning a new AudioSegment:

    import array
    import numpy as np
    from pydub import AudioSegment

    sound = AudioSegment.from_file("sound1.wav")
    samples = sound.get_array_of_samples()

    example operation on the audio data
    shifted_samples = np.right_shift(samples, 1)

    Now we need to convert them into an array.array
    shifted_samples_array = array.array(sound.array_type, shifted_samples)

    new_sound = sound._spawn(shifted_samples_array)

    AudioSegment(...).get_dc_offset()

    returns a value between -1.0 and 1.0 representing the DC offset of a channel. It is calculated using audioop.avg() and normalizing the result by the maximum sample value. The companion method AudioSegment(...).remove_dc_offset() removes the offset and supports the following keyword arguments:

  • channel | Example: 2 | Default: None — select the left channel (1) or the right channel (2) to remove the DC offset from. If the value is None, the offset is removed from all available channels. If the segment is mono, this value is ignored.
  • offset | Example: -0.1 | Default: None — the offset value to remove from the channel. If this value is None, the offset is calculated automatically. The offset must be between -1.0 and 1.0.
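    A short sketch of how these two methods might be used together (assuming a stereo file with some DC offset):

    from pydub import AudioSegment

    sound = AudioSegment.from_file("sound1.wav")

    # measure the DC offset of the left channel, then remove it from that channel
    left_offset = sound.get_dc_offset(channel=1)
    cleaned = sound.remove_dc_offset(channel=1)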
    Effects

    A collection of DSP effects that are implemented through AudioSegment objects.

    AudioSegment(...).invert_phase()

    creates a copy of this AudioSegment with the phase of the signal inverted. This can be used to generate anti-phase waves, which can suppress or cancel noise.
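    As a quick illustration of the cancellation idea, overlaying a segment with its own phase-inverted copy should produce (near) silence; a minimal sketch:

    from pydub import AudioSegment

    sound = AudioSegment.from_file("sound1.wav")

    cancelled = sound.overlay(sound.invert_phase())
    print(cancelled.rms)  # expected to be at or near zero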
