[Python] Preprocessing Audio Data

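Hello! Today we'll preprocess audio data using Python libraries.

When building training data for speech AI, problems such as leading and trailing silence, background noise, and mismatched sample rates come up often.

Recordings you make yourself are especially prone to picking up button clicks, air-conditioner hum, and other ambient sounds. Python offers a variety of libraries for cleaning up this kind of audio.

Today we'll use librosa, scipy, noisereduce, and soundfile (the script also imports numpy and tqdm).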

1. preprocess_audio.py

import argparse, csv, os
from pathlib import Path
import numpy as np
import soundfile as sf
import librosa
from scipy.signal import butter, filtfilt
import noisereduce as nr
from tqdm import tqdm
from concurrent.futures import ProcessPoolExecutor, as_completed

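# ===== Settings =====
TARGET_SR = 24000            # output sample rate (Hz)
HP_HZ = 80                   # high-pass cutoff (Hz); 0 disables the filter
LP_HZ = 12000                # low-pass cutoff (Hz); applied only if below TARGET_SR / 2
TRIM_TOP_DB = 35             # trim threshold in dB below the clip's loudest frame
NR_STRENGTH = 0.12           # noise reduction amount (passed as prop_decrease)
PEAK_DBFS = -1.0             # peak ceiling in dBFS
MIN_SEC, MAX_SEC = 3.0, 7.0  # accepted clip length for QC
BITS = 'PCM_16'              # soundfile subtype for the output WAV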

def butter_filter(signal, sr, cutoff, btype, order=5):
    nyq = 0.5 * sr
    norm = cutoff / nyq
    b, a = butter(order, norm, btype=btype)
    return filtfilt(b, a, signal)

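def peak_normalize(x, target_dbfs=-1.0, eps=1e-9):
    peak = np.max(np.abs(x)) + eps
    target_lin = 10 ** (target_dbfs / 20.0)
    # Gain is capped at 1.0: loud clips are scaled down to the target peak,
    # quiet clips are left unchanged rather than boosted.
    return x * min(target_lin / peak, 1.0)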

def trim_silence(x, sr, top_db=35):
    xt, _ = librosa.effects.trim(x, top_db=top_db)
    return xt

def reduce_noise_soft(x, sr):
    # Pass the first and last 0.25 s as the noise reference
    # (this assumes those segments contain no speech).
    seg = int(sr * 0.25)
    noise_ref = np.concatenate([x[:seg], x[-seg:]]) if len(x) >= seg*2 else x
    return nr.reduce_noise(y=x, sr=sr, y_noise=noise_ref,
                           prop_decrease=NR_STRENGTH,
                           stationary=False)

def process_file(in_path: Path, out_dir: Path):
    try:
        y, _ = librosa.load(in_path, sr=TARGET_SR, mono=True)
        if HP_HZ:
            y = butter_filter(y, TARGET_SR, HP_HZ, 'highpass')
        # Skipped when LP_HZ is at or above Nyquist (true for the defaults: 12000 == 24000 / 2)
        if LP_HZ < TARGET_SR / 2:
            y = butter_filter(y, TARGET_SR, LP_HZ, 'lowpass')
        y = reduce_noise_soft(y, TARGET_SR)
        y = trim_silence(y, TARGET_SR, TRIM_TOP_DB)
        y = peak_normalize(y, PEAK_DBFS)

        # Note: inputs with the same file name in different subfolders share one output path.
        out_path = out_dir / (in_path.stem + ".wav")
        sf.write(out_path, y, TARGET_SR, subtype=BITS)
        dur = len(y) / TARGET_SR
        return (str(in_path), str(out_path), dur, "ok", "")
    except Exception as e:
        return (str(in_path), "", 0.0, "error", repr(e))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--in_dir", required=True)
    parser.add_argument("--out_dir", required=True)
    parser.add_argument("--workers", type=int, default=os.cpu_count())
    args = parser.parse_args()

    in_dir, out_dir = Path(args.in_dir), Path(args.out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    files = list(in_dir.rglob("*.wav"))
    rows = []
    with ProcessPoolExecutor(max_workers=args.workers) as ex:
        futures = {ex.submit(process_file, f, out_dir): f for f in files}
        for fut in tqdm(as_completed(futures), total=len(files)):
            rows.append(fut.result())

    # CSV report
    with open(out_dir / "report.csv", "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["src", "dst", "duration_sec", "status", "error"])
        w.writerows(rows)

    # QC: sort clips by length (report.csv above still lists the pre-move dst paths)
    bad_short, bad_long = out_dir / "_bad_too_short", out_dir / "_bad_too_long"
    bad_short.mkdir(exist_ok=True)
    bad_long.mkdir(exist_ok=True)
    for src, dst, dur, status, _ in rows:
        if status == "ok" and dst:
            if dur < MIN_SEC:
                Path(dst).replace(bad_short / Path(dst).name)
            elif dur > MAX_SEC:
                Path(dst).replace(bad_long / Path(dst).name)

if __name__ == "__main__":
    main()

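With this script, even a large speech dataset can be cleaned up automatically.
Adjust the filter cutoffs, the silence threshold, and the sample rate as needed, and it can be reused across a variety of speech AI projects.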
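
Before changing TARGET_SR or the cutoff frequencies, it can help to check which filter stages will actually run. The sketch below mirrors the two conditions used in process_file. `describe_filters` and `dbfs_to_linear` are hypothetical helpers introduced for illustration and are not part of the script. With the defaults (24 kHz and LP_HZ = 12000) the low-pass cutoff equals the Nyquist frequency, so that stage is skipped.

```python
def describe_filters(target_sr, hp_hz, lp_hz):
    """Hypothetical helper: report which filter stages would run,
    using the same conditions as process_file in the script."""
    nyquist = target_sr / 2
    return {
        "nyquist_hz": nyquist,
        # The script applies the high-pass whenever HP_HZ is non-zero.
        "highpass_active": bool(hp_hz),
        # Normalized cutoff handed to scipy.signal.butter (must lie between 0 and 1).
        "highpass_norm": hp_hz / nyquist,
        # The script applies the low-pass only when LP_HZ is strictly below Nyquist.
        "lowpass_active": lp_hz < nyquist,
    }


def dbfs_to_linear(dbfs):
    """Convert a dBFS level to a linear amplitude (0 dBFS == 1.0)."""
    return 10 ** (dbfs / 20.0)


for sr in (22050, 24000, 44100):
    print(sr, describe_filters(sr, hp_hz=80, lp_hz=12000))

# Linear peak target that corresponds to PEAK_DBFS = -1.0
print(round(dbfs_to_linear(-1.0), 3))
```

Of the three sample rates shown, only 44100 leaves a 12 kHz low-pass active; at 22050 and 24000 the cutoff would have to be lowered (for example to 11 kHz) for the stage to run.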

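Here is what the script does, step by step:

Mono conversion + unified sample rate
→ The example resamples everything to 24 kHz (TARGET_SR=24000)
High-pass filter (80 Hz)
→ Removes low-frequency noise such as air-conditioner hum, footsteps, and rumble
Low-pass filter (12 kHz)
→ Removes very high-frequency hiss while keeping speech intelligible. At TARGET_SR=24000 the Nyquist frequency is already 12 kHz, so with the default values this stage is skipped; it only runs when LP_HZ is below TARGET_SR/2
Spectral noise reduction
→ Softens ambient noise (strength adjustable via NR_STRENGTH)
Leading/trailing silence removal
→ Automatically trims leading and trailing audio that is more than 35 dB quieter than the clip's loudest frame (TRIM_TOP_DB; adjustable to avoid cutting off speech)
Peak normalization (-1 dBFS)
→ Scales clips whose peak exceeds -1 dBFS down to that level; the gain is capped at 1.0, so quieter clips are not boosted
QC report & sorting by length
→ Writes report.csv and moves clips shorter than 3 seconds or longer than 7 seconds into separate folders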
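
To illustrate what the 80 Hz high-pass stage does, here is a small sketch that applies the same butter_filter logic to a synthetic signal. The 30 Hz "rumble" and the 1 kHz tone are made-up test signals standing in for low-frequency noise and speech-band content, and `bin_amplitude` is a helper introduced only for this example.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def butter_filter(signal, sr, cutoff, btype, order=5):
    # Same approach as the script: Butterworth design + zero-phase filtfilt.
    b, a = butter(order, cutoff / (0.5 * sr), btype=btype)
    return filtfilt(b, a, signal)


def bin_amplitude(x, sr, freq_hz):
    # Illustrative helper: amplitude of a sinusoid at freq_hz read from the FFT.
    # Assumes freq_hz falls exactly on an FFT bin (true for the signal below).
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    return spectrum[int(round(freq_hz * len(x) / sr))]


sr = 24000
t = np.arange(sr) / sr                        # one second of samples
rumble = 0.5 * np.sin(2 * np.pi * 30 * t)     # synthetic low-frequency noise
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)     # synthetic speech-band content
mixed = rumble + tone

filtered = butter_filter(mixed, sr, 80, "highpass")

rumble_before = bin_amplitude(mixed, sr, 30)
rumble_after = bin_amplitude(filtered, sr, 30)
tone_after = bin_amplitude(filtered, sr, 1000)

print(f"30 Hz amplitude: {rumble_before:.3f} -> {rumble_after:.5f}")
print(f"1 kHz amplitude after filtering: {tone_after:.3f}")
```

Because filtfilt runs the filter forward and backward, the attenuation below the cutoff is applied twice and no phase shift is introduced, while content well above 80 Hz passes through almost unchanged.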

You can tune the parameters to get the results you want!

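Word endings get cut off → raise TRIM_TOP_DB (for example to 40 or 45). In librosa.effects.trim a larger top_db treats less audio as silence, so lowering it to 30 would trim more aggressively
Noise is still too loud → raise NR_STRENGTH to 0.15 to 0.18
Metallic/robotic sound → lower NR_STRENGTH and bring LP_HZ down to 11 kHz
Changing the sample rate → set TARGET_SR to 22050, 44100, etc. to match your model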
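
The effect of the trim threshold can be seen on a synthetic clip. The sketch below is a simplified stand-in for librosa.effects.trim, not its actual implementation: `trim_by_top_db` is a hypothetical function that keeps the span of frames whose RMS is within top_db decibels of the loudest frame. The test clip (silence, a steady tone, a tail that fades by 60 dB, silence) is invented for illustration.

```python
import numpy as np


def trim_by_top_db(x, top_db, frame=2048, hop=512):
    """Simplified, hypothetical stand-in for librosa.effects.trim:
    keep everything between the first and last frame whose RMS is
    within top_db decibels of the loudest frame."""
    n_frames = 1 + max(0, len(x) - frame) // hop
    rms = np.array([
        np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2))
        for i in range(n_frames)
    ])
    ref = max(rms.max(), 1e-10)
    db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
    keep = np.flatnonzero(db > -top_db)
    start = keep[0] * hop
    end = min(len(x), keep[-1] * hop + frame)
    return x[start:end]


sr = 24000
silence = np.zeros(sr // 2)                        # 0.5 s of silence
t_body = np.arange(sr) / sr
body = np.sin(2 * np.pi * 220 * t_body)            # 1 s steady tone
t_tail = np.arange(sr // 2) / sr
fade = 10 ** (-60 * (t_tail / 0.5) / 20)           # fades from 0 dB to -60 dB
tail = fade * np.sin(2 * np.pi * 220 * t_tail)     # 0.5 s decaying "word ending"
clip = np.concatenate([silence, body, tail, silence])

for top_db in (30, 35, 45):
    kept = trim_by_top_db(clip, top_db)
    print(f"top_db={top_db}: kept {len(kept) / sr:.2f} s of {len(clip) / sr:.2f} s")
```

A larger top_db keeps more of the fading tail, which is why raising the value is the direction to try when word endings are being clipped.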

2. Running the preprocessing script

python preprocess_audio.py --in_dir "raw_audio" --out_dir "clean_audio" --workers 8

- Input folder (original files): raw_audio

- Output folder (processed files): clean_audio
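
Once the run finishes, the report.csv written to the output folder can be summarized with the standard library. This is a minimal sketch that assumes the column names the script writes (src, dst, duration_sec, status, error) and the same 3 to 7 second limits. `summarize_report` is a hypothetical helper, not part of the script.

```python
import csv
from pathlib import Path


def summarize_report(report_path, min_sec=3.0, max_sec=7.0):
    """Hypothetical helper: count outcomes in report.csv.
    'ok' counts every successfully processed clip, including those
    that fall outside the length limits."""
    summary = {"ok": 0, "error": 0, "too_short": 0, "too_long": 0, "total_sec": 0.0}
    with open(report_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["status"] != "ok":
                summary["error"] += 1
                continue
            dur = float(row["duration_sec"])
            summary["ok"] += 1
            summary["total_sec"] += dur
            if dur < min_sec:
                summary["too_short"] += 1
            elif dur > max_sec:
                summary["too_long"] += 1
    return summary


if __name__ == "__main__":
    report = Path("clean_audio") / "report.csv"
    if report.exists():
        print(summarize_report(report))
```

Rows with status "error" keep the exception text in the error column, so they are a good place to start when some files are missing from the output folder.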

Thank you.

Team Nova & Stickode developer blog
