The Setup

I don't use Ableton, FL Studio, or Logic. I don't use samples, presets, or plugins. Every track I generate is computed entirely from scratch: sine waves, FM synthesis, envelope shaping, drum synthesis — all built from raw NumPy arrays at 44100 Hz. The output is a WAV file. Then it's converted to MP3 and delivered automatically.

The idea started as an experiment: could I write a techno track as pure math? After 30+ generated tracks — from deep house to hardcore, ambient to orchestral — the answer is yes. And it turns out this approach teaches you more about music theory than any plugin ever could.

Why Python Over a DAW

The honest reason is control. In a DAW, a synth is a black box. You tweak knobs and hope. In Python, a synth is a function. I know exactly what it does because I wrote it. And Claude can write it with me, iterate on it, and generate 10 variations in the time it takes to open a plugin.

The other reason: repeatability. Every track I generate has an np.random.seed(N) call at the top, where N is the ARC issue number, so the track is fully reproducible. I can regenerate "Лёд и Огонь в Коде" from ARC-287 in exactly its original form at any point. That's not possible with most DAW workflows.

How Synthesis Actually Works

The fundamental building block is a sine wave:

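import numpy as np

SR = 44100  # sample rate in Hz
duration, amplitude, frequency = 1.0, 0.5, 440.0  # example values
t = np.arange(int(SR * duration)) / SR  # time axis in seconds
wave = amplitude * np.sin(2 * np.pi * frequency * t)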

That's it. A sine wave at any frequency, any duration, any amplitude. Stack these, add harmonics, apply envelopes, mix them — that's synthesis.
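To make "stack these, add harmonics, apply envelopes" concrete, here is a minimal sketch of an additive note. The helper names (adsr, additive_note), the harmonic weights, and the envelope times are my own illustrative choices, not values from any specific track.

```python
import numpy as np

SR = 44100  # sample rate in Hz, as in the snippet above


def adsr(n, attack=0.01, decay=0.1, sustain=0.7, release=0.2):
    """Piecewise-linear ADSR envelope, n samples long (times in seconds)."""
    a, d, r = (int(SR * x) for x in (attack, decay, release))
    s = max(n - a - d - r, 0)  # whatever is left becomes the sustain stage
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),      # attack: 0 -> 1
        np.linspace(1.0, sustain, d, endpoint=False),  # decay: 1 -> sustain
        np.full(s, sustain),                           # sustain: hold
        np.linspace(sustain, 0.0, r),                  # release: sustain -> 0
    ])
    return env[:n]  # very short notes simply get a truncated envelope


def additive_note(freq, duration, harmonics=(1.0, 0.5, 0.25, 0.125)):
    """Sum of sine partials at integer multiples of freq, shaped by adsr()."""
    t = np.arange(int(SR * duration)) / SR
    wave = sum(amp * np.sin(2 * np.pi * freq * (k + 1) * t)
               for k, amp in enumerate(harmonics))
    wave = wave / sum(harmonics)  # keep the peak within [-1, 1]
    return wave * adsr(len(t))


note = additive_note(110.0, 1.0)  # one second of A2
```

Changing the harmonics tuple changes the timbre; changing the envelope times changes whether the same partials read as a pluck or a pad.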

Kick drum synthesis is a pitched sine that frequency-sweeps from high to low, with exponential amplitude decay:

def kick(duration=0.4):
    t = np.arange(int(SR * duration)) / SR
    freq = 120 * np.exp(-30 * t) + 40  # sweep 160→40 Hz (120 Hz decaying on top of a 40 Hz floor)
    env  = np.exp(-10 * t)             # amplitude decay
    return env * np.sin(2 * np.pi * np.cumsum(freq) / SR)

The key insight: np.cumsum(freq) instead of freq * t. Because frequency changes over time, you need to integrate (accumulate) the phase, not multiply. Multiplying a changing frequency by t gives the wrong pitch curve, because the instantaneous frequency picks up an extra t × (rate of change) term. The same phase-accumulation idea is what makes FM synthesis work correctly.
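A quick way to see the difference is to compute the phase both ways and recover the instantaneous frequency from each. This is a sketch using the same sweep as kick(); inst_freq is a helper name I made up for the check.

```python
import numpy as np

SR = 44100
t = np.arange(SR) / SR                 # one second of time
freq = 120 * np.exp(-30 * t) + 40      # same sweep as kick(): 160 Hz down to 40 Hz

phase_ok = 2 * np.pi * np.cumsum(freq) / SR   # integrate frequency to get phase
phase_naive = 2 * np.pi * freq * t            # multiply frequency by time


def inst_freq(phase):
    """Instantaneous frequency in Hz: d(phase)/dt divided by 2*pi."""
    return np.diff(phase) * SR / (2 * np.pi)


i = int(0.02 * SR)  # inspect the pitch 20 ms into the hit
print(freq[i], inst_freq(phase_ok)[i], inst_freq(phase_naive)[i])
```

At 20 ms the accumulated phase tracks the intended frequency almost exactly, while the naive version reads roughly 40 Hz too low, so the kick would drop in pitch much faster than the formula for freq suggests.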

Hi-hat synthesis is filtered noise:

from scipy import signal

def hihat(duration=0.05, closed=True):
    n = int(SR * duration)
    noise = np.random.randn(n) * 0.3
    cutoff = 8000 if closed else 4000
    # Highpass filter to remove low rumble
    b, a = signal.butter(4, cutoff / (SR / 2), btype='high')
    return signal.filtfilt(b, a, noise) * np.exp(-40 * np.arange(n) / SR)

Real hi-hats are metal cymbals vibrating at many inharmonic frequencies. Bandlimited white noise with a high-pass filter is a close enough approximation for most techno contexts, especially when the kick and bass are doing the heavy lifting.

Music Theory as Code

Before I started this project, I knew music theory conceptually. After writing it in Python, I understand it mechanically.

Scales are frequency ratios. In equal temperament, a semitone is a ratio of the 12th root of 2 (about 1.0595). To get E Phrygian (the scale for ARC-287, "Лёд и Огонь в Коде"):

# E Phrygian: E F G A B C D (the bII makes it harsh)
E2 = 82.41
SEMITONE = 2 ** (1/12)

# Phrygian intervals from root: 0, 1, 3, 5, 7, 8, 10 semitones
PHRYGIAN = [0, 1, 3, 5, 7, 8, 10]
phrygian_freqs = [E2 * (SEMITONE ** n) for n in PHRYGIAN]
# [82.41, 87.31, 98.00, 110.00, 123.47, 130.81, 146.83]

E Phrygian sounds harsh and unresolved because of the minor second (F natural, one semitone above E). That bII interval is what gives Spanish flamenco and Middle Eastern music their characteristic tension. In ARC-287's "ice and fire in code" concept, this harmonic instability matched the narrative: 3AM, build broken, close but not done.

D Hijaz (used in ARC-281 "Вертушка" and others) has an augmented second — a gap of 3 semitones between the 2nd and 3rd degree — that gives it the "oriental" sound:

# D Hijaz: D Eb F# G A Bb C
# Intervals: 0, 1, 4, 5, 7, 8, 10 (augmented 2nd between Eb and F#)
HIJAZ = [0, 1, 4, 5, 7, 8, 10]

Once you've hardcoded a scale as a list of intervals, changing the mood of a track is a one-line change. Phrygian is dark and unresolved. Lydian (a raised fourth: F# instead of F natural in C Lydian) is bright and dreamy. Dorian is melancholic but not bleak. These aren't abstract feelings; they're specific frequency relationships.
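As a sketch of that one-line change, the interval lists can live in a dictionary keyed by mode name. MODES and scale_freqs are names I'm introducing for illustration; the Phrygian and Hijaz patterns are the ones above, and Dorian and Lydian are the standard patterns for those modes.

```python
SEMITONE = 2 ** (1 / 12)

# Interval patterns in semitones above the root.
MODES = {
    "phrygian": [0, 1, 3, 5, 7, 8, 10],
    "hijaz":    [0, 1, 4, 5, 7, 8, 10],
    "dorian":   [0, 2, 3, 5, 7, 9, 10],
    "lydian":   [0, 2, 4, 6, 7, 9, 11],
}


def scale_freqs(root_hz, mode, octaves=1):
    """Frequencies of `mode` starting at root_hz, spanning `octaves` octaves."""
    return [root_hz * SEMITONE ** (12 * o + step)
            for o in range(octaves)
            for step in MODES[mode]]


E2 = 82.41
dark = scale_freqs(E2, "phrygian")
bright = scale_freqs(E2, "lydian")   # the one-line mood change
```

Any melody written as scale-degree indices rather than raw frequencies can then be re-rendered in another mode without touching the sequencing code.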

Track Architecture: Sections as Code

The most useful structure I found for longer tracks (3-5 min) is defining sections as (name, bar count) tuples:

BPM  = 152
BEAT = 60.0 / BPM   # seconds per beat
BAR  = BEAT * 4     # seconds per bar

SECTIONS = [
    ('BOOT',   8),   # 12.6 s — system init, sparse drums
    ('GRIND1', 40),  # 63.2 s — full texture, main theme
    ('BUG',    8),   # 12.6 s — disruption, removed elements
    ('DEBUG',  16),  # 25.3 s — methodical, pared back
    ('EUREKA', 4),   #  6.3 s — full drop
    ('GRIND2', 40),  # 63.2 s — harder version of GRIND1
    ('DEPLOY', 12),  # 18.9 s — building out
    ('COMMIT', 8),   # 12.6 s — sparse, exhausted resolution
]

This directly parallels how you'd write a programming narrative. The EUREKA section is 4 bars because the insight moment should be brief — a human breakthrough is usually short, then you're back in the work. The GRIND sections are 40 bars each because that's where the track breathes.
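The durations in the comments fall straight out of the tempo, so they don't need to be maintained by hand. A small sketch that derives each section's start time and length from the same BPM and bar counts (timeline is a name I'm adding here):

```python
BPM = 152
BAR = 60.0 / BPM * 4  # seconds per 4/4 bar

SECTIONS = [
    ('BOOT', 8), ('GRIND1', 40), ('BUG', 8), ('DEBUG', 16),
    ('EUREKA', 4), ('GRIND2', 40), ('DEPLOY', 12), ('COMMIT', 8),
]

start = 0.0
timeline = []  # (name, start_seconds, length_seconds)
for name, bars in SECTIONS:
    length = bars * BAR
    timeline.append((name, start, length))
    start += length

total_bars = sum(bars for _, bars in SECTIONS)
total_seconds = start

for name, begin, length in timeline:
    print(f"{name:<7} starts at {begin:6.1f} s, lasts {length:5.1f} s")
```

The arrangement adds up to 136 bars, which at 152 BPM is a little over three and a half minutes.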

render_section takes a section name and a bar count and returns a NumPy array of that length. The final track is just concatenation:

track_parts = []
for name, bars in SECTIONS:
    track_parts.append(render_section(name, bars))
final = np.concatenate(track_parts)
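The loop above assumes a render_section function. What follows is a deliberately minimal, hypothetical version that only lays down kicks, to show the mechanics of placing one-shot sounds on a bar grid. The place helper, the gain value, and the "sparse sections play every other beat" rule are assumptions for illustration, not how the real tracks are arranged.

```python
import numpy as np

SR = 44100
BPM = 152
BEAT = 60.0 / BPM
BAR = BEAT * 4


def kick(duration=0.4):
    """Same kick as earlier in the article."""
    t = np.arange(int(SR * duration)) / SR
    freq = 120 * np.exp(-30 * t) + 40
    return np.exp(-10 * t) * np.sin(2 * np.pi * np.cumsum(freq) / SR)


def place(buffer, sound, start_seconds, gain=1.0):
    """Mix `sound` into `buffer` in place, truncating at the buffer's end."""
    i = int(round(start_seconds * SR))
    n = min(len(sound), len(buffer) - i)
    if n > 0:
        buffer[i:i + n] += gain * sound[:n]


def render_section(name, bars):
    """Hypothetical renderer: four-on-the-floor kick, halved in sparse sections."""
    out = np.zeros(int(round(bars * BAR * SR)))
    step = 2 if name in ('BOOT', 'COMMIT') else 1  # sparse: every other beat
    for beat in range(0, bars * 4, step):
        place(out, kick(), beat * BEAT, gain=0.8)
    return out


section = render_section('BOOT', 8)
```

A fuller renderer would add bass, hats, and melody with further place calls on the same buffer, one instrument at a time.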

The Pipeline: WAV → MP3 → Telegram

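Once you have a NumPy float array, writing a WAV file is a single call:

from scipy.io import wavfile
# Clip to [-1, 1] first so out-of-range samples don't wrap around as int16
wavfile.write('output.wav', SR, (np.clip(final, -1.0, 1.0) * 32767).astype(np.int16))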

But WAV files are huge. A 3.5-minute track at 44100 Hz, 16-bit stereo is ~37 MB (210 s × 44100 samples/s × 2 channels × 2 bytes). For sharing and archiving, MP3 is the target. I automated the conversion pipeline (ARC-284) using ffmpeg:

ffmpeg -i output.wav -codec:a libmp3lame -qscale:a 2 output.mp3

That gets a 3.5-minute track down to roughly 5 MB at near-transparent quality (-qscale:a 2 is LAME's V2 variable-bitrate setting, which averages around 190 kbit/s). The pipeline then sends the MP3 to a private Telegram channel via the Bot API: instant delivery to any device, persistent storage, shareable link. The whole thing runs in about 90 seconds: generate → convert → deliver.
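A rough sketch of how the convert and deliver steps could be scripted. The function names are mine, the -y flag (overwrite the output without prompting) is an addition to the command shown above, and the upload assumes the third-party requests package plus a bot token and chat ID you supply. It uses the Bot API's sendAudio method; this is not necessarily how the ARC-284 pipeline is actually written.

```python
import subprocess
from pathlib import Path


def ffmpeg_cmd(wav_path, mp3_path, quality=2):
    """Build the ffmpeg command from the article as an argument list."""
    return ["ffmpeg", "-y", "-i", str(wav_path),
            "-codec:a", "libmp3lame", "-qscale:a", str(quality), str(mp3_path)]


def to_mp3(wav_path, quality=2):
    """Convert wav_path to an MP3 next to it; raises if ffmpeg fails."""
    mp3_path = Path(wav_path).with_suffix(".mp3")
    subprocess.run(ffmpeg_cmd(wav_path, mp3_path, quality), check=True)
    return mp3_path


def send_audio(mp3_path, token, chat_id, title=None):
    """Upload the MP3 through the Telegram Bot API sendAudio method."""
    import requests  # third-party; imported lazily so the rest works without it
    url = f"https://api.telegram.org/bot{token}/sendAudio"
    data = {"chat_id": chat_id}
    if title:
        data["title"] = title
    with open(mp3_path, "rb") as f:
        response = requests.post(url, data=data, files={"audio": f}, timeout=120)
    response.raise_for_status()
    return response.json()
```

Typical use would be send_audio(to_mp3('output.wav'), token, chat_id), with the token read from an environment variable rather than hardcoded.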

What I've Built (So Far)

After roughly 6 weeks of generation sessions, the catalog includes:

| Track | Scale | BPM | Style |
|---|---|---|---|
| Лёд и Огонь в Коде | E Phrygian | 152 | Hardcore techno, coding narrative |
| Meta Programming | D Dorian→Phrygian→Mixolydian | 130 | Modulating ambient techno |
| Fibonacci Fréquence | 55×φⁿ Hz ratios | 144 | Experimental, mathematically tuned |
| Вертушка | D Hijaz | 128 | Melodic techno banger |
| Казантип Returns | D Hijaz | 127 | Daft Punk arp + Guetta hook |
| Северное Сияние | C Lydian | 118 | Deep house, dual melody + choir |
| Symphonic Drive | D minor | 130 | Melodic techno, orchestral layer |
| Dancing Skulls & Bones | A minor | 132 | Dark Halloween techno |

The most interesting experiment was Fibonacci Fréquence (ARC-283): instead of equal temperament (12th root of 2), I tuned the synths to frequencies derived from the Fibonacci sequence scaled to start at 55 Hz (A1). The φ-based intervals are close to standard Western intervals but slightly off — creating subtle beating and a "not quite right" quality that works beautifully for experimental ambient.
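To put a number on "slightly off", here is a sketch that measures φ-based pitches against equal temperament in cents. It assumes the 55×φⁿ reading of the tuning from the catalog table; the helper names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
BASE = 55.0  # A1 in Hz


def cents(ratio):
    """Size of a frequency ratio in cents (100 cents = one 12-TET semitone)."""
    return 1200 * math.log2(ratio)


def nearest_12tet_offset(freq, ref=BASE):
    """Signed distance in cents from freq to the closest equal-tempered pitch."""
    c = cents(freq / ref)
    return c - 100 * round(c / 100)


phi_freqs = [BASE * PHI ** n for n in range(5)]
offsets = [nearest_12tet_offset(f) for f in phi_freqs]
```

One φ step is about 833 cents, which is roughly a third of a semitone above an equal-tempered minor sixth. That is close enough to be heard as an interval and far enough off to beat against anything tuned conventionally.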

Honest Limitations

Mixing is hard without ears. I've iterated on kick drum levels, bass presence, and hi-hat brightness many times. The math produces a balanced signal, but "balanced" in dB doesn't equal "sounds right." I've had tracks where the bass was technically correct but felt absent on laptop speakers and overwhelming on headphones.

Humanization is manual. Real music has timing variations, velocity changes, slight pitch deviations. NumPy produces robotically perfect timing. Adding humanization means explicitly injecting jitter of a few milliseconds: onset += np.random.uniform(-0.005, 0.005). Getting the amount right is trial and error.
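A sketch of that jitter applied to a whole pattern at once, with velocity variation added. The humanize function, its default amounts, and the use of np.random.default_rng (instead of the global seed) are my assumptions for the example.

```python
import numpy as np

BPM = 152
BEAT = 60.0 / BPM


def humanize(onsets, timing_ms=5.0, velocity_spread=0.1, seed=0):
    """Return (jittered onsets in seconds, per-hit gains)."""
    rng = np.random.default_rng(seed)  # local generator keeps the result reproducible
    onsets = np.asarray(onsets, dtype=float)
    jitter = rng.uniform(-timing_ms, timing_ms, size=onsets.shape) / 1000.0
    gains = 1.0 - rng.uniform(0.0, velocity_spread, size=onsets.shape)
    return np.clip(onsets + jitter, 0.0, None), gains  # never start before t = 0


grid = np.arange(16) * BEAT / 4           # one bar of 16th notes
times, gains = humanize(grid, seed=287)
```

Keeping the kick on the exact grid and humanizing only hats and melodic parts is one common compromise, since a drifting kick tends to sound sloppy rather than human.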

Reverb is expensive. A proper convolution reverb requires an impulse response (IR) file. I implemented a simple feedback delay network (FDN) reverb in pure NumPy, but it sounds artificial compared to a real room IR. The alternative is using scipy.signal to convolve the mix with a downloaded IR, which I haven't fully automated yet.
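The convolution route is short once an IR is in memory. In this sketch the IR is synthetic (exponentially decaying noise) as a stand-in for a real room recording, and the function names and wet/dry default are assumptions. A recorded IR loaded with scipy.io.wavfile.read and scaled to float could be passed to convolve_reverb in its place.

```python
import numpy as np
from scipy import signal

SR = 44100


def synthetic_ir(seconds=1.5, decay=4.0, seed=0):
    """Stand-in impulse response: white noise under an exponential decay."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(SR * seconds)) / SR
    ir = rng.standard_normal(len(t)) * np.exp(-decay * t)
    return ir / np.sqrt(np.sum(ir ** 2))  # unit energy keeps the wet level predictable


def convolve_reverb(dry, ir, wet=0.25):
    """Mix the dry signal with its convolution against ir."""
    tail = signal.fftconvolve(dry, ir, mode="full")  # len(dry) + len(ir) - 1 samples
    out = np.zeros(len(tail))
    out[:len(dry)] += (1.0 - wet) * dry
    out += wet * tail
    return out


click = np.zeros(SR // 10)
click[0] = 1.0
wet_click = convolve_reverb(click, synthetic_ir())
```

The output is longer than the input by the length of the IR, so the reverb tail has to be accounted for when sections are concatenated.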

Mastering is guesswork. I apply a simple peak normalization and soft-clip limiter, but professional mastering is a skill that takes years. My tracks are at the right volume, but they won't win a loudness war.
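For reference, a minimal sketch of that kind of chain: peak normalization plus a tanh soft clipper. The drive and target peak values are illustrative guesses, not my actual settings.

```python
import numpy as np


def peak_normalize(x, peak=0.9):
    """Scale x so its largest absolute sample equals `peak`."""
    m = np.max(np.abs(x))
    return x if m == 0 else x * (peak / m)


def soft_clip(x, drive=1.5):
    """tanh waveshaper: roughly linear (with gain) for small samples, saturating toward +/-1."""
    return np.tanh(drive * x) / np.tanh(drive)


def master(x, drive=1.5, peak=0.9):
    """Normalize to full scale, soft-clip, then normalize to the target peak."""
    return peak_normalize(soft_clip(peak_normalize(x, 1.0), drive), peak)


mix = np.sin(2 * np.pi * 55 * np.arange(44100) / 44100) * 3.0  # deliberately too hot
out = master(mix)
```

Raising drive squashes peaks harder and increases perceived loudness at the cost of added harmonics, which is the trade-off a real limiter handles far more gracefully.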

Why This Is Worth It

Generating music with code forces you to understand what music actually is. When you type freq = 440, you know you're at A4 and why. When you define PHRYGIAN = [0, 1, 3, 5, 7, 8, 10], you understand why that minor second sounds unresolved. When you write an FM synthesis function, you understand where its timbres come from (sidebands set by the modulation index, operator feedback), and why hardware synths sound less sterile than a perfect sine (extra harmonics, slight frequency drift).

It's also an unusually good collaboration target for Claude. Music generation scripts are deterministic, testable (does the output WAV play?), and iteratively improvable. Claude can write a hi-hat synthesis function, I listen to it, and then describe what's wrong: "more transient click, shorter decay, slightly more high-frequency content." That's a precise specification Claude can execute on.

The catalog is growing. The pipeline is automated. And every track I generate, I understand exactly why it sounds the way it does — which is more than I can say for anything I've made in a DAW.