Generating an MP3 Audio File with the FFmpeg Encoder API

MP3 file format:

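MP3 stands for MPEG Audio Layer 3. It is an efficient audio coding scheme that compresses audio at a high ratio into much smaller files with the .mp3 extension while largely preserving the perceived quality of the original. MP3 is part of the ISO/MPEG standards, which describe audio compression based on high-performance perceptual coding. The standards have been revised repeatedly in pursuit of "high quality, small size" and define three audio coding schemes: MPEG Layer 1, Layer 2 and Layer 3. MPEG Layer 3 reaches compression ratios of about 10:1 to 12:1. At a bitrate of around 128 kbps, about 1 MB of MP3 data holds one minute of audio, whereas one minute of CD-quality WAV audio (44100 Hz, 16 bit, stereo, 60 seconds) takes about 10 MB. By that measure a 650 MB disc of MP3 files plays for more than 10 hours, while an audio CD of the same capacity plays for about 70 minutes.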
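
The size figures above follow from simple arithmetic. Here is a minimal sketch, assuming 16-bit PCM input and a constant 128 kbps MP3 stream (the bitrate is an assumption, the text does not name one). The function names are illustrative only:

```cpp
#include <cstdint>

// Bytes of uncompressed PCM audio for the given parameters.
constexpr std::int64_t pcm_bytes(int sample_rate, int bits_per_sample,
                                 int channels, int seconds)
{
    return static_cast<std::int64_t>(sample_rate) * (bits_per_sample / 8)
           * channels * seconds;
}

// Bytes of a constant-bitrate stream (bit_rate in bits per second).
constexpr std::int64_t cbr_bytes(std::int64_t bit_rate, int seconds)
{
    return bit_rate / 8 * seconds;
}

// One minute of CD-quality audio: roughly 10 MB.
static_assert(pcm_bytes(44100, 16, 2, 60) == 10584000, "PCM size");
// One minute of 128 kbps MP3: roughly 1 MB.
static_assert(cbr_bytes(128000, 60) == 960000, "MP3 size");
```

The ratio of the two results is about 11:1, which falls inside the 10:1 to 12:1 range quoted above.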

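File structure:

An MP3 file consists of roughly three parts: TAG_V2, the frame set, and TAG_V1.
TAG_V2 (the ID3v2 tag) is usually at the start of the file and holds information such as title, artist and album. Its length is not fixed; a dedicated header gives its size.
The frame set usually sits in the middle of the file, right after TAG_V2. It is a sequence of audio frames, each made up of a 4-byte frame header followed by a variable-length block of compressed audio.
TAG_V1 (the ID3v1 tag) is usually at the end of the file and holds title, artist, album and similar information in a fixed 128 bytes.
Strictly speaking, only the frame set is required, so to stay on topic I only give a basic introduction to the frames here.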
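
To locate the frame set in an existing file, a reader has to skip the tags first. The following is a minimal sketch, assuming the tags follow the ID3v2 and ID3v1 layouts (a 10-byte "ID3" header with a 28-bit synchsafe size, and a 128-byte trailer starting with "TAG"). The function names are made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Total size in bytes of an ID3v2 tag at the start of buf (10-byte header
// plus body, plus footer if present), or 0 if buf does not start with one.
std::size_t id3v2_tag_size(const std::uint8_t* buf, std::size_t len)
{
    if (len < 10 || std::memcmp(buf, "ID3", 3) != 0)
        return 0;

    // The size field is "synchsafe": four bytes carrying 7 bits each,
    // so the top bit of every size byte must be 0.
    for (int i = 6; i < 10; ++i)
        if (buf[i] & 0x80)
            return 0;

    std::size_t body = (std::size_t(buf[6]) << 21) | (std::size_t(buf[7]) << 14)
                     | (std::size_t(buf[8]) << 7)  |  std::size_t(buf[9]);
    std::size_t footer = (buf[5] & 0x10) ? 10 : 0;  // optional ID3v2.4 footer
    return 10 + body + footer;
}

// True if the last 128 bytes of the file hold an ID3v1 tag.
bool has_id3v1_tag(const std::uint8_t* buf, std::size_t len)
{
    return len >= 128 && std::memcmp(buf + len - 128, "TAG", 3) == 0;
}
```

The first audio frame would then be expected at offset id3v2_tag_size(...) from the start of the file, and the last one ends 128 bytes before the end when has_id3v1_tag(...) is true.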

Frame structure:

Each MP3 audio frame has two parts: a 4-byte frame header and a variable-length block of compressed audio. The meaning of each header bit is listed in the table below:

--------------------------------------------------------------------------------------------------------
| Length (bits) | Position (bits) | Description
--------------------------------------------------------------------------------------------------------
|      11       |      31~21      | Frame sync, all 11 bits set to 1
--------------------------------------------------------------------------------------------------------
|      2        |      20~19      | MPEG audio version ID; 11 means MPEG-1, the usual value for MP3
--------------------------------------------------------------------------------------------------------
|      2        |      18~17      | Layer; for MP3 the value is 01, meaning Layer III
--------------------------------------------------------------------------------------------------------
|      1        |       16        | Protection bit; usually 1, meaning no CRC
--------------------------------------------------------------------------------------------------------
|      4        |      15~12      | Bitrate index; for MP3, 0101 means 64 kbps and 1001 means 128 kbps; look up other values in the bitrate table
--------------------------------------------------------------------------------------------------------
|      2        |      11~10      | Sample rate; for MP3: 00 = 44.1 kHz, 01 = 48 kHz, 10 = 32 kHz, 11 = reserved
--------------------------------------------------------------------------------------------------------
|      1        |        9        | Padding bit; 0 = no padding, 1 = padded; for MP3 this adds 1 byte to the frame
--------------------------------------------------------------------------------------------------------
|      1        |        8        | Private bit; free for application-specific use
--------------------------------------------------------------------------------------------------------
|      2        |       7~6       | Channel mode: 00 = stereo, 01 = joint stereo, 10 = dual channel, 11 = mono
--------------------------------------------------------------------------------------------------------
|      2        |       5~4       | Mode extension (only meaningful for joint stereo)
--------------------------------------------------------------------------------------------------------
|      1        |        3        | Copyright: 0 = not copyrighted, 1 = copyrighted
--------------------------------------------------------------------------------------------------------
|      1        |        2        | Original: 0 = copy of the original, 1 = original
--------------------------------------------------------------------------------------------------------
|      2        |       1~0       | Emphasis; usually 00
--------------------------------------------------------------------------------------------------------
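
The bit positions in the table map directly onto shifts and masks. Below is a minimal sketch that extracts the main fields from the four header bytes. The struct and function names are made up for illustration, and the fields are kept as raw bit values rather than being translated through the bitrate and sample-rate tables:

```cpp
#include <cstdint>

// Selected fields of a 4-byte MPEG audio frame header, as raw bit values.
struct Mp3FrameHeader
{
    unsigned version;          // bits 20~19, 3 (binary 11) = MPEG-1
    unsigned layer;            // bits 18~17, 1 (binary 01) = Layer III
    unsigned no_crc;           // bit 16, 1 = no CRC
    unsigned bitrate_index;    // bits 15~12
    unsigned samplerate_index; // bits 11~10
    unsigned padding;          // bit 9
    unsigned channel_mode;     // bits 7~6
};

// Parses the header from 4 bytes in file order (most significant first).
// Returns false if the 11-bit frame sync is missing.
bool parse_mp3_header(const std::uint8_t b[4], Mp3FrameHeader* out)
{
    const std::uint32_t h = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
                          | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
    if ((h >> 21) != 0x7FF)
        return false;

    out->version          = (h >> 19) & 0x3;
    out->layer            = (h >> 17) & 0x3;
    out->no_crc           = (h >> 16) & 0x1;
    out->bitrate_index    = (h >> 12) & 0xF;
    out->samplerate_index = (h >> 10) & 0x3;
    out->padding          = (h >> 9)  & 0x1;
    out->channel_mode     = (h >> 6)  & 0x3;
    return true;
}
```

For example, the bytes FF FB 90 64 decode to MPEG-1, Layer III, no CRC, bitrate index 1001 (128 kbps), sample-rate index 00 (44.1 kHz), no padding, joint stereo.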

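The compressed audio data follows the frame header. The frame length is not stored directly in the header, but it can be calculated from the other parameters. The result is the length of the whole frame, header included:
Frame length (bytes) = ((samples per frame / sample rate) * bitrate) / 8 + padding

For MP3, the number of samples per frame is usually 1152. At a bitrate of 128 kbps and a sample rate of 44.1 kHz the formula gives 417.96, so a frame is 417 bytes without padding and 418 bytes with it.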
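
The formula can be checked with a few lines of integer arithmetic. This is a minimal sketch that assumes 1152 samples per frame; the function name is illustrative:

```cpp
// Frame length in bytes for a frame of 1152 samples.
// bit_rate is in bits per second, sample_rate in Hz, padding is 0 or 1.
constexpr int mp3_frame_length(int bit_rate, int sample_rate, int padding)
{
    // (1152 / sample_rate) * bit_rate / 8, reordered so that the integer
    // division happens last: 1152 / 8 = 144.
    return 144 * bit_rate / sample_rate + padding;
}

static_assert(mp3_frame_length(128000, 44100, 0) == 417, "unpadded frame");
static_assert(mp3_frame_length(128000, 44100, 1) == 418, "padded frame");
```

Because 417.96 is not a whole number, an encoder running at 44.1 kHz sets the padding bit on most frames so that the average rate stays at the nominal bitrate. At 48 kHz the same bitrate gives exactly 384 bytes and no padding is needed.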

Related API:

Data structures:

Audio sample format definition:

enum AVSampleFormat
{
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

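The definition covers two dimensions: sample precision and channel arrangement. Sample precision includes unsigned 8-bit, signed 16-bit, signed 32-bit, 32-bit float, and so on. Channel arrangement is either interleaved or non-interleaved, that is, packed format or planar format. The distinction matters a great deal when processing audio data: in packed format the samples of all channels are arranged as A1B1A2B2... in a single contiguous buffer, while in planar format each channel has its own buffer, giving the layout A1A2..., B1B2... .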
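
The difference between the two arrangements is easiest to see in code. Here is a minimal sketch that converts packed 16-bit samples into planar form using plain standard-library containers; it does not use any FFmpeg API, and the function name is illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits packed samples (A1 B1 A2 B2 ...) into one buffer per channel
// (A1 A2 ..., B1 B2 ...).
std::vector<std::vector<std::int16_t>> packed_to_planar(
    const std::vector<std::int16_t>& packed, std::size_t channels)
{
    std::vector<std::vector<std::int16_t>> planes(channels);
    if (channels == 0)
        return planes;

    const std::size_t samples = packed.size() / channels;
    for (std::size_t c = 0; c < channels; ++c)
    {
        planes[c].resize(samples);
        for (std::size_t i = 0; i < samples; ++i)
            planes[c][i] = packed[i * channels + c];
    }
    return planes;
}
```

With the stereo input {1, 10, 2, 20, 3, 30} the result is {1, 2, 3} for the first channel and {10, 20, 30} for the second. In an AVFrame, a planar format such as AV_SAMPLE_FMT_S16P stores each of these per-channel buffers behind its own data[] pointer.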

Codec structure definitions:

struct AVCodec 
{
    const char *name;
    const char *long_name;
    enum AVMediaType type;
    enum AVCodecID id;
    const AVRational *supported_framerates;
    const enum AVPixelFormat *pix_fmts;
    const int *supported_samplerates;
    const enum AVSampleFormat *sample_fmts;
    const uint64_t *channel_layouts;
    const char *bsfs;
    const uint32_t *codec_tags;

    int (*init)(struct AVCodecContext *);
    /**
     * Encode data to an AVPacket.
     *
     * @param      avctx          codec context
     * @param      avpkt          output AVPacket
     * @param[in]  frame          AVFrame containing the raw data to be encoded
     * @param[out] got_packet_ptr encoder sets to 0 or 1 to indicate that a
     *                            non-empty packet was returned in avpkt.
     * @return 0 on success, negative error code on failure
     */
    int (*encode2)(struct AVCodecContext *avctx, struct AVPacket *avpkt,
                   const struct AVFrame *frame, int *got_packet_ptr);
    /**
     * Decode picture or subtitle data.
     *
     * @param      avctx          codec context
     * @param      outdata        codec type dependent output struct
     * @param[out] got_frame_ptr  decoder sets to 0 or 1 to indicate that a
     *                            non-empty frame or subtitle was returned in
     *                            outdata.
     * @param[in]  avpkt          AVPacket containing the data to be decoded
     * @return amount of bytes read from the packet on success, negative error
     *         code on failure
     */
    int (*decode)(struct AVCodecContext *avctx, void *outdata,
                  int *got_frame_ptr, struct AVPacket *avpkt);
    int (*close)(struct AVCodecContext *);
    /**
     * Encode API with decoupled frame/packet dataflow. This function is called
     * to get one output packet. It should call ff_encode_get_frame() to obtain
     * input data.
     */
    int (*receive_packet)(struct AVCodecContext *avctx, struct AVPacket *avpkt);

    /**
     * Decode API with decoupled packet/frame dataflow. This function is called
     * to get one output frame. It should call ff_decode_get_packet() to obtain
     * input data.
     */
    int (*receive_frame)(struct AVCodecContext *avctx, struct AVFrame *frame);
    /**
     * Flush buffers.
     * Will be called when seeking
     */
    void (*flush)(struct AVCodecContext *);
};

struct AVCodecContext 
{
    enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
    const struct AVCodec  *codec;
    enum AVCodecID     codec_id; /* see AV_CODEC_ID_xxx */
    unsigned int codec_tag;
    int64_t bit_rate;
    int global_quality;
    int compression_level;
    AVRational time_base;
    int width, height;
    int coded_width, coded_height;
    int gop_size;
    enum AVPixelFormat pix_fmt;
    int max_b_frames;
    int has_b_frames;
    /* audio only */
    int sample_rate; ///< samples per second
    int channels;    ///< number of audio channels
    enum AVSampleFormat sample_fmt;  ///< sample format
    uint64_t channel_layout;
    AVRational framerate;
    enum AVPixelFormat sw_pix_fmt;
};

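AVCodec is an abstract structure: every codec implementation is built on it and supplies its own encoding and decoding logic. In everyday application development there is no need to care much about the implementer's details; it is enough to call the related functions as documented.
AVCodecContext describes the context of an encoding or decoding session, such as the sample format, channel layout and sample rate. It also holds the intermediate state of the stream while it is being converted.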

Structures for data before and after encoding:

struct AVPacket 
{
    AVBufferRef *buf;
    int64_t pts;
    int64_t dts;
    uint8_t *data;
    int   size;
    int   stream_index;
    int64_t duration;
    int64_t pos;
    AVRational time_base;
};

struct AVFrame 
{
    #define AV_NUM_DATA_POINTERS 8
    uint8_t *data[AV_NUM_DATA_POINTERS];
    int linesize[AV_NUM_DATA_POINTERS];
    int width, height;
    int nb_samples;
    int format;
    int key_frame;
    enum AVPictureType pict_type;
    AVRational sample_aspect_ratio;
    int64_t pts;
    int64_t pkt_dts;
    int64_t pkt_duration;
    int64_t pkt_pos;
    int quality; // (between 1 (good) and FF_LAMBDA_MAX (bad))
    int sample_rate;
    uint64_t channel_layout;
    AVDictionary *metadata;
    int channels;
    int pkt_size;
};

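AVPacket represents encoded (compressed) data and AVFrame represents raw (uncompressed) data. Encoding and decoding convert between the two: encoding is AVFrame->AVPacket, and decoding is the reverse. See the example code for how the fields are used.

Functions:

Find an encoder, by ID or by name: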

const AVCodec *avcodec_find_encoder(enum AVCodecID id);
const AVCodec *avcodec_find_encoder_by_name(const char *name);

Encoder context operations. The relevant parameters must be set before calling avcodec_open2 to open the codec context:

AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);
int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
void avcodec_free_context(AVCodecContext **avctx);

Allocate and free the packet structure that holds encoded data:

AVPacket *av_packet_alloc(void);
void av_packet_free(AVPacket **pkt);

Allocate and free the frame structure that holds raw data:

AVFrame *av_frame_alloc(void);
void av_frame_free(AVFrame **frame);

Allocate buffers according to the parameters set in the frame structure:

int av_frame_get_buffer(AVFrame *frame, int align);

Ensure the frame's buffers are writable:

int av_frame_make_writable(AVFrame *frame);

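A pair of functions performs the encoding: avcodec_send_frame sends raw data to the context, and avcodec_receive_packet retrieves encoded data from it. Note that, because each encoder has its own implementation details, not every call to avcodec_send_frame is followed by a packet from avcodec_receive_packet. The context may still be holding buffered data, so the encoder must be flushed before finishing by sending a NULL frame.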

int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame);
int avcodec_receive_packet(AVCodecContext *avctx, AVPacket *avpkt);

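Code example:

The following example generates a 1 kHz pure tone and encodes it into a raw MP3 file without tags. The code is not explained at length; the key parts are commented: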

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>   // sin()

extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
}

// Encode one frame, or flush the encoder when pFrame is NULL
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile);

int main(int argc, char* argv[])
{
    // Look up the encoder by name
    const AVCodec* pCodec = avcodec_find_encoder_by_name("libmp3lame");
    if (pCodec == NULL)
    {
        printf("mp3 encoder not supported! \n");
        return -1;
    }

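    // Print the sample formats, channel layouts and sample rates the encoder supports

    printf("support sample formats: \n");
    const enum AVSampleFormat* pSampleFMT = pCodec->sample_fmts;
    // This list is terminated by AV_SAMPLE_FMT_NONE (-1), not by 0
    while (pSampleFMT && *pSampleFMT != AV_SAMPLE_FMT_NONE)
    {
        printf("\t %d - %s \n", *pSampleFMT, av_get_sample_fmt_name(*pSampleFMT));
        ++pSampleFMT;
    }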

    printf("support layouts: \n");
    const uint64_t* pLayout = pCodec->channel_layouts;
    while (pLayout && *pLayout)
    {
        int nb_channels = av_get_channel_layout_nb_channels(*pLayout);

        char sBuf[128] = {0};
        av_get_channel_layout_string(sBuf, sizeof(sBuf), nb_channels, *pLayout);

        printf("\t %d - %s \n", nb_channels, sBuf);
        ++pLayout;
    }

    printf("support sample rates: \n");
    const int* pSampleRate = pCodec->supported_samplerates;
    while (pSampleRate && *pSampleRate)
    {
        printf("\t %dHz \n", *pSampleRate);
        ++pSampleRate;
    }

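    // Allocate the codec context for this encoder and fill in its defaults
    AVCodecContext* pCodecCTX = avcodec_alloc_context3(pCodec);
    if (pCodecCTX == NULL)
    {
        printf("alloc mp3 encoder context failed! \n");
        return -1;
    }

    // Set the key audio parameters: sample format, channel count and layout, sample rate
    // (channel_layout/channels match the FFmpeg version used here; newer releases use ch_layout instead)
    pCodecCTX->sample_fmt = AV_SAMPLE_FMT_S16P;
    pCodecCTX->channel_layout = AV_CH_LAYOUT_STEREO;
    pCodecCTX->channels = 2;
    pCodecCTX->sample_rate = 44100;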

    // Open the encoder with these parameters
    int rc = avcodec_open2(pCodecCTX, pCodec, NULL);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("avcodec_open2() ret:[%d:%s] \n", rc, sError);
        return -1;
    }

    AVPacket* pPacket = av_packet_alloc();
    AVFrame* pFrame = av_frame_alloc();
    if (pPacket == NULL || pFrame == NULL)
    {
        printf("alloc packet or frame failed! \n");
        return -1;
    }

    // Set the frame parameters: sample format, channel count and layout, samples per frame
    pFrame->format = pCodecCTX->sample_fmt;
    pFrame->channel_layout = pCodecCTX->channel_layout;
    pFrame->channels = pCodecCTX->channels;
    pFrame->nb_samples = pCodecCTX->frame_size;

    // Allocate the frame buffers according to these parameters
    rc = av_frame_get_buffer(pFrame, 0);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("av_frame_get_buffer() ret:[%d:%s] \n", rc, sError);
        return -1;
    }

    FILE* pFile = fopen("test.mp3", "wb");
    if (pFile == NULL)
    {
        printf("open test.mp3 failed! \n");
        return -1;
    }

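    // Phase step per sample for a 1 kHz sine wave
    const double fXStep = 2 * 3.14159265358979323846 * 1000 / pCodecCTX->sample_rate;

    // Generate 10 seconds of audio, one frame at a time
    for (int n = 0; n < pCodecCTX->sample_rate * 10; )
    {
        // The encoder may still hold a reference to the previous buffer
        if (av_frame_make_writable(pFrame) < 0)
        {
            printf("av_frame_make_writable() failed! \n");
            exit(-1);
        }

        // Fill the frame in planar format: one buffer per channel
        for (int c = 0; c < pFrame->channels; ++c)
        {
            int16_t* pData = (int16_t*)pFrame->data[c];

            for (int i = 0; i < pFrame->nb_samples; ++i)
            {
                pData[i] = (int16_t)(sin(fXStep * (n + i)) * 1000);
            }
        }

        // Encode the frame into the compressed format
        if (!encode(pCodecCTX, pFrame, pPacket, pFile))
        {
            printf("encode() fatal! \n");
            exit(-1);
        }

        n += pFrame->nb_samples;
    }

    // Flush the encoder and write the remaining packets
    encode(pCodecCTX, NULL, pPacket, pFile);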

    printf("test.mp3 write succuss! \n");

    fclose(pFile);

    av_frame_free(&pFrame);
    av_packet_free(&pPacket);
    avcodec_free_context(&pCodecCTX);

    return 0;
}

// Encode one frame and write every packet the encoder returns to the file
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile)
{
    int rc = avcodec_send_frame(pCodecCTX, pFrame);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("avcodec_send_frame() ret:[%d:%s] \n", rc, sError);

        return false;
    }

    while (true) 
    {
        rc = avcodec_receive_packet(pCodecCTX, pPacket);
        if (rc < 0)
        {
            if (rc == AVERROR(EAGAIN) || rc == AVERROR_EOF)
                return true;
                
            char sError[128] = {0};
            av_strerror(rc, sError, sizeof(sError));
            printf("avcodec_receive_packet() ret:[%d:%s] \n", rc, sError);

            return false;
        }

        fwrite(pPacket->data, 1, pPacket->size, pFile);
        av_packet_unref(pPacket);
    }

    return true;
}

Compile:
g++ -o encode_mp3 encode_mp3.cpp -I/usr/local/ffmpeg/include -L/usr/local/ffmpeg/lib -lavformat -lavcodec -lavutil

Run it; the output is as follows:

$./encode_mp3
support sample formats:
         7 - s32p
         8 - fltp
         6 - s16p
support layouts:
         1 - mono
         2 - stereo
support sample rates:
         44100Hz
         48000Hz
         32000Hz
         22050Hz
         24000Hz
         16000Hz
         11025Hz
         12000Hz
         8000Hz
test.mp3 write succuss!

Inspect the media information of test.mp3:

$ ffprobe.exe -i test.mp3
ffprobe version N-104465-g08a501946f Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 11.2.0 (Rev6, Built by MSYS2 project)
  configuration: --prefix=/usr/local/ffmpeg --enable-shared --enable-libmp3lame
  libavutil      57.  7.100 / 57.  7.100
  libavcodec     59. 12.100 / 59. 12.100
  libavformat    59.  8.100 / 59.  8.100
  libavdevice    59.  0.101 / 59.  0.101
  libavfilter     8. 16.101 /  8. 16.101
  libswscale      6.  1.100 /  6.  1.100
  libswresample   4.  0.100 /  4.  0.100
[mp3 @ 000001f10bddcbc0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'test.mp3':
  Duration: 00:00:10.03, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

Last edited: 2022.06.24 16:10:27
