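MP3 file format:
The full name of MP3 is MPEG Audio Layer 3. It is an efficient audio coding scheme that compresses audio at a high ratio into much smaller files with the .mp3 extension, while largely preserving the quality of the original. MP3 is part of the ISO/MPEG standards, which describe audio compression using high-performance perceptual coding. The standard has been revised continually in pursuit of "high quality at a small size" and now defines three audio coding schemes: MPEG Layer 1, Layer 2 and Layer 3. MPEG Layer 3 reaches compression ratios of 1:10 to 1:12. At the common bit rate of 128 kbps, 1 MB of MP3 plays for about one minute, whereas one minute of CD-quality WAV audio (44100 Hz, 16 bit, two channels, 60 seconds) takes about 10 MB. By this arithmetic, a 650 MB disc of MP3 files plays for more than 10 hours, while an audio CD of the same capacity plays for about 70 minutes.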
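The size figures above can be checked with a few lines of arithmetic. This is a small sketch that assumes 128 kbps constant bit rate on the MP3 side, because the "1 MB per minute" figure corresponds to roughly that rate. The function names are illustrative only.

```cpp
#include <cstdint>

// Bytes occupied by `seconds` of uncompressed PCM audio (e.g. the payload of a WAV file).
constexpr std::int64_t pcm_bytes(std::int64_t sample_rate, std::int64_t bits_per_sample,
                                 std::int64_t channels, std::int64_t seconds) {
    return sample_rate * (bits_per_sample / 8) * channels * seconds;
}

// Bytes occupied by `seconds` of constant-bit-rate compressed audio.
constexpr std::int64_t cbr_bytes(std::int64_t bit_rate, std::int64_t seconds) {
    return bit_rate / 8 * seconds;
}

// One minute of CD-quality audio vs. one minute of 128 kbps MP3 (assumed rate).
constexpr std::int64_t kWavMinute = pcm_bytes(44100, 16, 2, 60);  // 10,584,000 bytes
constexpr std::int64_t kMp3Minute = cbr_bytes(128000, 60);        // 960,000 bytes
static_assert(kWavMinute / kMp3Minute == 11, "roughly a 1:11 compression ratio");
```

The ratio of about 1:11 falls inside the 1:10 to 1:12 range quoted above. A 650 MB disc divided by 960,000 bytes per minute gives roughly 677 minutes, which is more than 10 hours.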
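File structure:
An MP3 file is broadly divided into three parts: TAG_V2 (an ID3v2 tag), the frame set, and TAG_V1 (an ID3v1 tag).
TAG_V2 usually sits at the start of the file and holds metadata such as the title, artist and album. Its length is variable and is given by a dedicated 10-byte header.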
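The frame set usually sits in the middle of the file, directly after TAG_V2. It consists of a sequence of audio frames, and each frame is made up of a 4-byte frame header followed by a variable-length block of compressed audio data.
TAG_V1 usually sits at the end of the file and holds metadata such as the title, artist and album. Its length is fixed at 128 bytes.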
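To walk the frame set, a reader first has to skip these tags. Below is a minimal sketch that assumes the standard ID3v2 and ID3v1 layouts, which correspond to TAG_V2 and TAG_V1 above. `id3v2_total_size` reads the 10-byte ID3v2 header. Its 4-byte size field is "synchsafe", meaning only 7 bits of each byte are used. `parse_id3v1` reads the fixed 128-byte block, which starts with the ASCII bytes "TAG". All names are hypothetical. ID3v2 unsynchronisation and the ID3v1.1 track number are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// ID3v2 header (10 bytes): "ID3", major version, revision, flags, 4-byte synchsafe size.
// Returns the total tag size in bytes, i.e. the offset of the first audio frame when
// the tag is at the start of the file, or 0 if `buf` does not start with an ID3v2 header.
std::size_t id3v2_total_size(const std::uint8_t* buf, std::size_t len) {
    if (len < 10 || buf[0] != 'I' || buf[1] != 'D' || buf[2] != '3')
        return 0;
    // In a synchsafe integer the top bit of every byte must be 0.
    for (int i = 6; i < 10; ++i)
        if (buf[i] & 0x80) return 0;
    std::size_t body = (std::size_t(buf[6]) << 21) | (std::size_t(buf[7]) << 14) |
                       (std::size_t(buf[8]) << 7) | std::size_t(buf[9]);
    std::size_t total = 10 + body;   // the size field excludes the 10-byte header
    if (buf[5] & 0x10) total += 10;  // ID3v2.4 "footer present" flag
    return total;
}

// ID3v1 block (128 bytes at the end of the file):
//   "TAG"(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
struct Id3v1 {
    std::string title, artist, album, year;
};

// Fixed-width fields are padded with NULs or spaces; strip the padding.
static std::string id3v1_field(const char* p, std::size_t n) {
    std::string s(p, std::find(p, p + n, '\0'));
    while (!s.empty() && s.back() == ' ') s.pop_back();
    return s;
}

// `tail` must point at the last 128 bytes of the file.
bool parse_id3v1(const char* tail, Id3v1& out) {
    if (std::memcmp(tail, "TAG", 3) != 0) return false;
    out.title  = id3v1_field(tail + 3, 30);
    out.artist = id3v1_field(tail + 33, 30);
    out.album  = id3v1_field(tail + 63, 30);
    out.year   = id3v1_field(tail + 93, 4);
    return true;
}
```

With both functions, the frame set spans from `id3v2_total_size(...)` up to the file size, minus 128 bytes if `parse_id3v1` succeeds on the file's tail.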
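Of these three parts, strictly speaking only the frame set is required. To stay on topic, I therefore give only a basic introduction to the frames here.
Frame structure:
Each MP3 audio frame has two parts: a 4-byte frame header and a variable-length block of compressed audio data. The meaning of each header bit is summarized in the table below: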
--------------------------------------------------------------------------------------------------------
| Length (bits) | Position (bits) | Description
--------------------------------------------------------------------------------------------------------
| 11 | 31~21 | Frame sync, all 11 bits set to 1
--------------------------------------------------------------------------------------------------------
| 2 | 20~19 | MPEG audio version ID: 11 = MPEG-1 (the usual value for MP3), 10 = MPEG-2, 00 = MPEG-2.5, 01 = reserved
--------------------------------------------------------------------------------------------------------
| 2 | 18~17 | Layer: 01 = Layer III (MP3)
--------------------------------------------------------------------------------------------------------
| 1 | 16 | Protection bit: usually 1, meaning no CRC follows the header
--------------------------------------------------------------------------------------------------------
| 4 | 15~12 | Bit rate index: for MPEG-1 Layer III, 0101 = 64 kbps and 1001 = 128 kbps; see the bit rate table for other values
--------------------------------------------------------------------------------------------------------
| 2 | 11~10 | Sample rate index (MPEG-1): 00 = 44.1 kHz, 01 = 48 kHz, 10 = 32 kHz, 11 = reserved
--------------------------------------------------------------------------------------------------------
| 1 | 9 | Padding bit: 0 = no padding, 1 = padded; for Layer III the frame gets 1 extra byte
--------------------------------------------------------------------------------------------------------
| 1 | 8 | Private bit: free for application-specific use
--------------------------------------------------------------------------------------------------------
| 2 | 7~6 | Channel mode: 00 = stereo, 01 = joint stereo, 10 = dual channel, 11 = mono
--------------------------------------------------------------------------------------------------------
| 2 | 5~4 | Mode extension (only meaningful in joint stereo)
--------------------------------------------------------------------------------------------------------
| 1 | 3 | Copyright: 0 = not copyrighted, 1 = copyrighted
--------------------------------------------------------------------------------------------------------
| 1 | 2 | Original: 0 = copy, 1 = original media
--------------------------------------------------------------------------------------------------------
| 2 | 1~0 | Emphasis: usually 00 (none)
--------------------------------------------------------------------------------------------------------
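The table translates directly into shifts and masks. Below is a minimal sketch of a header parser that follows the bit positions in the table, with bit 31 as the most significant bit of the first header byte. The struct and function names are hypothetical. The parser checks only the sync bits and does not reject reserved values.

```cpp
#include <cstdint>

// Fields of the 4-byte MPEG audio frame header, one member per table row.
struct FrameHeader {
    unsigned version_id;       // bits 20-19 (11 = MPEG-1)
    unsigned layer;            // bits 18-17 (01 = Layer III)
    unsigned protection;       // bit 16     (1 = no CRC)
    unsigned bitrate_index;    // bits 15-12
    unsigned samplerate_index; // bits 11-10
    unsigned padding;          // bit 9
    unsigned private_bit;      // bit 8
    unsigned channel_mode;     // bits 7-6
    unsigned mode_extension;   // bits 5-4
    unsigned copyright;        // bit 3
    unsigned original;         // bit 2
    unsigned emphasis;         // bits 1-0
};

// Parses 4 header bytes; returns false if the 11 sync bits are not all 1.
bool parse_frame_header(const std::uint8_t b[4], FrameHeader& h) {
    const std::uint32_t v = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
                            (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
    if ((v >> 21) != 0x7FF) return false;  // bits 31-21: frame sync
    h.version_id       = (v >> 19) & 0x3;
    h.layer            = (v >> 17) & 0x3;
    h.protection       = (v >> 16) & 0x1;
    h.bitrate_index    = (v >> 12) & 0xF;
    h.samplerate_index = (v >> 10) & 0x3;
    h.padding          = (v >> 9) & 0x1;
    h.private_bit      = (v >> 8) & 0x1;
    h.channel_mode     = (v >> 6) & 0x3;
    h.mode_extension   = (v >> 4) & 0x3;
    h.copyright        = (v >> 3) & 0x1;
    h.original         = (v >> 2) & 0x1;
    h.emphasis         = v & 0x3;
    return true;
}
```

For example, the bytes FF FB 90 64 decode as MPEG-1 Layer III with no CRC, 128 kbps, 44.1 kHz, no padding, joint stereo, with the original bit set.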
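The compressed audio data follows the frame header. Its length is not given in the header directly, but it can be computed from the other header fields:
Frame length (bytes) = floor(((samples per frame / sample rate) * bit rate) / 8) + padding
For MPEG-1 Layer III, a frame holds 1152 samples (MPEG-2 and MPEG-2.5 Layer III frames hold 576). At 128 kbps and 44.1 kHz the formula gives (1152 / 44100 * 128000) / 8 ≈ 417.96. The frame length is therefore 417 bytes, or 418 bytes when the padding bit is set.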
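Combining the formula with the header fields gives a frame-length calculator. This is a sketch that covers MPEG-1 Layer III only. It uses the standard MPEG-1 Layer III bit rate table and the MPEG-1 sample rate table, and it simplifies 1152 / 8 to the constant 144. "Free format" (index 0) and invalid or reserved indices return 0. The function name is hypothetical.

```cpp
// MPEG-1 Layer III bit rates in kbps, indexed by the 4-bit bit rate index.
// Index 0 ("free format") and 15 (invalid) are mapped to 0 here.
static const int kBitrateKbps[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                     112, 128, 160, 192, 224, 256, 320, 0};
// MPEG-1 sample rates in Hz, indexed by the 2-bit sample rate index (3 is reserved).
static const int kSampleRateHz[4] = {44100, 48000, 32000, 0};

// floor((1152 / sample_rate * bit_rate) / 8) + padding == 144 * bit_rate / sample_rate + padding
// Integer division performs the rounding down.
int mpeg1_layer3_frame_length(unsigned bitrate_index, unsigned samplerate_index,
                              unsigned padding) {
    if (bitrate_index > 15 || samplerate_index > 3) return 0;
    const int bit_rate = kBitrateKbps[bitrate_index] * 1000;
    const int sample_rate = kSampleRateHz[samplerate_index];
    if (bit_rate == 0 || sample_rate == 0) return 0;
    return 144 * bit_rate / sample_rate + (padding ? 1 : 0);
}
```

`mpeg1_layer3_frame_length(9, 0, 0)` (128 kbps, 44.1 kHz) yields 417, matching the worked example above. The same bit rate at 48 kHz yields 384.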
Related FFmpeg API:
Data structures:
Audio sample format definition:
enum AVSampleFormat
{
AV_SAMPLE_FMT_NONE = -1,
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
AV_SAMPLE_FMT_S64, ///< signed 64 bits
AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically
};
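The definition covers two dimensions: sample precision and channel arrangement. Sample precision includes 8-bit unsigned, 16-bit signed, 32-bit signed, 32-bit float and so on. Channel arrangement is either interleaved or non-interleaved, i.e. packed or planar format (the names with a P suffix are planar). The distinction is very important when processing audio data. In packed format, the samples of all channels are interleaved as A1B1A2B2... in a single contiguous buffer. In planar format, each channel has its own buffer, laid out as A1A2... and B1B2....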
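To make the two layouts concrete, here is a minimal sketch that interleaves planar 16-bit samples (the layout of AV_SAMPLE_FMT_S16P) into packed order (the layout of AV_SAMPLE_FMT_S16). It works on plain std::vector buffers rather than AVFrame, and the function name is hypothetical. In a real FFmpeg program, such format conversion is normally left to libswresample.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// planes[c][i] is sample i of channel c (planar: A1 A2 ..., B1 B2 ...).
// The result holds the same samples interleaved (packed: A1 B1 A2 B2 ...).
// Assumes every plane has the same number of samples.
std::vector<std::int16_t> planar_to_packed(const std::vector<std::vector<std::int16_t>>& planes) {
    if (planes.empty()) return {};
    const std::size_t channels = planes.size();
    const std::size_t nb_samples = planes[0].size();
    std::vector<std::int16_t> packed(channels * nb_samples);
    for (std::size_t i = 0; i < nb_samples; ++i)
        for (std::size_t c = 0; c < channels; ++c)
            packed[i * channels + c] = planes[c][i];
    return packed;
}
```

For two channels {1, 2, 3} and {10, 20, 30}, the packed result is {1, 10, 2, 20, 3, 30}.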
Codec structure definitions:
struct AVCodec
{
const char *name;
const char *long_name;
enum AVMediaType type;
enum AVCodecID id;
const AVRational *supported_framerates;
const enum AVPixelFormat *pix_fmts;
const int *supported_samplerates;
const enum AVSampleFormat *sample_fmts;
const uint64_t *channel_layouts;
const char *bsfs;
const uint32_t *codec_tags;
int (*init)(struct AVCodecContext *);
/**
* Encode data to an AVPacket.
*
* @param avctx codec context
* @param avpkt output AVPacket
* @param[in] frame AVFrame containing the raw data to be encoded
* @param[out] got_packet_ptr encoder sets to 0 or 1 to indicate that a
* non-empty packet was returned in avpkt.
* @return 0 on success, negative error code on failure
*/
int (*encode2)(struct AVCodecContext *avctx, struct AVPacket *avpkt,
const struct AVFrame *frame, int *got_packet_ptr);
/**
* Decode picture or subtitle data.
*
* @param avctx codec context
* @param outdata codec type dependent output struct
* @param[out] got_frame_ptr decoder sets to 0 or 1 to indicate that a
* non-empty frame or subtitle was returned in
* outdata.
* @param[in] avpkt AVPacket containing the data to be decoded
* @return amount of bytes read from the packet on success, negative error
* code on failure
*/
int (*decode)(struct AVCodecContext *avctx, void *outdata,
int *got_frame_ptr, struct AVPacket *avpkt);
int (*close)(struct AVCodecContext *);
/**
* Encode API with decoupled frame/packet dataflow. This function is called
* to get one output packet. It should call ff_encode_get_frame() to obtain
* input data.
*/
int (*receive_packet)(struct AVCodecContext *avctx, struct AVPacket *avpkt);
/**
* Decode API with decoupled packet/frame dataflow. This function is called
* to get one output frame. It should call ff_decode_get_packet() to obtain
* input data.
*/
int (*receive_frame)(struct AVCodecContext *avctx, struct AVFrame *frame);
/**
* Flush buffers.
* Will be called when seeking
*/
void (*flush)(struct AVCodecContext *);
};
struct AVCodecContext
{
enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
const struct AVCodec *codec;
enum AVCodecID codec_id; /* see AV_CODEC_ID_xxx */
unsigned int codec_tag;
int64_t bit_rate;
int global_quality;
int compression_level;
AVRational time_base;
int width, height;
int coded_width, coded_height;
int gop_size;
enum AVPixelFormat pix_fmt;
int max_b_frames;
int has_b_frames;
/* audio only */
int sample_rate; ///< samples per second
int channels; ///< number of audio channels
enum AVSampleFormat sample_fmt; ///< sample format
uint64_t channel_layout;
AVRational framerate;
enum AVPixelFormat sw_pix_fmt;
};
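AVCodec is an abstract structure: every encoder and decoder is implemented on top of it and supplies its own coding logic. In everyday application development we rarely need to care about the details of these implementations. We only need to use them through the corresponding API functions.
AVCodecContext describes the context of an encoding or decoding session, such as the sample format, channel layout and sample rate. It also keeps the state of the stream while it is being encoded or decoded.
Data structures before and after encoding: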
struct AVPacket
{
AVBufferRef *buf;
int64_t pts;
int64_t dts;
uint8_t *data;
int size;
int stream_index;
int64_t duration;
int64_t pos;
AVRational time_base;
};
struct AVFrame
{
#define AV_NUM_DATA_POINTERS 8
uint8_t *data[AV_NUM_DATA_POINTERS];
int linesize[AV_NUM_DATA_POINTERS];
int width, height;
int nb_samples;
int format;
int key_frame;
enum AVPictureType pict_type;
AVRational sample_aspect_ratio;
int64_t pts;
int64_t pkt_dts;
int64_t pkt_duration;
int64_t pkt_pos;
int quality; // (between 1 (good) and FF_LAMBDA_MAX (bad))
int sample_rate;
uint64_t channel_layout;
AVDictionary *metadata;
int channels;
int pkt_size;
};
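AVPacket holds compressed (encoded) data, and AVFrame holds raw (unencoded) data. Encoding and decoding convert between the two: encoding turns an AVFrame into an AVPacket, and decoding does the reverse. See the example code for how the fields are used.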
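For audio, the data[] pointers of an AVFrame follow the packed/planar distinction described earlier. A packed frame uses only data[0], while a planar frame uses one data[] entry per channel. Below is a sketch of the minimum number of bytes behind each pointer. It ignores the alignment padding FFmpeg adds when allocating buffers, so a real linesize can be larger. The struct and function names are hypothetical, not FFmpeg API.

```cpp
#include <cstddef>

// Minimum buffer layout of one raw audio frame (hypothetical helper).
struct AudioBufferLayout {
    int planes;                   // how many data[] pointers are used
    std::size_t bytes_per_plane;  // bytes needed behind each pointer
};

AudioBufferLayout audio_buffer_layout(int bytes_per_sample, int channels,
                                      int nb_samples, bool planar) {
    AudioBufferLayout layout;
    // Planar: one plane per channel, each holding nb_samples samples.
    // Packed: a single plane holding nb_samples * channels interleaved samples.
    layout.planes = planar ? channels : 1;
    layout.bytes_per_plane = static_cast<std::size_t>(bytes_per_sample) * nb_samples *
                             (planar ? 1 : channels);
    return layout;
}
```

For one MP3 frame of 1152 stereo S16P samples, this gives two planes of 2304 bytes each. The packed S16 equivalent is a single plane of 4608 bytes.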
Functions:
Find an encoder, by ID or by name:
const AVCodec *avcodec_find_encoder(enum AVCodecID id);
const AVCodec *avcodec_find_encoder_by_name(const char *name);
Encoder context management. The relevant parameters must be set before the context is opened with avcodec_open2:
AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);
int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
void avcodec_free_context(AVCodecContext **avctx);
Allocate and free the packet structure for encoded data:
AVPacket *av_packet_alloc(void);
void av_packet_free(AVPacket **pkt);
Allocate and free the frame structure for raw data:
AVFrame *av_frame_alloc(void);
void av_frame_free(AVFrame **frame);
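Allocate the frame's data buffers according to the parameters already set in the frame (sample format, channel layout, number of samples):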
int av_frame_get_buffer(AVFrame *frame, int align);
Make sure the frame's data buffers are writable (they are copied to new buffers if they are still shared):
int av_frame_make_writable(AVFrame *frame);
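The pair of encoding functions: avcodec_send_frame sends raw data to the context, and avcodec_receive_packet receives encoded data from it. Because each encoder has its own implementation details, a call to avcodec_send_frame is not always followed by a packet from avcodec_receive_packet, and the context may still hold buffered data. Before finishing, you must therefore flush the encoder by sending a NULL frame and then receiving the remaining packets.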
int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame);
int avcodec_receive_packet(AVCodecContext *avctx, AVPacket *avpkt);
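The buffering behaviour can be illustrated without FFmpeg. The following is a toy model, not FFmpeg code. `ToyEncoder`, `Status` and `encode_step` are hypothetical names, and the real API reports AVERROR(EAGAIN) and AVERROR_EOF instead of an enum. The toy encoder emits a packet only once it has buffered a full frame of input. Consequently, some sends produce no packet, and a final flush is needed to drain the partial remainder.

```cpp
#include <cstddef>

enum class Status { Ok, Again, Eof };

class ToyEncoder {
public:
    explicit ToyEncoder(std::size_t frame_size) : frame_size_(frame_size) {}

    // nb_samples == 0 plays the role of sending a NULL frame (start flushing).
    void send_frame(std::size_t nb_samples) {
        if (nb_samples == 0) flushing_ = true;
        else buffered_ += nb_samples;
    }

    // On Ok, packet_samples is the number of input samples the packet covers.
    Status receive_packet(std::size_t& packet_samples) {
        if (buffered_ >= frame_size_) {          // a full frame is ready
            buffered_ -= frame_size_;
            packet_samples = frame_size_;
            return Status::Ok;
        }
        if (flushing_ && buffered_ > 0) {        // flushing: emit the partial remainder
            packet_samples = buffered_;
            buffered_ = 0;
            return Status::Ok;
        }
        return flushing_ ? Status::Eof : Status::Again;
    }

private:
    std::size_t frame_size_;
    std::size_t buffered_ = 0;
    bool flushing_ = false;
};

// Send once, then receive until Again/Eof; returns the number of packets produced.
int encode_step(ToyEncoder& enc, std::size_t nb_samples) {
    enc.send_frame(nb_samples);
    int packets = 0;
    std::size_t n = 0;
    while (enc.receive_packet(n) == Status::Ok) ++packets;
    return packets;
}
```

With a frame size of 1152, sending 1000 samples yields no packet. Sending another 1000 yields one packet and leaves 848 samples buffered. Only the flush (sending 0) releases those last 848 samples.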
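Example:
The following example generates 10 seconds of a 1 kHz pure tone (sine wave) and encodes it into a bare MP3 file without any tags. The code is not explained at length, but the key parts are commented. It uses the older channel_layout/channels fields, which FFmpeg 5.1 deprecated in favour of ch_layout and FFmpeg 7.0 removed, so it requires an FFmpeg release older than 7.0: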
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>   // sin()
extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
}
// Encode one frame (pass NULL to flush the encoder) and write the resulting packets to pFile
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile);
int main(int argc, char* argv[])
{
    // Look up the encoder by name
    const AVCodec* pCodec = avcodec_find_encoder_by_name("libmp3lame");
    if (pCodec == NULL)
    {
        printf("mp3 encoder not supported! \n");
        return -1;
    }
    // Print the sample formats, channel layouts and sample rates the encoder supports
    printf("support sample formats: \n");
    const enum AVSampleFormat* pSampleFMT = pCodec->sample_fmts;
    // This list is terminated by AV_SAMPLE_FMT_NONE (-1), not by 0
    while (pSampleFMT && *pSampleFMT != AV_SAMPLE_FMT_NONE)
    {
        printf("\t %d - %s \n", *pSampleFMT, av_get_sample_fmt_name(*pSampleFMT));
        ++pSampleFMT;
    }
    printf("support layouts: \n");
    // This list is terminated by 0
    const uint64_t* pLayout = pCodec->channel_layouts;
    while (pLayout && *pLayout)
    {
        int nb_channels = av_get_channel_layout_nb_channels(*pLayout);
        char sBuf[128] = {0};
        av_get_channel_layout_string(sBuf, sizeof(sBuf), nb_channels, *pLayout);
        printf("\t %d - %s \n", nb_channels, sBuf);
        ++pLayout;
    }
    printf("support sample rates: \n");
    // This list is terminated by 0
    const int* pSampleRate = pCodec->supported_samplerates;
    while (pSampleRate && *pSampleRate)
    {
        printf("\t %dHz \n", *pSampleRate);
        ++pSampleRate;
    }
    // Allocate the encoder context and initialize it with the encoder's defaults
    AVCodecContext* pCodecCTX = avcodec_alloc_context3(pCodec);
    if (pCodecCTX == NULL)
    {
        printf("alloc mp3 encoder context failed! \n");
        return -1;
    }
    // Key audio encoding parameters: sample format, channel count and layout, sample rate
    pCodecCTX->sample_fmt = AV_SAMPLE_FMT_S16P;
    pCodecCTX->channel_layout = AV_CH_LAYOUT_STEREO;
    pCodecCTX->channels = 2;
    pCodecCTX->sample_rate = 44100;
    // Open the encoder with these parameters
    int rc = avcodec_open2(pCodecCTX, pCodec, NULL);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("avcodec_open2() ret:[%d:%s] \n", rc, sError);
        return -1;
    }
    AVPacket* pPacket = av_packet_alloc();
    AVFrame* pFrame = av_frame_alloc();
    // Frame parameters: sample format, channel count and layout, number of samples.
    // frame_size is filled in by avcodec_open2() (1152 samples per frame for MP3)
    pFrame->format = pCodecCTX->sample_fmt;
    pFrame->channel_layout = pCodecCTX->channel_layout;
    pFrame->channels = pCodecCTX->channels;
    pFrame->nb_samples = pCodecCTX->frame_size;
    // Allocate the frame buffers according to these parameters
    rc = av_frame_get_buffer(pFrame, 0);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("av_frame_get_buffer() ret:[%d:%s] \n", rc, sError);
        return -1;
    }
    FILE* pFile = fopen("test.mp3", "wb");
    if (pFile == NULL)
    {
        printf("open test.mp3 failed! \n");
        return -1;
    }
    // Phase increment per sample for a 1 kHz sine wave
    float fXStep = 2 * 3.1415926 * 1000 / pCodecCTX->sample_rate;
    // Generate and encode 10 seconds of audio
    for (int n = 0; n < pCodecCTX->sample_rate * 10; )
    {
        // The encoder may still reference the previous buffer, so make it writable first
        rc = av_frame_make_writable(pFrame);
        if (rc < 0)
        {
            printf("av_frame_make_writable() ret:[%d] \n", rc);
            return -1;
        }
        // Fill the samples in planar format: data[c] holds the samples of channel c
        for (int c = 0; c < pFrame->channels; ++c)
        {
            int16_t* pData = (int16_t*)pFrame->data[c];
            for (int i = 0; i < pFrame->nb_samples; ++i)
            {
                pData[i] = (int16_t)(sin(fXStep * (n + i)) * 1000);
            }
        }
        // Encode into compressed packets
        if (!encode(pCodecCTX, pFrame, pPacket, pFile))
        {
            printf("encode() fatal! \n");
            return -1;
        }
        n += pFrame->nb_samples;
    }
    // Flush the encoder and write the remaining packets
    if (!encode(pCodecCTX, NULL, pPacket, pFile))
    {
        printf("flush encoder failed! \n");
        return -1;
    }
    printf("test.mp3 write success! \n");
    fclose(pFile);
    av_frame_free(&pFrame);
    av_packet_free(&pPacket);
    avcodec_free_context(&pCodecCTX);
    return 0;
}
// Encode one frame (pass NULL to flush the encoder) and write the resulting packets to pFile
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile)
{
    int rc = avcodec_send_frame(pCodecCTX, pFrame);
    if (rc < 0)
    {
        char sError[128] = {0};
        av_strerror(rc, sError, sizeof(sError));
        printf("avcodec_send_frame() ret:[%d:%s] \n", rc, sError);
        return false;
    }
    // Drain every packet that is ready: EAGAIN means more input is needed, EOF means fully flushed
    while (true)
    {
        rc = avcodec_receive_packet(pCodecCTX, pPacket);
        if (rc < 0)
        {
            if (rc == AVERROR(EAGAIN) || rc == AVERROR_EOF)
                return true;
            char sError[128] = {0};
            av_strerror(rc, sError, sizeof(sError));
            printf("avcodec_receive_packet() ret:[%d:%s] \n", rc, sError);
            return false;
        }
        fwrite(pPacket->data, 1, pPacket->size, pFile);
        av_packet_unref(pPacket);
    }
    return true;
}
Compile:
g++ -o encode_mp3 encode_mp3.cpp -I/usr/local/ffmpeg/include -L/usr/local/ffmpeg/lib -lavformat -lavcodec -lavutil
Run it; the output is:
$./encode_mp3
support sample formats:
7 - s32p
8 - fltp
6 - s16p
support layouts:
1 - mono
2 - stereo
support sample rates:
44100Hz
48000Hz
32000Hz
22050Hz
24000Hz
16000Hz
11025Hz
12000Hz
8000Hz
test.mp3 write success!
Inspect the media information of test.mp3:
$ ffprobe.exe -i test.mp3
ffprobe version N-104465-g08a501946f Copyright (c) 2007-2021 the FFmpeg developers
built with gcc 11.2.0 (Rev6, Built by MSYS2 project)
configuration: --prefix=/usr/local/ffmpeg --enable-shared --enable-libmp3lame
libavutil 57. 7.100 / 57. 7.100
libavcodec 59. 12.100 / 59. 12.100
libavformat 59. 8.100 / 59. 8.100
libavdevice 59. 0.101 / 59. 0.101
libavfilter 8. 16.101 / 8. 16.101
libswscale 6. 1.100 / 6. 1.100
libswresample 4. 0.100 / 4. 0.100
[mp3 @ 000001f10bddcbc0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'test.mp3':
Duration: 00:00:10.03, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s