/*
FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
dr_flac - v0.12.33 - 2021-12-22

David Reid - mackron@gmail.com

GitHub: https://github.com/mackron/dr_libs
*/

/*
RELEASE NOTES - v0.12.0
=======================
Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.


Improved Client-Defined Memory Allocation
-----------------------------------------
The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE is still in place and will be used by default when no custom
allocation callbacks are specified.

To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:

    void* my_malloc(size_t sz, void* pUserData)
    {
        return malloc(sz);
    }
    void* my_realloc(void* p, size_t sz, void* pUserData)
    {
        return realloc(p, sz);
    }
    void my_free(void* p, void* pUserData)
    {
        free(p);
    }

    ...

    drflac_allocation_callbacks allocationCallbacks;
    allocationCallbacks.pUserData = &myData;
    allocationCallbacks.onMalloc  = my_malloc;
    allocationCallbacks.onRealloc = my_realloc;
    allocationCallbacks.onFree    = my_free;
    drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);

The advantage of this new system is that it allows you to specify user data which will be passed to the allocation routines.

Passing in NULL for the allocation callbacks object will cause dr_flac to use the defaults, DRFLAC_MALLOC, DRFLAC_REALLOC and
DRFLAC_FREE, which is equivalent to how it worked in previous versions.
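
As a sketch of why the user data pointer is useful, the callbacks below count live allocations through `pUserData`. The `alloc_stats`
struct and the `tracking_*` names are hypothetical; only the three function signatures are taken from `drflac_allocation_callbacks`.
They would be assigned to `onMalloc`, `onRealloc` and `onFree` in the same way as in the example above, with `pUserData` pointing at an
`alloc_stats` object.

```c
#include <stdlib.h>

// Hypothetical bookkeeping object passed to the callbacks through pUserData.
typedef struct
{
    size_t liveAllocations;   // Allocations not yet freed.
    size_t totalAllocations;  // Every successful malloc, plus every realloc of a NULL pointer.
} alloc_stats;

void* tracking_malloc(size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* p = malloc(sz);
    if (p != NULL) {
        pStats->liveAllocations  += 1;
        pStats->totalAllocations += 1;
    }
    return p;
}

void* tracking_realloc(void* p, size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* pNew = realloc(p, sz);
    // realloc(NULL, sz) behaves like malloc, so count it as a new allocation.
    if (p == NULL && pNew != NULL) {
        pStats->liveAllocations  += 1;
        pStats->totalAllocations += 1;
    }
    return pNew;
}

void tracking_free(void* p, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    if (p != NULL) {
        pStats->liveAllocations -= 1;
    }
    free(p);
}
```

A leak check then reduces to confirming that `liveAllocations` is back to 0 after `drflac_close()`.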

Every API that opens a drflac object now takes this extra parameter. These include the following:

    drflac_open()
    drflac_open_relaxed()
    drflac_open_with_metadata()
    drflac_open_with_metadata_relaxed()
    drflac_open_file()
    drflac_open_file_with_metadata()
    drflac_open_memory()
    drflac_open_memory_with_metadata()
    drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_read_pcm_frames_f32()
    drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_memory_and_read_pcm_frames_f32()



Optimizations
-------------
Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
improves performance over the brute force method that was used when no seek table was present. Seek table based seeking also uses
the new binary search seeking system to further improve performance. Note that binary search seeking depends on CRC checks, which
means it will be disabled when DR_FLAC_NO_CRC is used.
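
The lookup below is a simplified illustration of how a binary search can pick the seek point to start decoding from: the last seek point
whose `firstPCMFrame` is at or before the target, assuming the seek points are sorted in ascending order as the FLAC format requires. It
is not dr_flac's implementation, which also binary searches the stream itself and relies on CRC to validate frames. `example_seekpoint`
is a hypothetical stand-in for `drflac_seekpoint`.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in mirroring the fields of drflac_seekpoint.
typedef struct
{
    uint64_t firstPCMFrame;
    uint64_t flacFrameOffset;
    uint16_t pcmFrameCount;
} example_seekpoint;

// Returns the index of the last seek point with firstPCMFrame <= targetPCMFrame,
// or -1 if the target lies before the first seek point. Requires ascending order.
long example_find_seekpoint(const example_seekpoint* pSeekpoints, size_t count, uint64_t targetPCMFrame)
{
    size_t lo = 0;
    size_t hi = count;  // Search the half-open range [lo, hi).
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pSeekpoints[mid].firstPCMFrame <= targetPCMFrame) {
            lo = mid + 1;  // mid is a candidate; look for a later one.
        } else {
            hi = mid;
        }
    }
    // lo is now the first index whose firstPCMFrame is greater than the target.
    return (long)lo - 1;
}
```

Decoding would then resume at that seek point's `flacFrameOffset` and discard PCM frames until the target is reached.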

The SSE4.1 pipeline has been cleaned up and optimized. Decoding of 24-bit files in particular should be faster, and 16-bit streams
should also see some improvement.

drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed its s32
to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
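
For reference, a scalar version of the s32 to s16 step might look like the sketch below. It assumes the 32-bit samples are
left-justified (the decoded value occupies the most significant bits), so the s16 value is simply the top 16 bits. It also assumes
`>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers. The function
name is hypothetical and this is not dr_flac's SIMD path.

```c
#include <stddef.h>
#include <stdint.h>

// Keep the most significant 16 bits of each left-justified 32-bit sample.
// Works in one pass over interleaved data, so sampleCount = frameCount * channels.
void example_s32_to_s16(const int32_t* pIn, int16_t* pOut, size_t sampleCount)
{
    size_t i;
    for (i = 0; i < sampleCount; ++i) {
        pOut[i] = (int16_t)(pIn[i] >> 16);  // Truncates: the low 16 bits are discarded.
    }
}
```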

A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
channel reconstruction, which is the last part of the decoding process.

The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
compiling with GCC and Clang, which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
compile time and the REV instruction requires ARM architecture version 6.
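
As a sketch of what those two instructions compute, here are portable C equivalents with hypothetical names. On GCC and Clang,
`__builtin_clz()` and `__builtin_bswap32()` expose the same operations, with the caveat that `__builtin_clz(0)` is undefined whereas
the loop below returns 32.

```c
#include <stdint.h>

// Count leading zero bits, as ARM's CLZ does. Returns 32 for an input of 0.
unsigned int example_clz32(uint32_t x)
{
    unsigned int n = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n += 1;
    }
    return n;
}

// Reverse the byte order of a 32-bit word, as ARM's REV does.
uint32_t example_bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

A single instruction replaces the whole loop or shift-and-mask sequence, which matters in a bit reader that runs these per sample.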

An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.


Removed APIs
------------
The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:

    drflac_read_s32() -> drflac_read_pcm_frames_s32()
    drflac_read_s16() -> drflac_read_pcm_frames_s16()
    drflac_read_f32() -> drflac_read_pcm_frames_f32()
    drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
    drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
    drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_and_decode_memory_f32() -> drflac_open_memory_and_read_pcm_frames_f32()

Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
to the old per-sample APIs. You now need to use the "pcm_frame" versions.
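
Because a PCM frame holds one sample per channel, migrating from the per-sample APIs mostly comes down to dividing or multiplying by the
channel count, assuming the old calls were given counts of interleaved samples as their naming suggests. A minimal sketch with
hypothetical helper names:

```c
#include <stdint.h>

// One PCM frame holds one sample for every channel, so the conversions are a
// multiply or divide by the channel count. Partial frames are dropped.
uint64_t example_samples_to_frames(uint64_t interleavedSampleCount, uint32_t channels)
{
    return interleavedSampleCount / channels;
}

uint64_t example_frames_to_samples(uint64_t pcmFrameCount, uint32_t channels)
{
    return pcmFrameCount * channels;
}
```

For example, a request for `samplesToRead` interleaved samples becomes a request for
`example_samples_to_frames(samplesToRead, pFlac->channels)` PCM frames, and the returned frame count converts back the same way.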
*/


/*
Introduction
============
dr_flac is a single file library. To use it, do something like the following in one .c file.

    ```c
    #define DR_FLAC_IMPLEMENTATION
    #include "dr_flac.h"
    ```

You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:

    ```c
    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    if (pFlac == NULL) {
        // Failed to open FLAC file.
        return -1;
    }

    // totalPCMFrameCount can be 0 when the length is unknown (see the drflac struct).
    drflac_int32* pSamples = (drflac_int32*)malloc((size_t)pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    drflac_uint64 pcmFramesRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);  // Returns PCM frames, not samples.

    ...

    free(pSamples);
    drflac_close(pFlac);
    ```

The drflac object represents the decoder. It is a transparent type, so all the information you need, such as the number of channels and the bits per sample,
is directly accessible; just make sure you don't change their values. Samples are always output interleaved, and the `_s32` APIs output signed 32-bit PCM.
In the example above a native FLAC stream was opened, but dr_flac supports Ogg encapsulated FLAC streams just as seamlessly.

You do not need to decode the entire stream in one go. You just specify how many PCM frames you'd like at any given time and the decoder will give you as many
PCM frames as it can, up to the amount requested. Later on when you need the next batch of PCM frames, just call it again. Example:

    ```c
    drflac_uint64 framesRead;
    drflac_int32* pChunkSamples = (drflac_int32*)malloc((size_t)chunkSizeInPCMFrames * pFlac->channels * sizeof(drflac_int32));
    while ((framesRead = drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples)) > 0) {
        do_something(pChunkSamples, framesRead);  // framesRead can be less than chunkSizeInPCMFrames near the end.
    }
    free(pChunkSamples);
    ```

You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.

If you just want to quickly decode an entire FLAC file in one go you can do something like this:

    ```c
    unsigned int channels;
    unsigned int sampleRate;
    drflac_uint64 totalPCMFrameCount;
    drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    if (pSampleData == NULL) {
        // Failed to open and decode FLAC file.
    }

    ...

    drflac_free(pSampleData, NULL);
    ```

You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
should be considered lossy.
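
To see where the f32 path can lose information, consider a straightforward s32 to f32 conversion like the sketch below. The function name
is hypothetical and this is not necessarily the scaling dr_flac uses internally. A `float` has a 24-bit significand, so 32-bit values
that differ only in their lowest bits can map to the same `float`.

```c
#include <stddef.h>
#include <stdint.h>

// Scale a signed 32-bit sample by 1/2^31. INT32_MIN maps exactly to -1.0f; INT32_MAX
// rounds to 1.0f because a float cannot hold 2147483647 exactly.
void example_s32_to_f32(const int32_t* pIn, float* pOut, size_t sampleCount)
{
    size_t i;
    for (i = 0; i < sampleCount; ++i) {
        pOut[i] = (float)pIn[i] / 2147483648.0f;  // Dividing by a power of two adds no extra rounding.
    }
}
```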


If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.

The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast-style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:

    `drflac_open_relaxed()`
    `drflac_open_with_metadata_relaxed()`

It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
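
Finding the first frame means scanning for a frame header's sync code. In the FLAC format a frame header begins with the 14-bit pattern
11111111111110, followed by a reserved 0 bit and the blocking strategy bit, so its first two bytes are 0xFF followed by 0xF8 or 0xF9. The
sketch below (hypothetical name) returns the first such candidate. A real decoder must still validate the rest of the header, for
example its CRC-8, because the same byte pair can appear by chance inside audio data.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the offset of the first byte pair that looks like a FLAC frame sync code
// (0xFF then 0xF8 or 0xF9), or -1 if none is found. Candidates still need validating.
long example_find_frame_sync(const uint8_t* pData, size_t size)
{
    size_t i;
    for (i = 0; i + 1 < size; ++i) {
        if (pData[i] == 0xFF && (pData[i + 1] & 0xFE) == 0xF8) {
            return (long)i;
        }
    }
    return -1;
}
```

On a long stream with no valid frames this scan, plus the validation of every false candidate, is where the initialization time goes.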



Build Options
=============
#define these options before including this file.

#define DR_FLAC_NO_STDIO
  Disables `drflac_open_file()` and family.

#define DR_FLAC_NO_OGG
  Disables support for Ogg/FLAC streams.

#define DR_FLAC_BUFFER_SIZE <number>
  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  you have a very efficient implementation of onRead(), or increasing it if it's very inefficient. Must be a multiple of 8.

#define DR_FLAC_NO_CRC
  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will also disable binary search seeking: when seeking, the seek
  table will be used if available, otherwise the seek will be performed using brute force.

#define DR_FLAC_NO_SIMD
  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.



Notes
=====
- dr_flac does not support changing the sample rate or channel count mid-stream.
- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
*/

#ifndef dr_flac_h
#define dr_flac_h

#ifdef __cplusplus
extern "C" {
#endif

#define DRFLAC_STRINGIFY(x)      #x
#define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)

#define DRFLAC_VERSION_MAJOR     0
#define DRFLAC_VERSION_MINOR     12
#define DRFLAC_VERSION_REVISION  33
#define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)

#include <stddef.h> /* For size_t. */

/* Sized types. */
typedef   signed char           drflac_int8;
typedef unsigned char           drflac_uint8;
typedef   signed short          drflac_int16;
typedef unsigned short          drflac_uint16;
typedef   signed int            drflac_int32;
typedef unsigned int            drflac_uint32;
#if defined(_MSC_VER) && !defined(__clang__)
    typedef   signed __int64    drflac_int64;
    typedef unsigned __int64    drflac_uint64;
#else
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wlong-long"
        #if defined(__clang__)
            #pragma GCC diagnostic ignored "-Wc++11-long-long"
        #endif
    #endif
    typedef   signed long long  drflac_int64;
    typedef unsigned long long  drflac_uint64;
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic pop
    #endif
#endif
#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    typedef drflac_uint64       drflac_uintptr;
#else
    typedef drflac_uint32       drflac_uintptr;
#endif
typedef drflac_uint8            drflac_bool8;
typedef drflac_uint32           drflac_bool32;
#define DRFLAC_TRUE             1
#define DRFLAC_FALSE            0

#if !defined(DRFLAC_API)
    #if defined(DRFLAC_DLL)
        #if defined(_WIN32)
            #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
            #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
            #define DRFLAC_DLL_PRIVATE static
        #else
            #if defined(__GNUC__) && __GNUC__ >= 4
                #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
            #else
                #define DRFLAC_DLL_IMPORT
                #define DRFLAC_DLL_EXPORT
                #define DRFLAC_DLL_PRIVATE static
            #endif
        #endif

        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
            #define DRFLAC_API  DRFLAC_DLL_EXPORT
        #else
            #define DRFLAC_API  DRFLAC_DLL_IMPORT
        #endif
        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    #else
        #define DRFLAC_API extern
        #define DRFLAC_PRIVATE static
    #endif
#endif

#if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
    #define DRFLAC_DEPRECATED       __declspec(deprecated)
#elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
    #define DRFLAC_DEPRECATED       __attribute__((deprecated))
#elif defined(__has_feature)                /* Clang */
    #if __has_feature(attribute_deprecated)
        #define DRFLAC_DEPRECATED   __attribute__((deprecated))
    #else
        #define DRFLAC_DEPRECATED
    #endif
#else
    #define DRFLAC_DEPRECATED
#endif

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
DRFLAC_API const char* drflac_version_string(void);

/*
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
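
The multiple-of-8 requirement lines up with the bit streamer's L2 cache, an array of `drflac_cache_t` elements sized
`DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)`, where each element is 8 bytes on 64-bit builds. A C11 compile-time check like the sketch
below can catch a bad value early. `EXAMPLE_L2_LINE_COUNT` is a hypothetical name, and `uint64_t` stands in for the 64-bit `drflac_cache_t`.

```c
#include <stdint.h>

#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE 4096  // Same default as this header.
#endif

// Fail the build if the buffer would not split evenly into 8-byte cache lines.
_Static_assert(DR_FLAC_BUFFER_SIZE % 8 == 0, "DR_FLAC_BUFFER_SIZE must be a multiple of 8");

// Number of 64-bit cache lines the L2 buffer would hold.
enum { EXAMPLE_L2_LINE_COUNT = DR_FLAC_BUFFER_SIZE / sizeof(uint64_t) };
```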
*/
#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE   4096
#endif

/* Check if we can enable 64-bit optimizations. */
#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
#define DRFLAC_64BIT
#endif

#ifdef DRFLAC_64BIT
typedef drflac_uint64 drflac_cache_t;
#else
typedef drflac_uint32 drflac_cache_t;
#endif

/* The various metadata block types. */
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127

/* The various picture types specified in the PICTURE block. */
#define DRFLAC_PICTURE_TYPE_OTHER                   0
#define DRFLAC_PICTURE_TYPE_FILE_ICON               1
#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
#define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
#define DRFLAC_PICTURE_TYPE_COVER_BACK              4
#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
#define DRFLAC_PICTURE_TYPE_MEDIA                   6
#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
#define DRFLAC_PICTURE_TYPE_ARTIST                  8
#define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
#define DRFLAC_PICTURE_TYPE_BAND                    10
#define DRFLAC_PICTURE_TYPE_COMPOSER                11
#define DRFLAC_PICTURE_TYPE_LYRICIST                12
#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
#define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
#define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20

typedef enum
{
    drflac_container_native,
    drflac_container_ogg,
    drflac_container_unknown
} drflac_container;

typedef enum
{
    drflac_seek_origin_start,
    drflac_seek_origin_current
} drflac_seek_origin;

/* Packing is important on this structure because we map this directly to the raw data within the SEEKTABLE metadata block. */
#pragma pack(2)
typedef struct
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
    drflac_uint16 pcmFrameCount;
} drflac_seekpoint;
#pragma pack()

typedef struct
{
    drflac_uint16 minBlockSizeInPCMFrames;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint32 minFrameSizeInPCMFrames;
    drflac_uint32 maxFrameSizeInPCMFrames;
    drflac_uint32 sampleRate;
    drflac_uint8  channels;
    drflac_uint8  bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint8  md5[16];
} drflac_streaminfo;

typedef struct
{
    /*
    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    */
    drflac_uint32 type;

    /*
    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    not modify the contents of this buffer. Use the structures below for more meaningful and structured
    information about the metadata. It's possible for this to be null.
    */
    const void* pRawData;

    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    drflac_uint32 rawDataSize;

    union
    {
        drflac_streaminfo streaminfo;

        struct
        {
            int unused;
        } padding;

        struct
        {
            drflac_uint32 id;
            const void* pData;
            drflac_uint32 dataSize;
        } application;

        struct
        {
            drflac_uint32 seekpointCount;
            const drflac_seekpoint* pSeekpoints;
        } seektable;

        struct
        {
            drflac_uint32 vendorLength;
            const char* vendor;
            drflac_uint32 commentCount;
            const void* pComments;
        } vorbis_comment;

        struct
        {
            char catalog[128];
            drflac_uint64 leadInSampleCount;
            drflac_bool32 isCD;
            drflac_uint8 trackCount;
            const void* pTrackData;
        } cuesheet;

        struct
        {
            drflac_uint32 type;
            drflac_uint32 mimeLength;
            const char* mime;
            drflac_uint32 descriptionLength;
            const char* description;
            drflac_uint32 width;
            drflac_uint32 height;
            drflac_uint32 colorDepth;
            drflac_uint32 indexColorCount;
            drflac_uint32 pictureDataSize;
            const drflac_uint8* pPictureData;
        } picture;
    } data;
} drflac_metadata;


/*
Callback for when data needs to be read from the client.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pBufferOut (out)
    The output buffer.

bytesToRead (in)
    The number of bytes to read.


Return Value
------------
The number of bytes actually read.


Remarks
-------
A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
you have reached the end of the stream.
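
For sources that can deliver fewer bytes than requested before the real end of the stream (sockets, pipes, chunked buffers), the callback
has to loop. The sketch below simulates such a source with a hypothetical `chunked_source` that hands out at most `maxChunk` bytes per
underlying read; only the `size_t (void*, void*, size_t)` shape is taken from `drflac_read_proc`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical source that, like a socket, may return fewer bytes than asked for.
typedef struct
{
    const unsigned char* pData;
    size_t size;
    size_t pos;
    size_t maxChunk;  // Largest number of bytes a single underlying read returns.
} chunked_source;

static size_t chunked_source_read_some(chunked_source* pSource, void* pOut, size_t bytesWanted)
{
    size_t available = pSource->size - pSource->pos;
    size_t n = bytesWanted;
    if (n > pSource->maxChunk) n = pSource->maxChunk;
    if (n > available) n = available;
    memcpy(pOut, pSource->pData + pSource->pos, n);
    pSource->pos += n;
    return n;  // 0 only at the end of the data.
}

// Matches drflac_read_proc: keep reading until bytesToRead is filled or the source is exhausted.
size_t example_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    chunked_source* pSource = (chunked_source*)pUserData;
    size_t total = 0;
    while (total < bytesToRead) {
        size_t n = chunked_source_read_some(pSource, (unsigned char*)pBufferOut + total, bytesToRead - total);
        if (n == 0) {
            break;  // End of stream: returning less than bytesToRead signals this to dr_flac.
        }
        total += n;
    }
    return total;
}
```

Returning the first partial chunk instead of looping would make dr_flac treat a momentary shortfall as the end of the stream.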
*/
typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

/*
Callback for when data needs to be seeked.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

offset (in)
    The number of bytes to move, relative to the origin. Will never be negative.

origin (in)
    The origin of the seek - the current position or the start of the stream.


Return Value
------------
Whether or not the seek was successful.


Remarks
-------
The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
either drflac_seek_origin_start or drflac_seek_origin_current.

When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
and handled by returning DRFLAC_FALSE.
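
A minimal sketch of a seek callback over an in-memory buffer, showing the bounds check described above. The `memory_source` struct and
the function name are hypothetical; the stand-in declarations under `#ifndef dr_flac_h` exist only so the sketch compiles on its own
and mirror the ones in this header.

```c
#include <stddef.h>

// Stand-ins mirroring this header's declarations, used only when dr_flac.h is not included.
#ifndef dr_flac_h
typedef unsigned int drflac_bool32;
#define DRFLAC_TRUE  1
#define DRFLAC_FALSE 0
typedef enum { drflac_seek_origin_start, drflac_seek_origin_current } drflac_seek_origin;
#endif

// Hypothetical in-memory source.
typedef struct
{
    const unsigned char* pData;
    size_t size;
    size_t pos;
} memory_source;

// Matches drflac_seek_proc. Rejects any seek that would move past the end of the data.
drflac_bool32 example_on_seek(void* pUserData, int offset, drflac_seek_origin origin)
{
    memory_source* pSource = (memory_source*)pUserData;
    size_t base = (origin == drflac_seek_origin_current) ? pSource->pos : 0;

    if (offset < 0) {
        return DRFLAC_FALSE;  // Documented as never happening, but stay defensive.
    }
    if ((size_t)offset > pSource->size - base) {
        return DRFLAC_FALSE;  // Beyond the end of the stream.
    }

    pSource->pos = base + (size_t)offset;
    return DRFLAC_TRUE;
}
```

Seeking exactly to the end is allowed here; the next read then reports end of stream by returning 0.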
*/
typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);

/*
Callback for when a metadata block is read.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pMetadata (in)
    A pointer to a structure containing the data of the metadata block.


Remarks
-------
Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
*/
typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);


typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} drflac_allocation_callbacks;

/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
typedef struct
{
    const drflac_uint8* data;
    size_t dataSize;
    size_t currentReadPos;
} drflac__memory_stream;

/* Structure for internal use. Used for bit streaming. */
typedef struct
{
    /* The function to call when more data needs to be read. */
    drflac_read_proc onRead;

    /* The function to call when the current read position needs to be moved. */
    drflac_seek_proc onSeek;

    /* The user data to pass around to onRead and onSeek. */
    void* pUserData;


    /*
    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    */
    size_t unalignedByteCount;

    /* The content of the unaligned bytes. */
    drflac_cache_t unalignedCache;

    /* The index of the next valid cache line in the "L2" cache. */
    drflac_uint32 nextL2Line;

    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    drflac_uint32 consumedBits;

    /*
    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    */
    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    drflac_cache_t cache;

    /*
    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    is reset to 0 at the beginning of each frame.
    */
    drflac_uint16 crc16;
    drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
    drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
} drflac_bs;
624
625typedef struct
626{
627 /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
628 drflac_uint8 subframeType;
629
630 /* The number of wasted bits per sample as specified by the sub-frame header. */
631 drflac_uint8 wastedBitsPerSample;
632
633 /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
634 drflac_uint8 lpcOrder;
635
636 /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
637 drflac_int32* pSamplesS32;
638} drflac_subframe;
639
640typedef struct
641{
642 /*
643 If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
644 always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
645 */
646 drflac_uint64 pcmFrameNumber;
647
648 /*
649 If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
650 is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
651 */
652 drflac_uint32 flacFrameNumber;
653
654 /* The sample rate of this frame. */
655 drflac_uint32 sampleRate;
656
657 /* The number of PCM frames in each sub-frame within this frame. */
658 drflac_uint16 blockSizeInPCMFrames;
659
660 /*
661 The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
662 will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
663 */
664 drflac_uint8 channelAssignment;
665
666 /* The number of bits per sample within this frame. */
667 drflac_uint8 bitsPerSample;
668
669 /* The frame's CRC. */
670 drflac_uint8 crc8;
671} drflac_frame_header;
672
673typedef struct
674{
675 /* The header. */
676 drflac_frame_header header;
677
678 /*
679 The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
680 this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
681 */
682 drflac_uint32 pcmFramesRemaining;
683
684 /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
685 drflac_subframe subframes[8];
686} drflac_frame;
687
688typedef struct
689{
690 /* The function to call when a metadata block is read. */
691 drflac_meta_proc onMeta;
692
693 /* The user data posted to the metadata callback function. */
694 void* pUserDataMD;
695
696 /* Memory allocation callbacks. */
697 drflac_allocation_callbacks allocationCallbacks;
698
699
700 /* The sample rate. Will be set to something like 44100. */
701 drflac_uint32 sampleRate;
702
703 /*
704 The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
705 value specified in the STREAMINFO block.
706 */
707 drflac_uint8 channels;
708
709 /* The bits per sample. Will be set to something like 16, 24, etc. */
710 drflac_uint8 bitsPerSample;
711
712 /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
713 drflac_uint16 maxBlockSizeInPCMFrames;
714
715 /*
716 The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
717 the total PCM frame count is unknown. Likely the case with streams like internet radio.
718 */
719 drflac_uint64 totalPCMFrameCount;
720
721
722 /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
723 drflac_container container;
724
725 /* The number of seekpoints in the seektable. */
726 drflac_uint32 seekpointCount;
727
728
729 /* Information about the frame the decoder is currently sitting on. */
730 drflac_frame currentFLACFrame;
731
732
733 /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
734 drflac_uint64 currentPCMFrame;
735
736 /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
737 drflac_uint64 firstFLACFramePosInBytes;
738
739
740 /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
741 drflac__memory_stream memoryStream;
742
743
744 /* A pointer to the decoded sample data. This is an offset of pExtraData. */
745 drflac_int32* pDecodedSamples;
746
747 /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
748 drflac_seekpoint* pSeekpoints;
749
750 /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
751 void* _oggbs;
752
753 /* Internal use only. Used for profiling and testing different seeking modes. */
754 drflac_bool32 _noSeekTableSeek : 1;
755 drflac_bool32 _noBinarySearchSeek : 1;
756 drflac_bool32 _noBruteForceSeek : 1;
757
758 /* The bit streamer. The raw FLAC data is fed through this object. */
759 drflac_bs bs;
760
761 /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
762 drflac_uint8 pExtraData[1];
763} drflac;
764
765
766/*
767Opens a FLAC decoder.
768
769
770Parameters
771----------
772onRead (in)
773 The function to call when data needs to be read from the client.
774
775onSeek (in)
776 The function to call when the read position of the client data needs to move.
777
778pUserData (in, optional)
779 A pointer to application defined data that will be passed to onRead and onSeek.
780
781pAllocationCallbacks (in, optional)
782 A pointer to application defined callbacks for managing memory allocations.
783
784
785Return Value
786------------
787Returns a pointer to an object representing the decoder.
788
789
790Remarks
791-------
792Close the decoder with `drflac_close()`.
793
794`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
795
796This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
797without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
798
799This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
800from a block of memory respectively.
801
802The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
803
804Use `drflac_open_with_metadata()` if you need access to metadata.
805
806
807Seek Also
808---------
809drflac_open_file()
810drflac_open_memory()
811drflac_open_with_metadata()
812drflac_close()
813*/
814DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
815
816/*
817Opens a FLAC stream with relaxed validation of the header block.
818
819
820Parameters
821----------
822onRead (in)
823 The function to call when data needs to be read from the client.
824
825onSeek (in)
826 The function to call when the read position of the client data needs to move.
827
828container (in)
829 Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
830
831pUserData (in, optional)
832 A pointer to application defined data that will be passed to onRead and onSeek.
833
834pAllocationCallbacks (in, optional)
835 A pointer to application defined callbacks for managing memory allocations.
836
837
838Return Value
839------------
840A pointer to an object representing the decoder.
841
842
843Remarks
844-------
845The same as drflac_open(), except attempts to open the stream even when a header block is not present.
846
847Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
848as that is for internal use only.
849
Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a valid frame is never found, it will keep reading forever. To abort,
force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
852
853Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
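
The abort mechanism above amounts to a read callback that starts returning 0 once the application gives up. A minimal sketch, assuming a hypothetical wrapper (`my_budgeted_reader`) that forwards to an inner reader until a byte budget runs out or a cancel flag is set; `my_endless_read` stands in for a source that never yields a valid frame. None of these names are part of dr_flac, and the read signature is assumed to mirror `drflac_read_proc`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Assumed to mirror drflac_read_proc.
typedef size_t (*my_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

// Hypothetical wrapper state: forwards reads until the budget is used up or cancel is set.
typedef struct
{
    my_read_proc onInnerRead;
    void* pInnerUserData;
    size_t bytesBudget;  // Maximum number of bytes to hand to the decoder.
    int cancel;          // Set by the application to abort.
} my_budgeted_reader;

// Hypothetical inner source that never runs out, like a stream with no valid frame in it.
static size_t my_endless_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    (void)pUserData;
    memset(pBufferOut, 0xFF, bytesToRead);
    return bytesToRead;
}

static size_t my_budgeted_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    my_budgeted_reader* pReader = (my_budgeted_reader*)pUserData;
    size_t bytesRead;

    assert(pReader != NULL);

    // Returning 0 is what tells the decoder that the end of the stream was reached.
    if (pReader->cancel || pReader->bytesBudget == 0) {
        return 0;
    }

    if (bytesToRead > pReader->bytesBudget) {
        bytesToRead = pReader->bytesBudget;
    }

    bytesRead = pReader->onInnerRead(pReader->pInnerUserData, pBufferOut, bytesToRead);
    pReader->bytesBudget -= bytesRead;
    return bytesRead;
}
```

If `cancel` is set from another thread, real code should make it an atomic rather than a plain `int`.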
854*/
855DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
856
857/*
858Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
859
860
861Parameters
862----------
863onRead (in)
864 The function to call when data needs to be read from the client.
865
866onSeek (in)
867 The function to call when the read position of the client data needs to move.
868
869onMeta (in)
870 The function to call for every metadata block.
871
872pUserData (in, optional)
873 A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
874
875pAllocationCallbacks (in, optional)
876 A pointer to application defined callbacks for managing memory allocations.
877
878
879Return Value
880------------
A pointer to an object representing the decoder, or NULL if an error occurs.
882
883
884Remarks
885-------
886Close the decoder with `drflac_close()`.
887
888`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
889
890This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
891metadata block except for STREAMINFO and PADDING blocks.
892
The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate between
the different metadata types.
896
897The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
898
899Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
900the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
901metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
902returned depending on whether or not the stream is being opened with metadata.
903
904
See Also
--------
907drflac_open_file_with_metadata()
908drflac_open_memory_with_metadata()
909drflac_open()
910drflac_close()
911*/
912DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
913
914/*
915The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
916
917See Also
918--------
919drflac_open_with_metadata()
920drflac_open_relaxed()
921*/
922DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
923
924/*
925Closes the given FLAC decoder.
926
927
928Parameters
929----------
930pFlac (in)
931 The decoder to close.
932
933
934Remarks
935-------
936This will destroy the decoder object.
937
938
939See Also
940--------
941drflac_open()
942drflac_open_with_metadata()
943drflac_open_file()
944drflac_open_file_w()
945drflac_open_file_with_metadata()
946drflac_open_file_with_metadata_w()
947drflac_open_memory()
948drflac_open_memory_with_metadata()
949*/
950DRFLAC_API void drflac_close(drflac* pFlac);
951
952
953/*
954Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
955
956
957Parameters
958----------
959pFlac (in)
960 The decoder.
961
962framesToRead (in)
963 The number of PCM frames to read.
964
965pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples. It must have room for `framesToRead * channels` samples.
967
968
969Return Value
970------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.
972
973
974Remarks
975-------
976pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
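
Because the output is interleaved, one PCM frame holds one sample per channel, so the buffer needs room for `framesToRead * channels` samples, and sample `c` of frame `f` sits at index `f * channels + c`. A hedged sketch of sizing such a buffer with overflow checks; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: allocates room for frameCount interleaved frames of `channels` samples each.
// Returns NULL if the size would overflow or the allocation fails.
static int32_t* my_alloc_s32_frames(uint64_t frameCount, unsigned int channels)
{
    uint64_t sampleCount;

    if (channels == 0 || frameCount > UINT64_MAX / channels) {
        return NULL;
    }

    sampleCount = frameCount * channels;  // One PCM frame = one sample per channel.
    if (sampleCount > SIZE_MAX / sizeof(int32_t)) {
        return NULL;
    }

    return (int32_t*)malloc((size_t)sampleCount * sizeof(int32_t));
}

// Hypothetical helper: index of `channel` within PCM frame `frame` of an interleaved buffer.
static size_t my_interleaved_index(size_t frame, unsigned int channel, unsigned int channels)
{
    assert(channel < channels);
    return frame * channels + channel;
}
```

With the real API, the buffer would be sized with `pFlac->channels` and then filled by `drflac_read_pcm_frames_s32(pFlac, framesToRead, pBuffer)`.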
977*/
978DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
979
980
981/*
982Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
983
984
985Parameters
986----------
987pFlac (in)
988 The decoder.
989
990framesToRead (in)
991 The number of PCM frames to read.
992
993pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples. It must have room for `framesToRead * channels` samples.
995
996
997Return Value
998------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.
1000
1001
1002Remarks
1003-------
1004pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1005
1006Note that this is lossy for streams where the bits per sample is larger than 16.
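
To make the loss concrete: one common way to reduce a wider sample to 16 bits, assumed here for illustration (dr_flac's internal conversion may differ in detail), is to left-align it in 32 bits and keep the top 16 bits. For a 24-bit stream this simply discards the 8 least significant bits. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Sketch: reduce a signed sample of bitsPerSample bits (1..32) to 16 bits by left-aligning it
// in 32 bits and keeping the top half; the low bits are discarded (truncation, no dithering).
// Assumes two's complement and an arithmetic right shift, as on mainstream compilers.
static int16_t my_sample_to_s16(int32_t sample, unsigned int bitsPerSample)
{
    int32_t aligned;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    aligned = (int32_t)((uint32_t)sample << (32 - bitsPerSample));
    return (int16_t)(aligned >> 16);
}
```

For example, the 24-bit samples `0x123456` and `0x1234FF` both become `0x1234`, while 16-bit samples pass through unchanged.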
1007*/
1008DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1009
1010/*
1011Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1012
1013
1014Parameters
1015----------
1016pFlac (in)
1017 The decoder.
1018
1019framesToRead (in)
1020 The number of PCM frames to read.
1021
1022pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples. It must have room for `framesToRead * channels` samples.
1024
1025
1026Return Value
1027------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.
1029
1030
1031Remarks
1032-------
1033pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1034
1035Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
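
To see where the loss comes from, consider a normalization that divides a left-aligned 32-bit sample by 2^31, assumed here for illustration (dr_flac's exact scaling may differ). A float has a 24-bit significand, so a 24-bit sample survives the round trip exactly, while a value using more significant bits gets rounded. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Sketch: map a 32-bit sample to [-1, 1] by dividing by 2^31 in double, then rounding once to float.
static float my_s32_to_f32(int32_t sample)
{
    return (float)(sample / 2147483648.0);
}

// Returns 1 if scaling the float back up reproduces the original 32-bit sample exactly.
static int my_f32_round_trips(int32_t sample)
{
    double back = (double)my_s32_to_f32(sample) * 2147483648.0;
    return back == (double)sample;
}
```

For example, `0x12345600` (a 24-bit sample shifted left by 8) round-trips exactly, `0x12345601` does not, and `INT32_MAX` rounds up to exactly `1.0f`.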
1036*/
1037DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1038
1039/*
1040Seeks to the PCM frame at the given index.
1041
1042
1043Parameters
1044----------
1045pFlac (in)
1046 The decoder.
1047
1048pcmFrameIndex (in)
    The index of the PCM frame to seek to.


Return Value
------------
1054`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
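
Seeking to a time rather than a frame index is a matter of multiplying by the sample rate. A minimal sketch, assuming the rate and length come from the decoder's `sampleRate` and `totalPCMFrameCount` fields and that a total of 0 means "unknown" (as in STREAMINFO); the helper name and the clamping policy are hypothetical choices. The result would then be passed to `drflac_seek_to_pcm_frame()`.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: converts a position in milliseconds to a PCM frame index, rounding down.
// When the total length is known (non-zero), the result is clamped to the last frame.
static uint64_t my_ms_to_pcm_frame(uint64_t milliseconds, uint32_t sampleRate, uint64_t totalPCMFrameCount)
{
    // Split into whole seconds and a remainder to keep the multiplication from overflowing.
    uint64_t frame = (milliseconds / 1000) * sampleRate + ((milliseconds % 1000) * sampleRate) / 1000;

    if (totalPCMFrameCount > 0 && frame >= totalPCMFrameCount) {
        frame = totalPCMFrameCount - 1;
    }

    return frame;
}
```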
1055*/
1056DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1057
1058
1059
1060#ifndef DR_FLAC_NO_STDIO
1061/*
1062Opens a FLAC decoder from the file at the given path.
1063
1064
1065Parameters
1066----------
1067pFileName (in)
1068 The path of the file to open, either absolute or relative to the current directory.
1069
1070pAllocationCallbacks (in, optional)
1071 A pointer to application defined callbacks for managing memory allocations.
1072
1073
1074Return Value
1075------------
A pointer to an object representing the decoder, or NULL if an error occurs.
1077
1078
1079Remarks
1080-------
Close the decoder with drflac_close().

This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms restrict the number of files a process can have open
at any given time, so keep this in mind if you have many decoders open at the same time.
1088
1089
1090See Also
1091--------
1092drflac_open_file_with_metadata()
1093drflac_open()
1094drflac_close()
1095*/
1096DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1097DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1098
1099/*
Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.).
1101
1102
1103Parameters
1104----------
1105pFileName (in)
1106 The path of the file to open, either absolute or relative to the current directory.
1107
onMeta (in)
    The callback to fire for each metadata block.

pUserData (in, optional)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1119
1120
1121Remarks
1122-------
1123Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1124
1125
1126See Also
1127--------
1128drflac_open_with_metadata()
1129drflac_open()
1130drflac_close()
1131*/
1132DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1133DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1134#endif
1135
1136/*
Opens a FLAC decoder from a pre-allocated block of memory.
1138
1139
1140Parameters
1141----------
1142pData (in)
1143 A pointer to the raw encoded FLAC data.
1144
1145dataSize (in)
    The size in bytes of `pData`.

pAllocationCallbacks (in, optional)
1149 A pointer to application defined callbacks for managing memory allocations.
1150
1151
1152Return Value
1153------------
A pointer to an object representing the decoder, or NULL if an error occurs.
1155
1156
1157Remarks
1158-------
1159This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
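
In practice this means the buffer is allocated before `drflac_open_memory()` and freed only after `drflac_close()`. A minimal sketch of loading a whole file with standard C I/O for that purpose; the helper name is hypothetical and error handling is kept to the essentials.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads an entire file into a heap buffer. The caller frees it with free(),
// and must not do so until any decoder reading from it has been closed.
static void* my_load_file(const char* pFilePath, size_t* pSizeOut)
{
    FILE* pFile;
    long fileSize;
    void* pData;

    assert(pFilePath != NULL && pSizeOut != NULL);
    *pSizeOut = 0;

    pFile = fopen(pFilePath, "rb");
    if (pFile == NULL) {
        return NULL;
    }

    if (fseek(pFile, 0, SEEK_END) != 0 || (fileSize = ftell(pFile)) < 0 || fseek(pFile, 0, SEEK_SET) != 0) {
        fclose(pFile);
        return NULL;
    }

    pData = malloc(fileSize > 0 ? (size_t)fileSize : 1);
    if (pData == NULL || fread(pData, 1, (size_t)fileSize, pFile) != (size_t)fileSize) {
        free(pData);
        fclose(pFile);
        return NULL;
    }

    fclose(pFile);
    *pSizeOut = (size_t)fileSize;
    return pData;
}
```

The intended order is: load the buffer, open the decoder on it, read, call `drflac_close()`, and only then `free()` the buffer.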
1160
1161
1162See Also
1163--------
1164drflac_open()
1165drflac_close()
1166*/
1167DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1168
1169/*
Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.).
1171
1172
1173Parameters
1174----------
1175pData (in)
1176 A pointer to the raw encoded FLAC data.
1177
1178dataSize (in)
    The size in bytes of `pData`.

onMeta (in)
    The callback to fire for each metadata block.

pUserData (in, optional)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
1188 A pointer to application defined callbacks for managing memory allocations.
1189
1190
1191Remarks
1192-------
1193Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1194
1195
1196See Also
--------
1198drflac_open_with_metadata()
1199drflac_open()
1200drflac_close()
1201*/
1202DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1203
1204
1205
1206/* High Level APIs */
1207
1208/*
1209Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1210pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1211
1212You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1213case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1214
1215Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1216read samples into a dynamically sized buffer on the heap until no samples are left.
1217
1218Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
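
When the total frame count is unknown, reading until no samples are left boils down to reading in chunks and growing the output buffer as it fills. A sketch of that pattern with doubling growth; the frame source is a hypothetical callback shaped like `drflac_read_pcm_frames_s32()`, and dr_flac's own growth policy may differ. It also shows why an endless broadcast stream is a problem: the loop only stops when the source returns 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical frame source: fills up to framesToRead frames, returns the count (0 at end of stream).
typedef uint64_t (*my_read_frames_proc)(void* pUserData, uint64_t framesToRead, int32_t* pOut);

// Hypothetical test source producing `remaining` mono frames whose values count up from 0.
typedef struct { uint64_t remaining; int32_t next; } my_counter_source;

static uint64_t my_counter_read(void* pUserData, uint64_t framesToRead, int32_t* pOut)
{
    my_counter_source* pSource = (my_counter_source*)pUserData;
    uint64_t i;

    if (framesToRead > pSource->remaining) {
        framesToRead = pSource->remaining;
    }
    for (i = 0; i < framesToRead; ++i) {
        pOut[i] = pSource->next++;
    }
    pSource->remaining -= framesToRead;
    return framesToRead;
}

// Reads until the source is exhausted, doubling the buffer whenever it fills up.
// Returns interleaved samples (release with free()). Size overflow checks omitted for brevity.
static int32_t* my_read_all_frames(my_read_frames_proc onRead, void* pUserData, unsigned int channels, uint64_t* pFrameCountOut)
{
    uint64_t capacityInFrames = 4096;
    uint64_t frameCount = 0;
    int32_t* pFrames;

    assert(onRead != NULL && channels > 0 && pFrameCountOut != NULL);
    *pFrameCountOut = 0;

    pFrames = (int32_t*)malloc((size_t)(capacityInFrames * channels * sizeof(int32_t)));
    if (pFrames == NULL) {
        return NULL;
    }

    for (;;) {
        uint64_t framesRead;

        if (frameCount == capacityInFrames) {
            int32_t* pNewFrames;
            capacityInFrames *= 2;
            pNewFrames = (int32_t*)realloc(pFrames, (size_t)(capacityInFrames * channels * sizeof(int32_t)));
            if (pNewFrames == NULL) {
                free(pFrames);
                return NULL;
            }
            pFrames = pNewFrames;
        }

        framesRead = onRead(pUserData, capacityInFrames - frameCount, pFrames + frameCount * channels);
        if (framesRead == 0) {
            break;
        }
        frameCount += framesRead;
    }

    *pFrameCountOut = frameCount;
    return pFrames;
}
```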
1219*/
1220DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1221
1222/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1223DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1224
1225/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1226DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1227
1228#ifndef DR_FLAC_NO_STDIO
1229/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1230DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1231
1232/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1233DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1234
1235/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1236DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1237#endif
1238
1239/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1240DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1241
1242/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1243DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1244
1245/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1246DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1247
1248/*
1249Frees memory that was allocated internally by dr_flac.
1250
1251Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
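
Passing the same callbacks matters because memory must be released by the allocator that produced it. As an illustration, here is a hypothetical set of callbacks that route through `pUserData` to count live allocations. The struct mirrors the `drflac_allocation_callbacks` fields shown in the release notes at the top of this file, redeclared under a `my_` name so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Mirrors the fields of drflac_allocation_callbacks (see the release notes at the top of this file).
typedef struct
{
    void* pUserData;
    void* (*onMalloc)(size_t sz, void* pUserData);
    void* (*onRealloc)(void* p, size_t sz, void* pUserData);
    void  (*onFree)(void* p, void* pUserData);
} my_allocation_callbacks;

// Hypothetical allocator state: number of blocks currently allocated.
typedef struct
{
    long liveAllocations;
} my_alloc_stats;

static void* my_counting_malloc(size_t sz, void* pUserData)
{
    void* p;

    assert(pUserData != NULL);
    p = malloc(sz);
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return p;
}

static void* my_counting_realloc(void* p, size_t sz, void* pUserData)
{
    void* pNew = realloc(p, sz);
    if (p == NULL && pNew != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations += 1;  // realloc(NULL, sz) behaves like malloc.
    }
    return pNew;
}

static void my_counting_free(void* p, void* pUserData)
{
    assert(pUserData != NULL);
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations -= 1;
    }
    free(p);
}
```

If the same object is passed both to a `drflac_open_*_and_read_pcm_frames_*()` call and to `drflac_free()`, the counter should return to zero.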
1252*/
1253DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1254
1255
1256/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1257typedef struct
1258{
1259 drflac_uint32 countRemaining;
1260 const char* pRunningData;
1261} drflac_vorbis_comment_iterator;
1262
1263/*
1264Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1265metadata block.
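
For reference, the Vorbis comment specification stores each comment as a 32-bit little-endian length followed by that many bytes of UTF-8 text, so an iterator only needs to read the length and step past the text. The sketch below is a hypothetical stand-alone re-implementation under that assumption, not dr_flac's own code. It also assumes that `pComments` points at the first comment's length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical iterator over raw comment data: [u32 LE length][bytes] repeated.
typedef struct
{
    uint32_t countRemaining;
    const unsigned char* pRunningData;
} my_comment_iterator;

static uint32_t my_read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns the next comment (not null terminated) and writes its length, or returns NULL when none
// are left. No bounds checking against the block size: real code must validate the lengths.
static const char* my_next_comment(my_comment_iterator* pIter, uint32_t* pLengthOut)
{
    uint32_t length;
    const char* pComment;

    assert(pIter != NULL && pLengthOut != NULL);
    if (pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length = my_read_le32(pIter->pRunningData);
    pComment = (const char*)(pIter->pRunningData + 4);

    pIter->pRunningData += 4 + length;
    pIter->countRemaining -= 1;

    *pLengthOut = length;
    return pComment;
}
```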
1266*/
1267DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1268
1269/*
1270Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1271returned string is NOT null terminated.
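
Each comment has the form `NAME=value`, and the Vorbis comment specification says field names are compared case-insensitively. Since the string is not null terminated, parsing has to rely on the length written to `pCommentLengthOut`. A sketch of a hypothetical matching helper:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: if the comment (pComment, commentLength) is "NAME=value" and NAME matches
// pFieldName case-insensitively, returns a pointer to the value and writes its length to
// *pValueLengthOut. Otherwise returns NULL. The value is not null terminated either.
static const char* my_match_vorbis_field(const char* pComment, size_t commentLength, const char* pFieldName, size_t* pValueLengthOut)
{
    size_t nameLength;
    size_t i;

    assert(pComment != NULL && pFieldName != NULL && pValueLengthOut != NULL);

    nameLength = strlen(pFieldName);
    if (commentLength <= nameLength || pComment[nameLength] != '=') {
        return NULL;
    }

    for (i = 0; i < nameLength; ++i) {
        if (toupper((unsigned char)pComment[i]) != toupper((unsigned char)pFieldName[i])) {
            return NULL;
        }
    }

    *pValueLengthOut = commentLength - nameLength - 1;
    return pComment + nameLength + 1;
}
```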
1272*/
1273DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1274
1275
1276/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1277typedef struct
1278{
1279 drflac_uint32 countRemaining;
1280 const char* pRunningData;
1281} drflac_cuesheet_track_iterator;
1282
1283/* Packing is important on this structure because we map this directly to the raw data within the CUESHEET metadata block. */
1284#pragma pack(4)
1285typedef struct
1286{
1287 drflac_uint64 offset;
1288 drflac_uint8 index;
1289 drflac_uint8 reserved[3];
1290} drflac_cuesheet_track_index;
1291#pragma pack()
1292
1293typedef struct
1294{
1295 drflac_uint64 offset;
1296 drflac_uint8 trackNumber;
1297 char ISRC[12];
1298 drflac_bool8 isAudio;
1299 drflac_bool8 preEmphasis;
1300 drflac_uint8 indexCount;
1301 const drflac_cuesheet_track_index* pIndexPoints;
1302} drflac_cuesheet_track;
1303
1304/*
1305Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1306block.
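
The `#pragma pack(4)` on `drflac_cuesheet_track_index` is what lets `pIndexPoints` point straight into the raw block: in the FLAC format each CUESHEET index point occupies 12 bytes (a 64-bit offset, an 8-bit index number and 3 reserved bytes). Without packing, most 64-bit ABIs pad the struct to 16 bytes. A quick self-contained check using a copy of the struct under a hypothetical name; note that `#pragma pack` is a widely supported compiler extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Copy of the packed index-point layout, under a hypothetical name.
#pragma pack(4)
typedef struct
{
    uint64_t offset;
    uint8_t index;
    uint8_t reserved[3];
} my_cuesheet_track_index;
#pragma pack()

// The same fields without packing, for comparison.
typedef struct
{
    uint64_t offset;
    uint8_t index;
    uint8_t reserved[3];
} my_unpacked_track_index;
```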
1307*/
1308DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1309
/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
1311DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1312
1313
1314#ifdef __cplusplus
1315}
1316#endif
1317#endif /* dr_flac_h */
1318
1319
1320/************************************************************************************************************************************************************
1321 ************************************************************************************************************************************************************
1322
1323 IMPLEMENTATION
1324
1325 ************************************************************************************************************************************************************
1326 ************************************************************************************************************************************************************/
1327#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1328#ifndef dr_flac_c
1329#define dr_flac_c
1330
1331/* Disable some annoying warnings. */
1332#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1333 #pragma GCC diagnostic push
1334 #if __GNUC__ >= 7
1335 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1336 #endif
1337#endif
1338
1339#ifdef __linux__
1340 #ifndef _BSD_SOURCE
1341 #define _BSD_SOURCE
1342 #endif
1343 #ifndef _DEFAULT_SOURCE
1344 #define _DEFAULT_SOURCE
1345 #endif
1346 #ifndef __USE_BSD
1347 #define __USE_BSD
1348 #endif
1349 #include <endian.h>
1350#endif
1351
1352#include <stdlib.h>
1353#include <string.h>
1354
1355#ifdef _MSC_VER
1356 #define DRFLAC_INLINE __forceinline
1357#elif defined(__GNUC__)
1358 /*
1359 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1360 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1361 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1362 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1363 I am using "__inline__" only when we're compiling in strict ANSI mode.
1364 */
1365 #if defined(__STRICT_ANSI__)
1366 #define DRFLAC_INLINE __inline__ __attribute__((always_inline))
1367 #else
1368 #define DRFLAC_INLINE inline __attribute__((always_inline))
1369 #endif
1370#elif defined(__WATCOMC__)
1371 #define DRFLAC_INLINE __inline
1372#else
1373 #define DRFLAC_INLINE
1374#endif
1375
1376/* CPU architecture. */
1377#if defined(__x86_64__) || defined(_M_X64)
1378 #define DRFLAC_X64
1379#elif defined(__i386) || defined(_M_IX86)
1380 #define DRFLAC_X86
1381#elif defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
1382 #define DRFLAC_ARM
1383#endif
1384
1385/*
1386Intrinsics Support
1387
1388There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1389
1390 "error: shift must be an immediate"
1391
Unfortunately dr_flac depends on this for a few things, so we're just going to disable SSE on GCC 4.2 and below.
1393*/
1394#if !defined(DR_FLAC_NO_SIMD)
1395 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1396 #if defined(_MSC_VER) && !defined(__clang__)
1397 /* MSVC. */
1398 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1399 #define DRFLAC_SUPPORT_SSE2
1400 #endif
1401 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1402 #define DRFLAC_SUPPORT_SSE41
1403 #endif
1404 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1405 /* Assume GNUC-style. */
1406 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1407 #define DRFLAC_SUPPORT_SSE2
1408 #endif
1409 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1410 #define DRFLAC_SUPPORT_SSE41
1411 #endif
1412 #endif
1413
1414 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1415 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1416 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1417 #define DRFLAC_SUPPORT_SSE2
1418 #endif
1419 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1420 #define DRFLAC_SUPPORT_SSE41
1421 #endif
1422 #endif
1423
1424 #if defined(DRFLAC_SUPPORT_SSE41)
1425 #include <smmintrin.h>
1426 #elif defined(DRFLAC_SUPPORT_SSE2)
1427 #include <emmintrin.h>
1428 #endif
1429 #endif
1430
1431 #if defined(DRFLAC_ARM)
1432 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1433 #define DRFLAC_SUPPORT_NEON
1434 #endif
1435
1436 /* Fall back to looking for the #include file. */
1437 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1438 #if !defined(DRFLAC_SUPPORT_NEON) && !defined(DRFLAC_NO_NEON) && __has_include(<arm_neon.h>)
1439 #define DRFLAC_SUPPORT_NEON
1440 #endif
1441 #endif
1442
1443 #if defined(DRFLAC_SUPPORT_NEON)
1444 #include <arm_neon.h>
1445 #endif
1446 #endif
1447#endif
1448
1449/* Compile-time CPU feature support. */
1450#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1451 #if defined(_MSC_VER) && !defined(__clang__)
1452 #if _MSC_VER >= 1400
1453 #include <intrin.h>
1454 static void drflac__cpuid(int info[4], int fid)
1455 {
1456 __cpuid(info, fid);
1457 }
1458 #else
1459 #define DRFLAC_NO_CPUID
1460 #endif
1461 #else
1462 #if defined(__GNUC__) || defined(__clang__)
1463 static void drflac__cpuid(int info[4], int fid)
1464 {
1465 /*
1466 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1467 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1468 supporting different assembly dialects.
1469
1470 What's basically happening is that we're saving and restoring the ebx register manually.
1471 */
1472 #if defined(DRFLAC_X86) && defined(__PIC__)
1473 __asm__ __volatile__ (
1474 "xchg{l} {%%}ebx, %k1;"
1475 "cpuid;"
1476 "xchg{l} {%%}ebx, %k1;"
1477 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1478 );
1479 #else
1480 __asm__ __volatile__ (
1481 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1482 );
1483 #endif
1484 }
1485 #else
1486 #define DRFLAC_NO_CPUID
1487 #endif
1488 #endif
1489#else
1490 #define DRFLAC_NO_CPUID
1491#endif
1492
1493static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1494{
1495#if defined(DRFLAC_SUPPORT_SSE2)
1496 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1497 #if defined(DRFLAC_X64)
1498 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1499 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1500 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1501 #else
1502 #if defined(DRFLAC_NO_CPUID)
1503 return DRFLAC_FALSE;
1504 #else
1505 int info[4];
1506 drflac__cpuid(info, 1);
1507 return (info[3] & (1 << 26)) != 0;
1508 #endif
1509 #endif
1510 #else
1511 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1512 #endif
1513#else
1514 return DRFLAC_FALSE; /* No compiler support. */
1515#endif
1516}
1517
1518static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1519{
1520#if defined(DRFLAC_SUPPORT_SSE41)
1521 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
        #if defined(__SSE4_1__) || defined(__AVX__)
            return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE4.1 code we can assume support. 64-bit targets do not guarantee SSE4.1, so otherwise check at runtime. */
1526 #else
1527 #if defined(DRFLAC_NO_CPUID)
1528 return DRFLAC_FALSE;
1529 #else
1530 int info[4];
1531 drflac__cpuid(info, 1);
1532 return (info[2] & (1 << 19)) != 0;
1533 #endif
1534 #endif
1535 #else
1536 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1537 #endif
1538#else
1539 return DRFLAC_FALSE; /* No compiler support. */
1540#endif
1541}
1542
1543
1544#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1545 #define DRFLAC_HAS_LZCNT_INTRINSIC
1546#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1547 #define DRFLAC_HAS_LZCNT_INTRINSIC
1548#elif defined(__clang__)
1549 #if defined(__has_builtin)
1550 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1551 #define DRFLAC_HAS_LZCNT_INTRINSIC
1552 #endif
1553 #endif
1554#endif
1555
1556#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1557 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1558 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1559 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1560#elif defined(__clang__)
1561 #if defined(__has_builtin)
1562 #if __has_builtin(__builtin_bswap16)
1563 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1564 #endif
1565 #if __has_builtin(__builtin_bswap32)
1566 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1567 #endif
1568 #if __has_builtin(__builtin_bswap64)
1569 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1570 #endif
1571 #endif
1572#elif defined(__GNUC__)
1573 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1574 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1575 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1576 #endif
1577 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1578 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1579 #endif
1580#elif defined(__WATCOMC__) && defined(__386__)
1581 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1582 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1583 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1584 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1585 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1586 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1587#pragma aux _watcom_bswap16 = \
1588 "xchg al, ah" \
1589 parm [ax] \
1590 modify [ax];
1591#pragma aux _watcom_bswap32 = \
1592 "bswap eax" \
1593 parm [eax] \
1594 modify [eax];
1595#pragma aux _watcom_bswap64 = \
1596 "bswap eax" \
1597 "bswap edx" \
1598 "xchg eax,edx" \
1599 parm [eax edx] \
1600 modify [eax edx];
1601#endif
1602
1603
1604/* Standard library stuff. */
1605#ifndef DRFLAC_ASSERT
1606#include <assert.h>
1607#define DRFLAC_ASSERT(expression) assert(expression)
1608#endif
1609#ifndef DRFLAC_MALLOC
1610#define DRFLAC_MALLOC(sz) malloc((sz))
1611#endif
1612#ifndef DRFLAC_REALLOC
1613#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1614#endif
1615#ifndef DRFLAC_FREE
1616#define DRFLAC_FREE(p) free((p))
1617#endif
1618#ifndef DRFLAC_COPY_MEMORY
1619#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1620#endif
1621#ifndef DRFLAC_ZERO_MEMORY
1622#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1623#endif
1624#ifndef DRFLAC_ZERO_OBJECT
1625#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1626#endif
1627
1628#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1629
1630typedef drflac_int32 drflac_result;
1631#define DRFLAC_SUCCESS 0
1632#define DRFLAC_ERROR -1 /* A generic error. */
1633#define DRFLAC_INVALID_ARGS -2
1634#define DRFLAC_INVALID_OPERATION -3
1635#define DRFLAC_OUT_OF_MEMORY -4
1636#define DRFLAC_OUT_OF_RANGE -5
1637#define DRFLAC_ACCESS_DENIED -6
1638#define DRFLAC_DOES_NOT_EXIST -7
1639#define DRFLAC_ALREADY_EXISTS -8
1640#define DRFLAC_TOO_MANY_OPEN_FILES -9
1641#define DRFLAC_INVALID_FILE -10
#define DRFLAC_TOO_BIG -11
#define DRFLAC_PATH_TOO_LONG -12
#define DRFLAC_NAME_TOO_LONG -13
#define DRFLAC_NOT_DIRECTORY -14
#define DRFLAC_IS_DIRECTORY -15
#define DRFLAC_DIRECTORY_NOT_EMPTY -16
#define DRFLAC_END_OF_FILE -17
#define DRFLAC_NO_SPACE -18
#define DRFLAC_BUSY -19
#define DRFLAC_IO_ERROR -20
#define DRFLAC_INTERRUPT -21
#define DRFLAC_UNAVAILABLE -22
#define DRFLAC_ALREADY_IN_USE -23
#define DRFLAC_BAD_ADDRESS -24
#define DRFLAC_BAD_SEEK -25
#define DRFLAC_BAD_PIPE -26
#define DRFLAC_DEADLOCK -27
#define DRFLAC_TOO_MANY_LINKS -28
#define DRFLAC_NOT_IMPLEMENTED -29
#define DRFLAC_NO_MESSAGE -30
#define DRFLAC_BAD_MESSAGE -31
#define DRFLAC_NO_DATA_AVAILABLE -32
#define DRFLAC_INVALID_DATA -33
#define DRFLAC_TIMEOUT -34
#define DRFLAC_NO_NETWORK -35
#define DRFLAC_NOT_UNIQUE -36
#define DRFLAC_NOT_SOCKET -37
#define DRFLAC_NO_ADDRESS -38
#define DRFLAC_BAD_PROTOCOL -39
#define DRFLAC_PROTOCOL_UNAVAILABLE -40
#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
#define DRFLAC_SOCKET_NOT_SUPPORTED -44
#define DRFLAC_CONNECTION_RESET -45
#define DRFLAC_ALREADY_CONNECTED -46
#define DRFLAC_NOT_CONNECTED -47
#define DRFLAC_CONNECTION_REFUSED -48
#define DRFLAC_NO_HOST -49
#define DRFLAC_IN_PROGRESS -50
#define DRFLAC_CANCELLED -51
#define DRFLAC_MEMORY_ALREADY_MAPPED -52
#define DRFLAC_AT_END -53
#define DRFLAC_CRC_MISMATCH -128

#define DRFLAC_SUBFRAME_CONSTANT 0
#define DRFLAC_SUBFRAME_VERBATIM 1
#define DRFLAC_SUBFRAME_FIXED 8
#define DRFLAC_SUBFRAME_LPC 32
#define DRFLAC_SUBFRAME_RESERVED 255

#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1

#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
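/*
The subframe and channel-assignment constants above correspond to fields in the FLAC frame and subframe headers. The sketch
below is a minimal illustration, assuming the 6-bit subframe type and the 4-bit channel assignment have already been extracted
from the header. It decodes them as the FLAC format describes them (000000 constant, 000001 verbatim, 001xxx fixed, 1xxxxx LPC;
assignments 0-7 independent, 8-10 stereo decorrelation, 11-15 reserved). It is written in C99 for brevity and the example_*
names are hypothetical, not part of dr_flac.

```c
#include <assert.h>

// Hypothetical values mirroring DRFLAC_SUBFRAME_*, redefined so the sketch stands alone.
enum { EX_CONSTANT = 0, EX_VERBATIM = 1, EX_FIXED = 8, EX_LPC = 32, EX_RESERVED = 255 };

// Classify the 6-bit subframe type field:
//   000000 constant, 000001 verbatim, 001xxx fixed (order xxx, valid for 0..4),
//   1xxxxx LPC (order xxxxx + 1); every other pattern is reserved.
// *pOrder receives the predictor order for FIXED and LPC, 0 otherwise.
int example_classify_subframe(unsigned type, unsigned* pOrder)
{
    assert(pOrder != 0);
    *pOrder = 0;
    if (type == 0) return EX_CONSTANT;
    if (type == 1) return EX_VERBATIM;
    if ((type & 0x38) == 0x08) {            // 001xxx
        unsigned order = type & 0x07;
        if (order > 4) return EX_RESERVED;
        *pOrder = order;
        return EX_FIXED;
    }
    if ((type & 0x20) == 0x20) {            // 1xxxxx
        *pOrder = (type & 0x1F) + 1;
        return EX_LPC;
    }
    return EX_RESERVED;
}

// Channel count for the 4-bit channel assignment field:
// 0..7 are independent channels (value + 1), 8 (left/side), 9 (right/side)
// and 10 (mid/side) are always stereo, 11..15 are reserved (returns 0).
unsigned example_channel_count(unsigned assignment)
{
    if (assignment <= 7) return assignment + 1;
    if (assignment <= 10) return 2;
    return 0;
}
```
*/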

#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))


DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
{
    if (pMajor) {
        *pMajor = DRFLAC_VERSION_MAJOR;
    }

    if (pMinor) {
        *pMinor = DRFLAC_VERSION_MINOR;
    }

    if (pRevision) {
        *pRevision = DRFLAC_VERSION_REVISION;
    }
}

DRFLAC_API const char* drflac_version_string(void)
{
    return DRFLAC_VERSION_STRING;
}


/* CPU caps. */
#if defined(__has_feature)
    #if __has_feature(thread_sanitizer)
        #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
    #else
        #define DRFLAC_NO_THREAD_SANITIZE
    #endif
#else
    #define DRFLAC_NO_THREAD_SANITIZE
#endif

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
#endif

#ifndef DRFLAC_NO_CPUID
static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;

/*
I've had a bug report that Clang's ThreadSanitizer reports a warning in this function. Having reviewed it, the warning does
make sense. However, since CPU caps can never differ within a running process, I don't think complicating the internal APIs
by passing CPU caps around is a worthwhile trade-off compared with simply suppressing the warning, so I'm disabling it. This
is done via the DRFLAC_NO_THREAD_SANITIZE attribute.
*/
DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;

    if (!isCPUCapsInitialized) {
        /* LZCNT */
#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
        int info[4] = {0};
        drflac__cpuid(info, 0x80000001);
        drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
#endif

        /* SSE2 */
        drflac__gIsSSE2Supported = drflac_has_sse2();

        /* SSE4.1 */
        drflac__gIsSSE41Supported = drflac_has_sse41();

        /* Initialized. */
        isCPUCapsInitialized = DRFLAC_TRUE;
    }
}
#else
static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;

static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
{
#if defined(DRFLAC_SUPPORT_NEON)
    #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
        #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate NEON code we can assume support. */
        #else
            /* TODO: Runtime check. */
            return DRFLAC_FALSE;
        #endif
    #else
        return DRFLAC_FALSE;    /* NEON is only supported on ARM architectures. */
    #endif
#else
    return DRFLAC_FALSE;    /* No compiler support. */
#endif
}

DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    drflac__gIsNEONSupported = drflac__has_neon();

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
    drflac__gIsLZCNTSupported = DRFLAC_TRUE;
#endif
}
#endif


/* Endian Management */
static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
{
#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
    return DRFLAC_TRUE;
#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
    return DRFLAC_TRUE;
#else
    int n = 1;
    return (*(char*)&n) == 1;
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
{
#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ushort(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap16(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap16(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF00) >> 8) |
           ((n & 0x00FF) << 8);
#endif
}

static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
{
#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ulong(n);
    #elif defined(__GNUC__) || defined(__clang__)
        #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT)   /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
            /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
            drflac_uint32 r;
            __asm__ __volatile__ (
            #if defined(DRFLAC_64BIT)
                "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
            #else
                "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
            #endif
            );
            return r;
        #else
            return __builtin_bswap32(n);
        #endif
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap32(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF000000) >> 24) |
           ((n & 0x00FF0000) >>  8) |
           ((n & 0x0000FF00) <<  8) |
           ((n & 0x000000FF) << 24);
#endif
}

static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
{
#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_uint64(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap64(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap64(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
    return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
           ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
           ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
           ((n & ((drflac_uint64)0x000000FF << 32)) >>  8) |
           ((n & ((drflac_uint64)0xFF000000      )) <<  8) |
           ((n & ((drflac_uint64)0x00FF0000      )) << 24) |
           ((n & ((drflac_uint64)0x0000FF00      )) << 40) |
           ((n & ((drflac_uint64)0x000000FF      )) << 56);
#endif
}
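/*
The non-intrinsic fallbacks above swap bytes with masks and shifts. The C99 sketch below (hypothetical example_* names, not
part of dr_flac) uses the same mask-and-shift technique for 16 and 32 bits. For 64 bits it swaps each 32-bit half and then
exchanges the halves instead of spelling out eight mask terms. It can be used to check an intrinsic path against known values.

```c
#include <assert.h>
#include <stdint.h>

// Same mask-and-shift technique as the 16-bit fallback above.
uint16_t example_bswap16(uint16_t n)
{
    return (uint16_t)(((n & 0xFF00u) >> 8) | ((n & 0x00FFu) << 8));
}

// Same mask-and-shift technique as the 32-bit fallback above.
uint32_t example_bswap32(uint32_t n)
{
    return ((n & 0xFF000000u) >> 24) |
           ((n & 0x00FF0000u) >>  8) |
           ((n & 0x0000FF00u) <<  8) |
           ((n & 0x000000FFu) << 24);
}

// 64-bit swap built from two 32-bit swaps: the swapped low half becomes
// the high half and vice versa.
uint64_t example_bswap64(uint64_t n)
{
    return ((uint64_t)example_bswap32((uint32_t)n) << 32) |
            (uint64_t)example_bswap32((uint32_t)(n >> 32));
}
```

Swapping twice always returns the original value, which makes a cheap self-check.
*/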


static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint16(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint64(n);
    }

    return n;
}


static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
{
    if (!drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}
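/*
The be2host functions above swap only when the host is little-endian, so the same stream bytes yield the same value on either
byte order. The C99 sketch below (hypothetical example_* names) compares that detect-then-swap approach with a shift-based
decode of big-endian bytes. The shift-based version is a common alternative that never needs to know the host byte order; it
is shown for comparison only and is not what dr_flac does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Runtime byte-order check, the same idea as the fallback in
// drflac__is_little_endian(): look at the first byte of an int holding 1.
int example_is_little_endian(void)
{
    int n = 1;
    return *(const char*)&n == 1;
}

// Detect-then-swap: load 4 bytes in host order and swap on little-endian hosts.
uint32_t example_be2host_32_swap(const unsigned char* p)
{
    uint32_t n;
    memcpy(&n, p, sizeof(n));
    if (example_is_little_endian()) {
        n = ((n & 0xFF000000u) >> 24) | ((n & 0x00FF0000u) >> 8) |
            ((n & 0x0000FF00u) <<  8) | ((n & 0x000000FFu) << 24);
    }
    return n;
}

// Shift-based decode: the first byte is always the most significant one.
uint32_t example_be2host_32_shift(const unsigned char* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}
```

For example, the four bytes of the "fLaC" stream marker decode to 0x664C6143 with either function.
*/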


static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
{
    drflac_uint32 result = 0;
    result |= (n & 0x7F000000) >> 3;
    result |= (n & 0x007F0000) >> 2;
    result |= (n & 0x00007F00) >> 1;
    result |= (n & 0x0000007F) >> 0;

    return result;
}
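/*
drflac__unsynchsafe_32() decodes a "synchsafe" integer, the encoding ID3v2 uses for its tag size: each byte carries 7 payload
bits and the top bit of every byte is zero, leaving 28 usable bits. The C99 sketch below repeats the decode and adds the inverse
encoding, a hypothetical helper that is not part of dr_flac, so that a round trip can be checked.

```c
#include <assert.h>
#include <stdint.h>

// Decode: pack four 7-bit groups back together (same arithmetic as
// drflac__unsynchsafe_32()).
uint32_t example_unsynchsafe_32(uint32_t n)
{
    return ((n & 0x7F000000u) >> 3) |
           ((n & 0x007F0000u) >> 2) |
           ((n & 0x00007F00u) >> 1) |
            (n & 0x0000007Fu);
}

// Encode (hypothetical inverse): spread a value below 2^28 into four 7-bit
// groups, leaving the top bit of every byte clear.
uint32_t example_synchsafe_32(uint32_t v)
{
    assert(v < (1u << 28));
    return ((v & 0x0FE00000u) << 3) |
           ((v & 0x001FC000u) << 2) |
           ((v & 0x00003F80u) << 1) |
            (v & 0x0000007Fu);
}
```

For instance, the bytes 00 00 02 01 decode to (2 << 7) | 1 = 257.
*/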



/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
static drflac_uint8 drflac__crc8_table[] = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
    0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
    0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
    0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
    0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
    0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
    0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
    0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
    0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
    0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
    0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
    0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
    0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
    0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
    0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
    0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
};
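/*
drflac__crc8_table is the byte-wise lookup table for an MSB-first CRC-8 with generator polynomial x^8 + x^2 + x + 1 (0x07) and a
zero initial value, the checksum FLAC stores at the end of each frame header. The C99 sketch below (hypothetical example_*
names) regenerates such a table and applies it one byte at a time, the same way drflac_crc8_byte() does. For this parameter set
the standard check value of the ASCII string "123456789" is 0xF4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t example_crc8_table[256];

// Entry i is the register value after shifting byte i through 8 steps of the
// MSB-first division by the polynomial 0x07.
void example_build_crc8_table(void)
{
    for (int i = 0; i < 256; ++i) {
        uint8_t crc = (uint8_t)i;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
        }
        example_crc8_table[i] = crc;
    }
}

// Table-driven update, one byte at a time (same rule as drflac_crc8_byte()).
uint8_t example_crc8(uint8_t crc, const uint8_t* data, size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        crc = example_crc8_table[crc ^ data[i]];
    }
    return crc;
}
```

Comparing a few regenerated entries against drflac__crc8_table (for example index 1 is 0x07 and index 0xFF is 0xF3) is a quick
way to confirm which CRC variant a table implements.
*/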

static drflac_uint16 drflac__crc16_table[] = {
    0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
    0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
    0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
    0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
    0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
    0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
    0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
    0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
    0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
    0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
    0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
    0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
    0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
    0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
    0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
    0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
    0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
    0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
    0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
    0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
    0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
    0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
    0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
    0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
    0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
    0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
    0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
    0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
    0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
    0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
    0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
    0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
};
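/*
drflac__crc16_table is the corresponding table for the MSB-first CRC-16 with polynomial x^16 + x^15 + x^2 + 1 (0x8005),
initialized to zero, which FLAC uses for the frame footer. The C99 sketch below (hypothetical example_* names) regenerates the
table and uses the same update rule as drflac_crc16_byte(). With these parameters the check value for "123456789" is 0xFEE8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t example_crc16_table[256];

// Entry i: byte i placed in the top of a 16-bit register, then shifted
// through 8 steps of MSB-first division by 0x8005.
void example_build_crc16_table(void)
{
    for (int i = 0; i < 256; ++i) {
        uint16_t crc = (uint16_t)(i << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005) : (uint16_t)(crc << 1);
        }
        example_crc16_table[i] = crc;
    }
}

// Same rule as drflac_crc16_byte(): index with (top byte of register) XOR
// (input byte), and XOR the entry into the register shifted left by 8.
uint16_t example_crc16(uint16_t crc, const uint8_t* data, size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        crc = (uint16_t)((crc << 8) ^ example_crc16_table[(uint8_t)(crc >> 8) ^ data[i]]);
    }
    return crc;
}
```
*/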

static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
{
    return drflac__crc8_table[crc ^ data];
}

static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
    drflac_uint8 p = 0x07;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint8 bit = (data & (1 << i)) >> i;
        if (crc & 0x80) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }
    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 32);

    wholeBytes = count >> 3;
    leftoverBits = count - (wholeBytes*8);
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
    }
    return crc;
#endif
#endif
}
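/*
When count is not a multiple of 8, the "case 0" branch of drflac_crc8() folds the remaining k bits in with the byte table: the
register is shifted left by k and XORed with the table entry selected by the register's top k bits XOR the k data bits. The
C99 sketch below (hypothetical names) checks that identity against a bit-at-a-time reference. This reference is the direct,
non-augmented form of the algorithm, so unlike the disabled reference inside drflac_crc8() it needs no explicit flush.

```c
#include <assert.h>
#include <stdint.h>

// Bit-at-a-time reference (direct form, polynomial 0x07): feed the `count`
// low bits of `data` into the register, most significant bit first.
uint8_t example_crc8_bitwise(uint8_t crc, uint32_t data, unsigned count)
{
    while (count > 0) {
        --count;
        unsigned feedback = ((crc >> 7) ^ (data >> count)) & 1u;
        crc = (uint8_t)(crc << 1);
        if (feedback) {
            crc ^= 0x07;
        }
    }
    return crc;
}

// Fold in k (1..7) bits using the byte table, mirroring the "case 0" branch.
uint8_t example_crc8_partial(const uint8_t table[256], uint8_t crc, uint32_t data, unsigned k)
{
    uint32_t bits = data & ((1u << k) - 1u);
    assert(k > 0 && k < 8);
    return (uint8_t)((crc << k) ^ table[(crc >> (8 - k)) ^ bits]);
}

// table[i] is byte i fed into a zero register, i.e. the standard CRC-8 table.
void example_build_table(uint8_t table[256])
{
    for (unsigned i = 0; i < 256; ++i) {
        table[i] = example_crc8_bitwise(0, i, 8);
    }
}
```

In the test below, a 12-bit value processed as one whole byte followed by a 4-bit tail matches the bit-at-a-time result.
*/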

static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
{
    return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
{
#ifdef DRFLAC_64BIT
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));

    return crc;
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
{
    switch (byteCount)
    {
#ifdef DRFLAC_64BIT
    case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
    }

    return crc;
}
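/*
drflac_crc16_bytes() feeds only the byteCount least significant bytes of a cache word, highest of those bytes first. That is
why drflac__flush_crc16() shifts crc16Cache right by the number of unconsumed bits before calling it: the consumed bytes sit at
the top of the word, and the shift moves them to the bottom. The C99 sketch below shows that convention with a 64-bit word,
using a bit-wise CRC-16 (polynomial 0x8005) so that it stands alone; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Bit-at-a-time CRC-16 update for one byte (direct form, polynomial 0x8005);
// equivalent to the table-driven drflac_crc16_byte().
uint16_t example_crc16_byte(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)(byte << 8);
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005) : (uint16_t)(crc << 1);
    }
    return crc;
}

// Feed the low `byteCount` bytes of `word`, highest of those bytes first,
// like the fall-through switch in drflac_crc16_bytes().
uint16_t example_crc16_low_bytes(uint16_t crc, uint64_t word, unsigned byteCount)
{
    assert(byteCount <= 8);
    while (byteCount > 0) {
        --byteCount;
        crc = example_crc16_byte(crc, (uint8_t)(word >> (byteCount * 8)));
    }
    return crc;
}
```

With three consumed bytes at the top of a 64-bit cache, 40 bits remain unconsumed, so the word is shifted right by 40 before
the low 3 bytes are fed in.
*/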

#if 0
static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
    drflac_uint16 p = 0x8005;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint16 bit = (data & (1ULL << i)) >> i;
        if (crc & 0x8000) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }

    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits)));    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
        case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
        case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
        case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
}


static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
{
#ifdef DRFLAC_64BIT
    return drflac_crc16__64bit(crc, data, count);
#else
    return drflac_crc16__32bit(crc, data, count);
#endif
}
#endif


#ifdef DRFLAC_64BIT
#define drflac__be2host__cache_line drflac__be2host_64
#else
#define drflac__be2host__cache_line drflac__be2host_32
#endif

/*
BIT READING ATTEMPT #2

This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
from onRead() is read into.
*/
#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
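/*
With these macros the next unread bit always sits in the most significant position of the L1 cache, so reading n bits means
taking the top n bits (select, then shift right) and then shifting the cache left by n. Below is a minimal single-level sketch
of the same idea in C99, assuming a 64-bit cache refilled straight from a byte array (no L2 cache, no onRead callback, no CRC
tracking); the example_* names are hypothetical. As in drflac__read_uint32(), a read that straddles the end of the cache is
split into a high part and a low part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t* data;  // source bytes (stand-in for the L2 cache and onRead)
    size_t size;
    size_t pos;           // next byte to load into the cache
    uint64_t cache;       // next unread bit is the most significant bit
    unsigned consumed;    // bits consumed from the cache; 64 means empty
} example_bs;

void example_bs_init(example_bs* bs, const uint8_t* data, size_t size)
{
    bs->data = data; bs->size = size; bs->pos = 0;
    bs->cache = 0; bs->consumed = 64;
}

// Load up to 8 bytes big-endian. A short tail is left-aligned and the missing
// low bits are counted as already consumed (like the slow path in dr_flac).
int example_bs_reload(example_bs* bs)
{
    size_t n = bs->size - bs->pos;
    if (n == 0) return 0;
    if (n > 8) n = 8;
    bs->cache = 0;
    for (size_t i = 0; i < n; ++i) {
        bs->cache |= (uint64_t)bs->data[bs->pos + i] << (56 - 8 * i);
    }
    bs->pos += n;
    bs->consumed = (unsigned)(8 - n) * 8;
    return 1;
}

// Read 1..32 bits, most significant first. Returns 0 when the data runs out.
int example_bs_read(example_bs* bs, unsigned bitCount, uint32_t* pOut)
{
    assert(bitCount > 0 && bitCount <= 32);
    if (bs->consumed == 64 && !example_bs_reload(bs)) return 0;

    unsigned remaining = 64 - bs->consumed;
    if (bitCount <= remaining) {
        *pOut = (uint32_t)(bs->cache >> (64 - bitCount));  // top bitCount bits
        bs->cache <<= bitCount;                            // at most 32, never 64
        bs->consumed += bitCount;
        return 1;
    }

    // Straddles the cache: take the remaining high bits, reload, take the rest.
    unsigned hiCount = remaining;
    unsigned loCount = bitCount - hiCount;
    uint32_t hi = (uint32_t)(bs->cache >> (64 - hiCount));
    if (!example_bs_reload(bs) || loCount > 64 - bs->consumed) return 0;
    *pOut = (hi << loCount) | (uint32_t)(bs->cache >> (64 - loCount));
    bs->cache <<= loCount;
    bs->consumed += loCount;
    return 1;
}
```

The L2 layer in dr_flac exists mainly to batch calls to onRead(); the bit-level logic is the same as in this sketch.
*/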


#ifndef DR_FLAC_NO_CRC
static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
{
    bs->crc16 = 0;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
}

static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
{
    if (bs->crc16CacheIgnoredBytes == 0) {
        bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
    } else {
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
        bs->crc16CacheIgnoredBytes = 0;
    }
}

static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
{
    /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
    DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);

    /*
    The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
    by the number of bits that have been consumed.
    */
    if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
        drflac__update_crc16(bs);
    } else {
        /* We only accumulate the consumed bits. */
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);

        /*
        The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
        so we can handle that later.
        */
        bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
    }

    return bs->crc16;
}
#endif

static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
{
    size_t bytesRead;
    size_t alignedL1LineCount;

    /* Fast path. Try loading straight from L2. */
    if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }

    /*
    If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
    any left.
    */
    if (bs->unalignedByteCount > 0) {
        return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there are no more aligned bytes left in the client. */
    }

    bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));

    bs->nextL2Line = 0;
    if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }


    /*
    If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
    means we've just reached the end of the file. We need to move the valid data to the end of the buffer
    and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
    the size of the L1, so any trailing bytes that don't fill a whole L1 line are kept aside in unalignedCache
    rather than moved with the rest.
    */
    alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);

    /* We need to keep track of any unaligned bytes for later use. */
    bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    if (bs->unalignedByteCount > 0) {
        bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
    }

    if (alignedL1LineCount > 0) {
        size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
        size_t i;
        for (i = alignedL1LineCount; i > 0; --i) {
            bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
        }

        bs->nextL2Line = (drflac_uint32)offset;
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    } else {
        /* If we get into this branch it means we weren't able to load any L1-aligned data. */
        bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
        return DRFLAC_FALSE;
    }
}
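/*
On a short read, only whole L1-sized lines stay in the L2 cache. They are moved to the end of the buffer so that nextL2Line
can keep counting up to DRFLAC_CACHE_L2_LINE_COUNT as usual, and any trailing partial line goes to unalignedCache. The C99
sketch below reproduces that bookkeeping, assuming a 4096-byte L2 made of 8-byte lines (a 64-bit build); the struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t alignedLines;    // whole L1 lines that were read
    size_t unalignedBytes;  // trailing bytes that do not fill a whole line
    size_t nextLine;        // index of the first valid line after the move
} example_l2_layout;

// lineCount = number of lines in the L2 buffer, lineSize = bytes per L1 line.
example_l2_layout example_short_read_layout(size_t bytesRead, size_t lineCount, size_t lineSize)
{
    example_l2_layout r;
    assert(bytesRead <= lineCount * lineSize);
    r.alignedLines   = bytesRead / lineSize;
    r.unalignedBytes = bytesRead - r.alignedLines * lineSize;
    // Valid lines end up at the end of the buffer, so reading resumes at
    // lineCount - alignedLines (lineCount itself when no whole line was read).
    r.nextLine = lineCount - r.alignedLines;
    return r;
}
```

For example, a 1029-byte read into 512 lines of 8 bytes keeps 128 whole lines starting at index 384 and leaves 5 unaligned bytes.
*/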

static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
{
    size_t bytesRead;

#ifndef DR_FLAC_NO_CRC
    drflac__update_crc16(bs);
#endif

    /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
    if (drflac__reload_l1_cache_from_l2(bs)) {
        bs->cache = drflac__be2host__cache_line(bs->cache);
        bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
        bs->crc16Cache = bs->cache;
#endif
        return DRFLAC_TRUE;
    }

    /* Slow path. */

    /*
    If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
    few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
    data from the unaligned cache.
    */
    bytesRead = bs->unalignedByteCount;
    if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
        return DRFLAC_FALSE;
    }

    DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;

    bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
    bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs));    /* <-- Make sure the bits marked as consumed (the stale trailing bytes) are always zero. Other parts of the library depend on this property. */
    bs->unalignedByteCount = 0;     /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = bs->cache >> bs->consumedBits;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
#endif
    return DRFLAC_TRUE;
}
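/*
In the slow path the trailing bytes are converted to host order so that they occupy the most significant bytes of the L1
cache, consumedBits is set so that only bytesRead * 8 bits remain, and DRFLAC_CACHE_L1_SELECTION_MASK() clears whatever stale
bytes followed them in the line. The C99 sketch below reproduces those three steps for a 64-bit cache, using a shift-based
big-endian decode in place of drflac__be2host__cache_line(); the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Top n bits set, like DRFLAC_CACHE_L1_SELECTION_MASK for a 64-bit cache.
// n must be in 1..63: a shift by 64 bits would be undefined.
uint64_t example_selection_mask(unsigned n)
{
    assert(n > 0 && n < 64);
    return ~(~(uint64_t)0 >> n);
}

// `line` is one whole L1-sized line whose first `bytesRead` bytes are valid
// and whose remaining bytes are stale.
uint64_t example_load_tail(const uint8_t line[8], size_t bytesRead, unsigned* pConsumed)
{
    uint64_t cache = 0;
    assert(bytesRead > 0 && bytesRead < 8);
    for (size_t i = 0; i < 8; ++i) {
        cache = (cache << 8) | line[i];            // big-endian decode of all 8 bytes
    }
    *pConsumed = (unsigned)(8 - bytesRead) * 8;    // only bytesRead * 8 bits remain
    return cache & example_selection_mask(64 - *pConsumed);  // zero the stale bytes
}
```
*/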

static void drflac__reset_cache(drflac_bs* bs)
{
    bs->nextL2Line   = DRFLAC_CACHE_L2_LINE_COUNT(bs);  /* <-- This clears the L2 cache. */
    bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- This clears the L1 cache. */
    bs->cache = 0;
    bs->unalignedByteCount = 0;                         /* <-- This clears the trailing unaligned bytes. */
    bs->unalignedCache = 0;

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = 0;
    bs->crc16CacheIgnoredBytes = 0;
#endif
}


static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResultOut != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /*
        If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
        a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
        more optimal solution for this.
        */
#ifdef DRFLAC_64BIT
        *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
        bs->consumedBits += bitCount;
        bs->cache <<= bitCount;
#else
        if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
            bs->consumedBits += bitCount;
            bs->cache <<= bitCount;
        } else {
            /* Cannot shift by 32-bits, so need to do it differently. */
            *pResultOut = (drflac_uint32)bs->cache;
            bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
            bs->cache = 0;
        }
#endif

        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
        drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        drflac_uint32 bitCountLo = bitCount - bitCountHi;
        drflac_uint32 resultHi;

        DRFLAC_ASSERT(bitCountHi > 0);
        DRFLAC_ASSERT(bitCountHi < 32);
        resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);

        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }

        *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
        return DRFLAC_TRUE;
    }
}

static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    /* Do not attempt to shift by 32 as it's undefined. */
    if (bitCount < 32) {
        drflac_uint32 signbit;
        signbit = ((result >> (bitCount-1)) & 0x01);
        result |= (~signbit + 1) << bitCount;
    }

    *pResult = (drflac_int32)result;
    return DRFLAC_TRUE;
}
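/*
drflac__read_int32() sign-extends an n-bit two's complement field. It isolates the sign bit, turns it into all ones or all
zeros with "~signbit + 1", and ORs those ones into bit n and above; the bitCount < 32 guard is needed because shifting a 32-bit
value by 32 is undefined. The C99 sketch below repeats that approach next to a common alternative, (value ^ m) - m with
m = 1 << (n - 1), which dr_flac does not use. The names are hypothetical, and the input is assumed to have no bits set above
bit n - 1, as is the case for values returned by the bit reader.

```c
#include <assert.h>
#include <stdint.h>

// Same approach as drflac__read_int32(): OR ones above bit n - 1 when the
// sign bit is set. The final conversion assumes two's complement.
int32_t example_sign_extend(uint32_t value, unsigned bitCount)
{
    assert(bitCount > 0 && bitCount <= 32);
    if (bitCount < 32) {                         // a shift by 32 would be undefined
        uint32_t signbit = (value >> (bitCount - 1)) & 0x01;
        value |= (~signbit + 1) << bitCount;     // 0 if positive, ones above bit n - 1 if negative
    }
    return (int32_t)value;
}

// Alternative: flip the sign bit, then subtract its weight back out.
int32_t example_sign_extend_xor(uint32_t value, unsigned bitCount)
{
    assert(bitCount > 0 && bitCount < 32);
    uint32_t m = 1u << (bitCount - 1);
    return (int32_t)((value ^ m) - m);
}
```

For a 3-bit field, 0x7 becomes -1, 0x4 becomes -4 and 0x3 stays 3.
*/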

#ifdef DRFLAC_64BIT
static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
{
    drflac_uint32 resultHi;
    drflac_uint32 resultLo;

    DRFLAC_ASSERT(bitCount <= 64);
    DRFLAC_ASSERT(bitCount >  32);

    if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
        return DRFLAC_FALSE;
    }

    if (!drflac__read_uint32(bs, 32, &resultLo)) {
        return DRFLAC_FALSE;
    }

    *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
    return DRFLAC_TRUE;
}
#endif

/* Function below is unused, but leaving it here in case I need to quickly add it again. */
#if 0
static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
{
    drflac_uint64 result;
    drflac_uint64 signbit;

    DRFLAC_ASSERT(bitCount <= 64);

    if (!drflac__read_uint64(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    signbit = ((result >> (bitCount-1)) & 0x01);
    result |= (~signbit + 1) << bitCount;

    *pResultOut = (drflac_int64)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint16)result;
    return DRFLAC_TRUE;
}

#if 0
static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int16)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint8)result;
    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int8)result;
    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
{
    if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        bs->consumedBits += (drflac_uint32)bitsToSeek;
        bs->cache <<= bitsToSeek;
        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2586 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2587 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2588 bs->cache = 0;
2589
2590 /* Simple case. Seek in chunks of as many bits as fit in a single cache line. */
2591#ifdef DRFLAC_64BIT
2592 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2593 drflac_uint64 bin;
2594 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2595 return DRFLAC_FALSE;
2596 }
2597 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2598 }
2599#else
2600 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2601 drflac_uint32 bin;
2602 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2603 return DRFLAC_FALSE;
2604 }
2605 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2606 }
2607#endif
2608
2609 /* Whole leftover bytes. */
2610 while (bitsToSeek >= 8) {
2611 drflac_uint8 bin;
2612 if (!drflac__read_uint8(bs, 8, &bin)) {
2613 return DRFLAC_FALSE;
2614 }
2615 bitsToSeek -= 8;
2616 }
2617
2618 /* Leftover bits. */
2619 if (bitsToSeek > 0) {
2620 drflac_uint8 bin;
2621 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2622 return DRFLAC_FALSE;
2623 }
2624 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2625 }
2626
2627 DRFLAC_ASSERT(bitsToSeek == 0);
2628 return DRFLAC_TRUE;
2629 }
2630}
2631
2632
2633/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). The CRC-16 is reset before each candidate sync byte so that the running CRC-16 starts at the sync code. */
2634static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2635{
2636 DRFLAC_ASSERT(bs != NULL);
2637
2638 /*
2639 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2640 thing to do is align to the next byte.
2641 */
2642 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2643 return DRFLAC_FALSE;
2644 }
2645
2646 for (;;) {
2647 drflac_uint8 hi;
2648
2649#ifndef DR_FLAC_NO_CRC
2650 drflac__reset_crc16(bs);
2651#endif
2652
2653 if (!drflac__read_uint8(bs, 8, &hi)) {
2654 return DRFLAC_FALSE;
2655 }
2656
2657 if (hi == 0xFF) {
2658 drflac_uint8 lo;
2659 if (!drflac__read_uint8(bs, 6, &lo)) {
2660 return DRFLAC_FALSE;
2661 }
2662
2663 if (lo == 0x3E) {
2664 return DRFLAC_TRUE;
2665 } else {
2666 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2667 return DRFLAC_FALSE;
2668 }
2669 }
2670 }
2671 }
2672
2673 /* Should never get here. */
2674 /*return DRFLAC_FALSE;*/
2675}
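/*
Illustrative sketch, not part of dr_flac: the same byte-aligned search over an in-memory buffer. The 14-bit sync code
11111111111110 is byte aligned, so a match is 0xFF followed by a byte whose top 6 bits are 111110. This is a simplification:
find_flac_sync() is a hypothetical name, it ignores CRC handling, and it does not validate the rest of the frame header, so a
match may still be a false positive. It tests every byte offset independently.
*/
#if 0
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the byte offset of the first candidate FLAC frame sync code in
   `buf`, or -1 if there is none. (byte & 0xFC) == 0xF8 checks that the top
   6 bits of the second byte are 111110; the low 2 bits (reserved bit and
   blocking strategy bit) are not inspected. */
static long find_flac_sync(const uint8_t* buf, size_t size)
{
    size_t i;

    assert(buf != NULL || size == 0);

    for (i = 0; i + 1 < size; ++i) {
        if (buf[i] == 0xFF && (buf[i + 1] & 0xFC) == 0xF8) {
            return (long)i;
        }
    }

    return -1;
}
```
#endif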
2676
2677
2678#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2679#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2680#endif
2681#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2682#define DRFLAC_IMPLEMENT_CLZ_MSVC
2683#endif
2684#if defined(__WATCOMC__) && defined(__386__)
2685#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2686#endif
2687
2688static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2689{
2690 drflac_uint32 n;
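2691 static drflac_uint32 clz_table_4[] = { /* Leading zero count of each 4-bit nibble plus one. Index 0 maps to 0, a sentinel meaning the nibble is all zeros. The "+ 1" is undone by "n - 1" below. */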
2692 0,
2693 4,
2694 3, 3,
2695 2, 2, 2, 2,
2696 1, 1, 1, 1, 1, 1, 1, 1
2697 };
2698
2699 if (x == 0) {
2700 return sizeof(x)*8;
2701 }
2702
2703 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2704 if (n == 0) {
2705#ifdef DRFLAC_64BIT
2706 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2707 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2708 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2709 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2710#else
2711 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2712 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2713 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2714#endif
2715 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2716 }
2717
2718 return n - 1;
2719}
2720
2721#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2722static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2723{
2724 /* Fast compile time check for ARM. */
2725#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2726 return DRFLAC_TRUE;
2727#else
2728 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2729 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2730 return drflac__gIsLZCNTSupported;
2731 #else
2732 return DRFLAC_FALSE;
2733 #endif
2734#endif
2735}
2736
2737static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2738{
2739 /*
2740 It's critical for competitive decoding performance that this function be highly optimized. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2741 to achieve good performance; on GCC and Clang it's a little more awkward. The __builtin_clzl() and __builtin_clzll() builtins leave the return
2742 value undefined when `x` is 0, but we need it to be well defined, returning 32 or 64 depending on whether it's a 32- or
2743 64-bit build. We could add a conditional to check for the x = 0 case, but that adds unnecessary overhead. To work
2744 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly, which removes the need for
2745 the conditional. This has worked well in the past, but for some reason Clang's MSVC-compatible driver, clang-cl, does not seem to handle this
2746 the same way as the normal Clang driver. It seems that `clang-cl` sometimes just outputs the wrong results, maybe due to a register
2747 getting clobbered?
2748
2749 I'm not sure if this is a bug in dr_flac's inlined assembly (most likely), a bug in `clang-cl`, or just a misunderstanding on my part of the inline
2750 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2751
2752 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2753 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2754 to know how to fix the inlined assembly for correctness' sake, however.
2755 */
2756
2757#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2758 #ifdef DRFLAC_64BIT
2759 return (drflac_uint32)__lzcnt64(x);
2760 #else
2761 return (drflac_uint32)__lzcnt(x);
2762 #endif
2763#else
2764 #if defined(__GNUC__) || defined(__clang__)
2765 #if defined(DRFLAC_X64)
2766 {
2767 drflac_uint64 r;
2768 __asm__ __volatile__ (
2769 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2770 );
2771
2772 return (drflac_uint32)r;
2773 }
2774 #elif defined(DRFLAC_X86)
2775 {
2776 drflac_uint32 r;
2777 __asm__ __volatile__ (
2778 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2779 );
2780
2781 return r;
2782 }
2783 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2784 {
2785 unsigned int r;
2786 __asm__ __volatile__ (
2787 #if defined(DRFLAC_64BIT)
2788 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2789 #else
2790 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2791 #endif
2792 );
2793
2794 return r;
2795 }
2796 #else
2797 if (x == 0) {
2798 return sizeof(x)*8;
2799 }
2800 #ifdef DRFLAC_64BIT
2801 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2802 #else
2803 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2804 #endif
2805 #endif
2806 #else
2807 /* Unsupported compiler. */
2808 #error "This compiler does not support the lzcnt intrinsic."
2809 #endif
2810#endif
2811}
2812#endif
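/*
Illustrative sketch, not part of dr_flac: the straightforward alternative described in the comment in drflac__clz_lzcnt(),
an explicit x == 0 check in front of the GCC/Clang builtin, shown for the 32-bit case. clz32_safe() is a hypothetical name.
It favours clarity over the branch-free code the inline assembly is after, and assumes unsigned int is 32 bits wide.
*/
#if 0
```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count of a 32-bit value, defined to return 32 when x == 0.
   __builtin_clz() (GCC/Clang) is undefined for 0, so zero is handled with an
   explicit branch. Other compilers fall back to a simple bit-by-bit loop. */
static uint32_t clz32_safe(uint32_t x)
{
    if (x == 0) {
        return 32;
    }
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clz() operates on unsigned int, assumed to be 32 bits here. */
    assert(sizeof(unsigned int) == 4);
    return (uint32_t)__builtin_clz((unsigned int)x);
#else
    {
        uint32_t n = 0;
        while ((x & 0x80000000u) == 0) {
            x <<= 1;
            n += 1;
        }
        return n;
    }
#endif
}
```
#endif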
2813
2814#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2815#include <intrin.h> /* For BitScanReverse(). */
2816
2817static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2818{
2819 drflac_uint32 n;
2820
2821 if (x == 0) {
2822 return sizeof(x)*8;
2823 }
2824
2825#ifdef DRFLAC_64BIT
2826 _BitScanReverse64((unsigned long*)&n, x);
2827#else
2828 _BitScanReverse((unsigned long*)&n, x);
2829#endif
2830 return sizeof(x)*8 - n - 1;
2831}
2832#endif
2833
2834#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2835static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2836#pragma aux drflac__clz_watcom = \
2837 "bsr eax, eax" \
2838 "xor eax, 31" \
2839 parm [eax] nomemory \
2840 value [eax] \
2841 modify exact [eax] nomemory;
2842#endif
2843
2844static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2845{
2846#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2847 if (drflac__is_lzcnt_supported()) {
2848 return drflac__clz_lzcnt(x);
2849 } else
2850#endif
2851 {
2852#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2853 return drflac__clz_msvc(x);
2854#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2855 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2856#else
2857 return drflac__clz_software(x);
2858#endif
2859 }
2860}
2861
2862
2863static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2864{
2865 drflac_uint32 zeroCounter = 0;
2866 drflac_uint32 setBitOffsetPlus1;
2867
2868 while (bs->cache == 0) {
2869 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2870 if (!drflac__reload_cache(bs)) {
2871 return DRFLAC_FALSE;
2872 }
2873 }
2874
2875 setBitOffsetPlus1 = drflac__clz(bs->cache);
2876 setBitOffsetPlus1 += 1;
2877
2878 bs->consumedBits += setBitOffsetPlus1;
2879 bs->cache <<= setBitOffsetPlus1;
2880
2881 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2882 return DRFLAC_TRUE;
2883}
2884
2885
2886
2887static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2888{
2889 DRFLAC_ASSERT(bs != NULL);
2890 DRFLAC_ASSERT(offsetFromStart > 0);
2891
2892 /*
2893 Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2894 is intentional because it simplifies the implementation of onSeek callbacks), whereas offsetFromStart is an unsigned 64-bit integer.
2895 To resolve this, we do an initial seek of at most 0x7FFFFFFF bytes from the start, followed by a series of relative seeks for the remainder.
2896 */
2897 if (offsetFromStart > 0x7FFFFFFF) {
2898 drflac_uint64 bytesRemaining = offsetFromStart;
2899 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
2900 return DRFLAC_FALSE;
2901 }
2902 bytesRemaining -= 0x7FFFFFFF;
2903
2904 while (bytesRemaining > 0x7FFFFFFF) {
2905 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
2906 return DRFLAC_FALSE;
2907 }
2908 bytesRemaining -= 0x7FFFFFFF;
2909 }
2910
2911 if (bytesRemaining > 0) {
2912 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2913 return DRFLAC_FALSE;
2914 }
2915 }
2916 } else {
2917 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2918 return DRFLAC_FALSE;
2919 }
2920 }
2921
2922 /* The cache should be reset to force a reload of fresh data from the client. */
2923 drflac__reset_cache(bs);
2924 return DRFLAC_TRUE;
2925}
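/*
Illustrative sketch, not part of dr_flac: the chunking arithmetic of drflac__seek_to_byte() as a pure function that plans the
sequence of seeks instead of issuing them, which makes it easy to check in isolation. plan_u64_seek() is a hypothetical helper;
dr_flac calls onSeek directly rather than building a plan.
*/
#if 0
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits an absolute unsigned 64-bit byte offset into seeks that each fit in
   a signed 32-bit int. steps[0] is meant to be applied from the start of the
   stream and every later step from the current position. Returns the number
   of steps written, or 0 if `maxSteps` is too small. An offset of 0 yields a
   single seek of 0 from the start. */
static size_t plan_u64_seek(uint64_t offset, int32_t* steps, size_t maxSteps)
{
    size_t count = 0;

    assert(steps != NULL);

    do {
        int32_t step;
        if (count == maxSteps) {
            return 0;
        }
        step = (offset > (uint64_t)INT32_MAX) ? INT32_MAX : (int32_t)offset;
        steps[count++] = step;
        offset -= (uint64_t)step;
    } while (offset > 0);

    return count;
}
```
#endif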
2926
2927
2928static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2929{
2930 drflac_uint8 crc;
2931 drflac_uint64 result;
2932 drflac_uint8 utf8[7] = {0};
2933 int byteCount;
2934 int i;
2935
2936 DRFLAC_ASSERT(bs != NULL);
2937 DRFLAC_ASSERT(pNumberOut != NULL);
2938 DRFLAC_ASSERT(pCRCOut != NULL);
2939
2940 crc = *pCRCOut;
2941
2942 if (!drflac__read_uint8(bs, 8, utf8)) {
2943 *pNumberOut = 0;
2944 return DRFLAC_AT_END;
2945 }
2946 crc = drflac_crc8(crc, utf8[0], 8);
2947
2948 if ((utf8[0] & 0x80) == 0) {
2949 *pNumberOut = utf8[0];
2950 *pCRCOut = crc;
2951 return DRFLAC_SUCCESS;
2952 }
2953
2954 /*byteCount = 1;*/
2955 if ((utf8[0] & 0xE0) == 0xC0) {
2956 byteCount = 2;
2957 } else if ((utf8[0] & 0xF0) == 0xE0) {
2958 byteCount = 3;
2959 } else if ((utf8[0] & 0xF8) == 0xF0) {
2960 byteCount = 4;
2961 } else if ((utf8[0] & 0xFC) == 0xF8) {
2962 byteCount = 5;
2963 } else if ((utf8[0] & 0xFE) == 0xFC) {
2964 byteCount = 6;
2965 } else if ((utf8[0] & 0xFF) == 0xFE) {
2966 byteCount = 7;
2967 } else {
2968 *pNumberOut = 0;
2969 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
2970 }
2971
2972 /* Read extra bytes. */
2973 DRFLAC_ASSERT(byteCount > 1);
2974
2975 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
2976 for (i = 1; i < byteCount; ++i) {
2977 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
2978 *pNumberOut = 0;
2979 return DRFLAC_AT_END;
2980 }
2981 crc = drflac_crc8(crc, utf8[i], 8);
2982
2983 result = (result << 6) | (utf8[i] & 0x3F);
2984 }
2985
2986 *pNumberOut = result;
2987 *pCRCOut = crc;
2988 return DRFLAC_SUCCESS;
2989}
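/*
Illustrative sketch, not part of dr_flac: the same "UTF-8 like" number decoding over a byte buffer, without the CRC-8
bookkeeping, which can help when checking the masks by hand. The number of leading 1 bits in the first byte gives the total
length, and each continuation byte contributes its low 6 bits. decode_flac_utf8() and its return convention are hypothetical.
*/
#if 0
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the number of bytes consumed, or 0 if the leading byte is invalid
   or `size` is too small. Continuation bytes are not checked for the
   10xxxxxx prefix, mirroring drflac__read_utf8_coded_number(). */
static int decode_flac_utf8(const uint8_t* p, size_t size, uint64_t* pOut)
{
    int byteCount;
    int i;
    uint64_t result;

    assert(p != NULL && pOut != NULL);

    if (size == 0) return 0;

    if      ((p[0] & 0x80) == 0x00) byteCount = 1;
    else if ((p[0] & 0xE0) == 0xC0) byteCount = 2;
    else if ((p[0] & 0xF0) == 0xE0) byteCount = 3;
    else if ((p[0] & 0xF8) == 0xF0) byteCount = 4;
    else if ((p[0] & 0xFC) == 0xF8) byteCount = 5;
    else if ((p[0] & 0xFE) == 0xFC) byteCount = 6;
    else if (p[0] == 0xFE)          byteCount = 7;
    else return 0; /* 0xFF and 10xxxxxx are not valid leading bytes. */

    if ((size_t)byteCount > size) return 0;

    /* Keep only the payload bits of the first byte (7 for a single byte, none for 0xFE). */
    result = (byteCount == 1) ? p[0] : (uint64_t)(p[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        result = (result << 6) | (uint64_t)(p[i] & 0x3F);
    }

    *pOut = result;
    return byteCount;
}
```
#endif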
2990
2991
2992
2993/*
2994The next two functions are responsible for calculating the prediction.
2995
2996When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise the sum of products can overflow 32 bits. It's
2997safe to assume this will be slower on 32-bit platforms, so we use a faster 32-bit path when the bits per sample is <=16.
2998*/
2999static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3000{
3001 drflac_int32 prediction = 0;
3002
3003 DRFLAC_ASSERT(order <= 32);
3004
3005 /* 32-bit version. */
3006
3007 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
3008 switch (order)
3009 {
3010 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3011 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3012 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3013 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3014 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3015 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3016 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3017 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3018 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3019 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3020 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3021 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3022 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3023 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3024 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3025 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3026 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3027 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3028 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3029 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3030 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3031 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3032 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3033 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3034 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3035 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3036 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3037 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3038 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3039 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3040 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3041 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3042 }
3043
3044 return (drflac_int32)(prediction >> shift);
3045}
3046
3047static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3048{
3049 drflac_int64 prediction;
3050
3051 DRFLAC_ASSERT(order <= 32);
3052
3053 /* 64-bit version. */
3054
3055 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3056#ifndef DRFLAC_64BIT
3057 if (order == 8)
3058 {
3059 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3060 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3061 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3062 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3063 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3064 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3065 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3066 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3067 }
3068 else if (order == 7)
3069 {
3070 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3071 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3072 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3073 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3074 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3075 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3076 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3077 }
3078 else if (order == 3)
3079 {
3080 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3081 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3082 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3083 }
3084 else if (order == 6)
3085 {
3086 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3087 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3088 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3089 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3090 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3091 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3092 }
3093 else if (order == 5)
3094 {
3095 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3096 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3097 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3098 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3099 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3100 }
3101 else if (order == 4)
3102 {
3103 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3104 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3105 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3106 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3107 }
3108 else if (order == 12)
3109 {
3110 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3111 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3112 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3113 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3114 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3115 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3116 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3117 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3118 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3119 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3120 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3121 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3122 }
3123 else if (order == 2)
3124 {
3125 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3126 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3127 }
3128 else if (order == 1)
3129 {
3130 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3131 }
3132 else if (order == 10)
3133 {
3134 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3135 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3136 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3137 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3138 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3139 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3140 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3141 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3142 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3143 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3144 }
3145 else if (order == 9)
3146 {
3147 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3148 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3149 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3150 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3151 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3152 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3153 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3154 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3155 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3156 }
3157 else if (order == 11)
3158 {
3159 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3160 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3161 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3162 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3163 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3164 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3165 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3166 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3167 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3168 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3169 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3170 }
3171 else
3172 {
3173 int j;
3174
3175 prediction = 0;
3176 for (j = 0; j < (int)order; ++j) {
3177 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3178 }
3179 }
3180#endif
3181
3182 /*
3183 VC++ optimizes this to a single jmp instruction, but only in the 64-bit build. The 32-bit build generates less efficient code for some
3184 reason. The unrolled version above is faster there, so we just switch between the two depending on the target platform.
3185 */
3186#ifdef DRFLAC_64BIT
3187 prediction = 0;
3188 switch (order)
3189 {
3190 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3191 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3192 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3193 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3194 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3195 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3196 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3197 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3198 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3199 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3200 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3201 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3202 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3203 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3204 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3205 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3206 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3207 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3208 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3209 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3210 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3211 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3212 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3213 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3214 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3215 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3216 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3217 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3218 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3219 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3220 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3221 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3222 }
3223#endif
3224
3225 return (drflac_int32)(prediction >> shift);
3226}
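/*
Illustrative sketch, not part of dr_flac: the computation the two prediction functions above perform, applied across a whole
block. It assumes the warm-up samples are already in place and the residuals have been decoded into the remaining slots.
lpc_restore() is a hypothetical helper, and for simplicity it always uses the 64-bit accumulator instead of choosing a
32-bit path. As in dr_flac, the right shift of a negative sum relies on an arithmetic shift.
*/
#if 0
```c
#include <assert.h>
#include <stdint.h>

/* Reconstructs samples in place:
   sample[i] = residual[i] + ((sum over j of coef[j] * sample[i-j-1]) >> shift).
   The first `order` entries of `samples` hold the warm-up samples; entries
   from index `order` onward hold residuals on input and samples on output. */
static void lpc_restore(int32_t* samples, uint32_t count, uint32_t order,
                        const int32_t* coef, int32_t shift)
{
    uint32_t i, j;

    assert(order <= 32 && order <= count);

    for (i = order; i < count; ++i) {
        int64_t prediction = 0;
        for (j = 0; j < order; ++j) {
            /* coef[0] pairs with the previous sample, as in the functions above. */
            prediction += (int64_t)coef[j] * samples[i - j - 1];
        }
        samples[i] += (int32_t)(prediction >> shift);
    }
}
```
#endif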
3227
3228
3229#if 0
3230/*
3231Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3232sake of readability and should only be used as a reference.
3233*/
3234static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3235{
3236 drflac_uint32 i;
3237
3238 DRFLAC_ASSERT(bs != NULL);
3239 DRFLAC_ASSERT(pSamplesOut != NULL);
3240
3241 for (i = 0; i < count; ++i) {
3242 drflac_uint32 zeroCounter = 0;
3243 for (;;) {
3244 drflac_uint8 bit;
3245 if (!drflac__read_uint8(bs, 1, &bit)) {
3246 return DRFLAC_FALSE;
3247 }
3248
3249 if (bit == 0) {
3250 zeroCounter += 1;
3251 } else {
3252 break;
3253 }
3254 }
3255
3256 drflac_uint32 decodedRice;
3257 if (riceParam > 0) {
3258 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3259 return DRFLAC_FALSE;
3260 }
3261 } else {
3262 decodedRice = 0;
3263 }
3264
3265 decodedRice |= (zeroCounter << riceParam);
3266 if ((decodedRice & 0x01)) {
3267 decodedRice = ~(decodedRice >> 1);
3268 } else {
3269 decodedRice = (decodedRice >> 1);
3270 }
3271
3272
3273 if (bitsPerSample+shift >= 32) {
3274 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
3275 } else {
3276 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
3277 }
3278 }
3279
3280 return DRFLAC_TRUE;
3281}
3282#endif
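/*
Illustrative sketch, not part of dr_flac: the Rice decoding in the reference implementation above (unary quotient as zeros
terminated by a 1 bit, then riceParam remainder bits, then a zigzag mapping back to a signed value) over a plain byte buffer.
bit_reader, br_read_bit() and read_rice_residual() are hypothetical helpers for illustration, not dr_flac APIs.
*/
#if 0
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t pos; /* Bit position from the start of `data`. */
} bit_reader;

static int br_read_bit(bit_reader* br, uint32_t* pBit)
{
    if (br->pos >= br->sizeInBits) return 0;
    *pBit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos += 1;
    return 1;
}

/* Decodes one Rice-coded residual. Returns 0 if the data runs out. */
static int read_rice_residual(bit_reader* br, uint32_t riceParam, int32_t* pOut)
{
    uint32_t zeroCounter = 0, remainder = 0, bit, u, i;

    assert(riceParam < 32);

    for (;;) { /* Unary part: count zeros up to the terminating 1 bit. */
        if (!br_read_bit(br, &bit)) return 0;
        if (bit) break;
        zeroCounter += 1;
    }
    for (i = 0; i < riceParam; ++i) { /* Binary part: riceParam bits. */
        if (!br_read_bit(br, &bit)) return 0;
        remainder = (remainder << 1) | bit;
    }

    u = (zeroCounter << riceParam) | remainder;
    /* Zigzag: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ... Same result as ~(u >> 1)
       for odd u, written without an unsigned-to-signed conversion. */
    *pOut = (u & 1) ? -(int32_t)(u >> 1) - 1 : (int32_t)(u >> 1);
    return 1;
}
```
#endif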
3283
3284#if 0
3285static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3286{
3287 drflac_uint32 zeroCounter = 0;
3288 drflac_uint32 decodedRice;
3289
3290 for (;;) {
3291 drflac_uint8 bit;
3292 if (!drflac__read_uint8(bs, 1, &bit)) {
3293 return DRFLAC_FALSE;
3294 }
3295
3296 if (bit == 0) {
3297 zeroCounter += 1;
3298 } else {
3299 break;
3300 }
3301 }
3302
3303 if (riceParam > 0) {
3304 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3305 return DRFLAC_FALSE;
3306 }
3307 } else {
3308 decodedRice = 0;
3309 }
3310
3311 *pZeroCounterOut = zeroCounter;
    *pRiceParamPartOut = decodedRice;
    return DRFLAC_TRUE;
}
#endif

#if 0
static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_cache_t riceParamMask;
    drflac_uint32 zeroCounter;
    drflac_uint32 setBitOffsetPlus1;
    drflac_uint32 riceParamPart;
    drflac_uint32 riceLength;

    DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */

    riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);

    zeroCounter = 0;
    while (bs->cache == 0) {
        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    setBitOffsetPlus1 = drflac__clz(bs->cache);
    zeroCounter += setBitOffsetPlus1;
    setBitOffsetPlus1 += 1;

    riceLength = setBitOffsetPlus1 + riceParam;
    if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));

        bs->consumedBits += riceLength;
        bs->cache <<= riceLength;
    } else {
        drflac_uint32 bitCountLo;
        drflac_cache_t resultHi;

        bs->consumedBits += riceLength;
        bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */

        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
        bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
        resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */

        if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
#ifndef DR_FLAC_NO_CRC
            drflac__update_crc16(bs);
#endif
            bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
            bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
            bs->crc16Cache = bs->cache;
#endif
        } else {
            /* Slow path. We need to fetch more data from the client. */
            if (!drflac__reload_cache(bs)) {
                return DRFLAC_FALSE;
            }
        }

        riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));

        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
    }

    pZeroCounterOut[0] = zeroCounter;
    pRiceParamPartOut[0] = riceParamPart;

    return DRFLAC_TRUE;
}
#endif
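
/*
The disabled reference routine above returns the two halves of a Rice code: the number of leading zeros (the unary part) and
the riceParam bits that follow the terminating 1 bit. As a rough, bit-at-a-time sketch of the same idea, the hypothetical
example_read_rice_parts() below decodes from an MSB-first byte buffer instead of dr_flac's cache. It is an illustration under
those assumptions, not part of the library, and it does no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns bit `pos` of an MSB-first byte buffer.
static uint32_t example_get_bit(const uint8_t* data, size_t pos)
{
    return (uint32_t)((data[pos >> 3] >> (7 - (pos & 7))) & 1);
}

// Hypothetical Rice reader: reads one code starting at *pBitPos, outputs the unary zero
// count and the riceParam-bit remainder (without the terminating 1 bit), and advances
// *pBitPos past the code. Assumes the buffer holds the complete code.
static void example_read_rice_parts(const uint8_t* data, size_t* pBitPos, uint32_t riceParam,
                                    uint32_t* pZeroCount, uint32_t* pRiceParamPart)
{
    size_t pos = *pBitPos;
    uint32_t zeroCount = 0;
    uint32_t part = 0;
    uint32_t i;

    assert(riceParam < 32);                     // keeps the remainder within 32 bits

    while (example_get_bit(data, pos) == 0) {   // unary part: count the zeros
        zeroCount += 1;
        pos += 1;
    }
    pos += 1;                                   // skip the terminating 1 bit

    for (i = 0; i < riceParam; i += 1) {        // fixed-width remainder, MSB first
        part = (part << 1) | example_get_bit(data, pos);
        pos += 1;
    }

    *pZeroCount = zeroCount;
    *pRiceParamPart = part;
    *pBitPos = pos;
}
```

For example, the bytes 0x12 0x80 hold the bits 0001 0010 1000 0000. With riceParam = 2, the first code yields a zero count
of 3 and a remainder of 0, and the second code yields 0 and 1.
*/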

static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_uint32 riceParamPlus1 = riceParam + 1;
    /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
    drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
    drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32 bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit (the 1 that terminates the unary code). Most likely a bit will be set in the current cache line. */
    drflac_uint32 lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        pZeroCounterOut[0] = lzcount;

        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
        outside of this function at a higher level.
        */
    extract_rice_param_part:
        bs_cache <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
            pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
            bs_cache <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            drflac_uint32 riceParamPartHi;
            drflac_uint32 riceParamPartLo;
            drflac_uint32 riceParamPartLoBitCount;

            /*
            Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
            line, reload the cache, and then combine it with the head of the next cache line.
            */

            /* Grab the high part of the rice parameter part. */
            riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache. */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            /* We should now have enough information to construct the rice parameter part. */
            riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
            pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            zeroCounter += lzcount;

            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        pZeroCounterOut[0] = zeroCounter;
        goto extract_rice_param_part;
    }

    /* Write the local copies of the cache state back to the bitstream. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}
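
/*
When the Rice parameter bits straddle two cache lines, drflac__read_rice_parts_x1() takes the high part from the tail of the
current line with the same full-width shift used in the fast path. Because consumed bits were shifted out and replaced with
zeros, that high part already sits in position, so the low part read from the head of the next line only needs to be OR'ed
in. The hypothetical example_read_straddling() below sketches this with two 32-bit words standing in for consecutive cache
lines; it assumes a 32-bit cache and ignores the L2 cache and CRC bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical: reads `count` bits (1..32) starting `pos` bits (0..31) into the big-endian
// stream formed by word0 followed by word1, mimicking how the straddling branch combines
// the tail of one cache line with the head of the next.
static uint32_t example_read_straddling(uint32_t word0, uint32_t word1, uint32_t pos, uint32_t count)
{
    uint32_t remaining = 32 - pos;          // unread bits left in word0
    uint32_t cache;
    uint32_t hi;
    uint32_t loBitCount;

    assert(pos < 32 && count >= 1 && count <= 32);

    cache = word0 << pos;                   // consumed bits shifted out, zeros shifted in
    hi = cache >> (32 - count);             // full-width shift, as in the fast path
    if (count <= remaining) {
        return hi;                          // wholly contained in word0
    }

    // The low loBitCount bits of `hi` are the zeros shifted in above, so the head of
    // word1 can simply be OR'ed into place.
    loBitCount = count - remaining;         // 1..31 here
    return hi | (word1 >> (32 - loBitCount));
}
```

For example, reading 8 bits at position 28 from 0x12345678 followed by 0x9ABCDEF0 combines the final nibble 0x8 of the
first word with the leading nibble 0x9 of the second, giving 0x89.
*/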

static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
{
    drflac_uint32 riceParamPlus1 = riceParam + 1;
    drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32 bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit (the 1 that terminates the unary code). Most likely a bit will be set in the current cache line. */
    drflac_uint32 lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. The set bit
        from the unary coded part is skipped together with it because that simplifies cache management. Since nothing is
        returned, there is no extra bit for the caller to handle.
        */
    extract_rice_param_part:
        bs_cache <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
            bs_cache <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            /*
            Getting here means the rice parameter part straddles the cache line. We need to skip the tail of the current cache
            line, reload the cache, and then skip the remaining bits at the head of the next cache line.
            */

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache. */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        goto extract_rice_param_part;
    }

    /* Write the local copies of the cache state back to the bitstream. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}
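
/*
drflac__seek_rice_parts() advances past one Rice code without assembling its value: it skips the leading zeros, the
terminating 1 bit and the riceParam bits that follow, which is why it never needs the high/low combination used by the read
path. A hypothetical bit-at-a-time sketch over an MSB-first byte buffer (no cache, no bounds checks):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns bit `pos` of an MSB-first byte buffer.
static int example_get_bit(const uint8_t* data, size_t pos)
{
    return (data[pos >> 3] >> (7 - (pos & 7))) & 1;
}

// Hypothetical: returns the bit position just past the Rice code that starts at `pos`.
// Assumes the buffer contains the whole code.
static size_t example_skip_rice(const uint8_t* data, size_t pos, uint32_t riceParam)
{
    assert(riceParam < 32);
    while (example_get_bit(data, pos) == 0) {
        pos += 1;                   // unary zeros
    }
    return pos + 1 + riceParam;     // terminating 1 bit plus the riceParam low bits
}
```

With the bytes 0x12 0x80 and riceParam = 2, the first code ends at bit 6 and the second at bit 9; no value is built, only
the position moves.
*/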


static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0;
    drflac_uint32 riceParamPart0;
    drflac_uint32 riceParamMask;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    (void)bitsPerSample;
    (void)order;
    (void)shift;
    (void)coefficients;

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);

    i = 0;
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];

        pSamplesOut[i] = riceParamPart0;

        i += 1;
    }

    return DRFLAC_TRUE;
}
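
/*
The three reconstruction steps above work as follows: the mask drops the terminating 1 bit that drflac__read_rice_parts_x1()
leaves above the riceParam low bits, the OR places the unary zero count in the high bits, and the last line undoes FLAC's
zigzag folding (0, 1, 2, 3, 4, ... map to 0, -1, 1, -2, 2, ...). t[x & 1] is a branch-free way to produce -(x & 1); the SSE
path further down uses ~(x & 1) + 1 for the same purpose. A minimal sketch with hypothetical names, operating on parts that
have already been extracted:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical: rebuilds a signed residual from the parts produced by a reader that,
// like drflac__read_rice_parts_x1(), leaves the terminating 1 bit just above the
// riceParam low bits of `paramPart`. Like dr_flac, it assumes the uint32 -> int32
// conversion wraps (two's complement).
static int32_t example_rice_to_residual(uint32_t zeroCount, uint32_t paramPart, uint32_t riceParam)
{
    static const uint32_t t[2] = {0x00000000, 0xFFFFFFFF};
    uint32_t mask;
    uint32_t folded;

    assert(riceParam < 32);
    mask = ((uint32_t)1 << riceParam) - 1;                  // low riceParam bits
    folded = (paramPart & mask) | (zeroCount << riceParam); // drop the 1 bit, add the unary part
    return (int32_t)((folded >> 1) ^ t[folded & 1]);        // zigzag decode via lookup
}

// Same zigzag decode, using ~(x & 1) + 1 (two's complement negation) instead of the table.
static int32_t example_unfold(uint32_t folded)
{
    return (int32_t)((folded >> 1) ^ (~(folded & 1) + 1));
}
```

For example, with riceParam = 2, a zero count of 1 and a parameter part of 0b101 (terminating bit plus low bits 01) give the
folded value 5, which decodes to -3.
*/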

static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0 = 0;
    drflac_uint32 zeroCountPart1 = 0;
    drflac_uint32 zeroCountPart2 = 0;
    drflac_uint32 zeroCountPart3 = 0;
    drflac_uint32 riceParamPart0 = 0;
    drflac_uint32 riceParamPart1 = 0;
    drflac_uint32 riceParamPart2 = 0;
    drflac_uint32 riceParamPart3 = 0;
    drflac_uint32 riceParamMask;
    const drflac_int32* pSamplesOutEnd;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    if (order == 0) {
        return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    }

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    pSamplesOutEnd = pSamplesOut + (count & ~3);

    if (bitsPerSample+shift > 32) {
        while (pSamplesOut < pSamplesOutEnd) {
            /*
            Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
            against an array. Not sure why, but perhaps it's making more efficient use of registers?
            */
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    } else {
        while (pSamplesOut < pSamplesOutEnd) {
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    }

    i = (count & ~3);
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
        /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/

        /* Sample reconstruction. */
        if (bitsPerSample+shift > 32) {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
        } else {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
        }

        i += 1;
        pSamplesOut += 1;
    }

    return DRFLAC_TRUE;
}
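
/*
Outside the zero-order case, each output sample is the reconstructed residual plus a linear prediction from the previous
`order` samples: the sum of coefficients[j] * pSamplesOut[-j-1], shifted right by `shift`. The bitsPerSample+shift > 32
test switches to drflac__calculate_prediction_64(), presumably because a 32-bit accumulator could overflow for wide samples.
The hypothetical example_lpc_restore() below is not dr_flac's helper; it only sketches that formula with a 64-bit accumulator.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar LPC restore. `samples` must hold `order` warm-up samples before
// index `first`; entries from `first` onward hold residuals on entry and reconstructed
// samples on return.
static void example_lpc_restore(int32_t* samples, uint32_t first, uint32_t count,
                                uint32_t order, int32_t shift, const int32_t* coefficients)
{
    uint32_t i, j;

    assert(first >= order && shift >= 0 && shift < 32);
    for (i = first; i < first + count; i += 1) {
        int64_t prediction = 0;     // 64-bit accumulator avoids overflow for wide samples
        for (j = 0; j < order; j += 1) {
            prediction += (int64_t)coefficients[j] * samples[i - j - 1];
        }
        // >> on a negative value is implementation-defined in C; like dr_flac,
        // this assumes an arithmetic shift.
        samples[i] += (int32_t)(prediction >> shift);
    }
}
```

With coefficients {2, -1} and shift 0 this is plain linear extrapolation: from warm-up samples 1, 2 and residuals 0, 0, 1 it
produces 3, 4, 6.
*/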

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
{
    __m128i r;

    /* Pack with signed saturation. 16-bit lanes, low to high: a0 a1 a2 a3 b0 b1 b2 b3 */
    r = _mm_packs_epi32(a, b);

    /* Swap 32-bit lanes 1 and 2: a0 a1 a2 a3 b0 b1 b2 b3 -> a0 a1 b0 b1 a2 a3 b2 b3 */
    r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));

    /* Swap the middle two 16-bit lanes of each 64-bit half: a0 a1 b0 b1 a2 a3 b2 b3 -> a0 b0 a1 b1 a2 b2 a3 b3 */
    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));

    return r;
}
#endif
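
/*
drflac__mm_packs_interleaved_epi32() narrows two vectors of four 32-bit values to 16 bits with signed saturation and then
interleaves them, so the 16-bit lanes end up as a0 b0 a1 b1 a2 b2 a3 b3. The hypothetical portable reference below models
what the intrinsic sequence computes; it is for illustration and testing, not a replacement for the SSE2 code.

```c
#include <stdint.h>

// Hypothetical: saturates a 32-bit value to the int16 range, like _mm_packs_epi32.
static int16_t example_sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

// Hypothetical scalar model: out = a0 b0 a1 b1 a2 b2 a3 b3, each saturated to 16 bits.
static void example_packs_interleaved(const int32_t a[4], const int32_t b[4], int16_t out[8])
{
    int i;
    for (i = 0; i < 4; i += 1) {
        out[2*i + 0] = example_sat16(a[i]);
        out[2*i + 1] = example_sat16(b[i]);
    }
}
```

Saturating rather than truncating means out-of-range values clamp to 32767 or -32768 instead of wrapping.
*/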

#if defined(DRFLAC_SUPPORT_SSE41)
static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
}

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
{
    __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_epi32(x64, x32);
}
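
/*
drflac__mm_hadd_epi32() leaves the sum of all four 32-bit lanes in lane 0; the other lanes hold partial sums, and the
prediction loops below only consume lane 0. The hypothetical scalar model below follows the two shuffle-and-add steps, using
uint32_t so the arithmetic wraps like _mm_add_epi32.

```c
#include <stdint.h>

// Hypothetical scalar model of the lane-0 result of drflac__mm_hadd_epi32().
static uint32_t example_hadd_lane0(const uint32_t x[4])
{
    // Step 1: _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)) swaps the 64-bit halves,
    // so the add pairs lane 0 with lane 2 and lane 1 with lane 3.
    uint32_t lane0 = x[0] + x[2];
    uint32_t lane1 = x[1] + x[3];

    // Step 2: _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2)) swaps 32-bit lanes 0 and 1,
    // so the final add leaves x0 + x1 + x2 + x3 in lane 0.
    return lane0 + lane1;
}
```
*/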

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
{
    return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
}

static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
{
    /*
    To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
    is filled with zero bits by a logical 64-bit shift, whereas the high side is filled with sign bits by an arithmetic 32-bit shift.
    */
    __m128i lo = _mm_srli_epi64(x, count);
    __m128i hi = _mm_srai_epi32(x, count);

    hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */

    return _mm_or_si128(lo, hi);
}
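
/*
SSE4.1 has no 64-bit arithmetic right shift, so drflac__mm_srai_epi64() ORs a logical 64-bit shift (correct except for the
top `count` bits) with an arithmetic 32-bit shift of the high half (which supplies those sign bits). The hypothetical scalar
model below does the same for one 64-bit lane, building the sign fill explicitly so it never right-shifts a negative value.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar model of drflac__mm_srai_epi64() for a single 64-bit lane.
// Like the SSE version, it assumes 0 <= count < 32.
static uint64_t example_srai64(uint64_t x, int count)
{
    uint32_t xhi = (uint32_t)(x >> 32);
    uint32_t sign = (xhi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    uint32_t hiShifted;
    uint64_t lo;

    assert(count >= 0 && count < 32);

    // _mm_srli_epi64: logical 64-bit shift, zeros enter at the top.
    lo = x >> count;

    // _mm_srai_epi32 on the high half: sign bits enter at the top.
    hiShifted = xhi >> count;
    if (count > 0) {
        hiShifted |= sign << (32 - count);
    }

    // The mask keeps only the high half of `hi`; OR-ing it in restores the sign bits.
    return lo | ((uint64_t)hiShifted << 32);
}
```

For example, shifting -100 right by 3 yields -13, the same as an arithmetic shift (floor division by 8).
*/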

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = _mm_set1_epi32(riceParamMask);

    /* Pre-load. */
    coefficients128_0 = _mm_setzero_si128();
    coefficients128_4 = _mm_setzero_si128();
    coefficients128_8 = _mm_setzero_si128();

    samples128_0 = _mm_setzero_si128();
    samples128_4 = _mm_setzero_si128();
    samples128_8 = _mm_setzero_si128();

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. As a workaround, each group of four is loaded with a single unaligned load when all
    four values are available, and assembled with _mm_set_epi32() when only one to three remain. This feels a bit convoluted,
    so there's likely an opportunity to simplify it.
    */
#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    /* This causes strict-aliasing warnings with GCC. */
    switch (order)
    {
        case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
        case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
        case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
        case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
        case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
        case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
        case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
        case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
        case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
        case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
        case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
        case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
    }
#endif

    /* Residuals are read four at a time, but predictions are computed one sample at a time because each one depends on the samples decoded before it. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i prediction128;
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
        /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
                samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        }

        /* We store samples in groups of 4. */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}
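
/*
In the SSE4.1 path above, the coefficients are stored reversed so that lane j of samples128_0 (holding pSamplesOut[j-4]) is
multiplied by coefficients[3-j]. After each prediction, _mm_alignr_epi8(prediction128, samples128_0, 4) drops the oldest
sample from lane 0 and appends the newly decoded one in lane 3. The hypothetical scalar model below mirrors that sliding
window for order <= 4, with plain arrays in place of the __m128i registers and 32-bit wrapping arithmetic as in
_mm_mullo_epi32 and _mm_add_epi32. It is a sketch for following the data flow, not a replacement for the intrinsics.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar model of the order <= 4 loop. On entry samples[0..count-1] hold
// residuals and samples[-order..-1] hold previously decoded samples; on return
// samples[0..count-1] hold decoded samples.
static void example_stream_order4(const int32_t* coefficients, uint32_t order, int32_t shift,
                                  int32_t* samples, uint32_t count)
{
    int32_t window[4] = {0, 0, 0, 0};    // samples128_0: window[3] is the newest sample
    int32_t coefRev[4] = {0, 0, 0, 0};   // coefficients128_0 after the reversing shuffle
    uint32_t i, j;

    assert(order >= 1 && order <= 4 && shift >= 0 && shift < 32);

    for (j = 0; j < order; j += 1) {
        coefRev[3 - j] = coefficients[j];           // coefficients[0] pairs with the newest sample
        window[3 - j] = samples[-(int32_t)j - 1];
    }

    for (i = 0; i < count; i += 1) {
        uint32_t sum = 0;                           // wraps like the 32-bit SIMD lanes
        for (j = 0; j < 4; j += 1) {
            sum += (uint32_t)coefRev[j] * (uint32_t)window[j];  // mullo + horizontal add
        }
        // Assumes an arithmetic >> on negative values, like _mm_srai_epi32; the add
        // is done in uint32_t so it wraps like _mm_add_epi32.
        samples[i] = (int32_t)((uint32_t)samples[i] + (uint32_t)((int32_t)sum >> shift));

        // _mm_alignr_epi8(prediction128, samples128_0, 4): shift the window by one lane.
        window[0] = window[1];
        window[1] = window[2];
        window[2] = window[3];
        window[3] = samples[i];
    }
}
```

For orders below 4, the unused coefficient lanes are zero, so whatever slides into the low window lanes does not affect the
sum. Keeping the window in registers avoids reloading prior samples, but the four predictions in each group remain serial.
*/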

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i prediction128;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    DRFLAC_ASSERT(order <= 12);

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = _mm_set1_epi32(riceParamMask);

    prediction128 = _mm_setzero_si128();

    /* Pre-load. */
    coefficients128_0 = _mm_setzero_si128();
    coefficients128_4 = _mm_setzero_si128();
    coefficients128_8 = _mm_setzero_si128();

    samples128_0 = _mm_setzero_si128();
    samples128_4 = _mm_setzero_si128();
    samples128_8 = _mm_setzero_si128();

#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    switch (order)
    {
        case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
        case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
        case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
        case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
        case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
        case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
        case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
        case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4128 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4129 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4130 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4131 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4132 }
4133#endif
4134
    /* Each iteration reads four residuals, then reconstructs the four samples one at a time, since every prediction depends on the sample decoded just before it. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));

        for (i = 0; i < 4; i += 1) {
            prediction128 = _mm_setzero_si128(); /* Reset to 0 without reading the previous (possibly uninitialized) value. */

            switch (order)
            {
                case 12:
                case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
                case 10:
                case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
                case 8:
                case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
                case 6:
                case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
                case 4:
                case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
                case 2:
                case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
            }

            /* Horizontal add and shift. */
            prediction128 = drflac__mm_hadd_epi64(prediction128);
            prediction128 = drflac__mm_srai_epi64(prediction128, shift);
            prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

            /*
            The new sample is in lane 0 of prediction128. Shift it into the sample history: the oldest lane of each register
            moves into the newest lane of the next-older register, and the new sample enters the top lane of samples128_0.
            */
            samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
            samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
            samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
        }

        /* After four iterations samples128_0 holds the four new samples, oldest first, so they can be stored in one go. */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Decode the remaining (count % 4) samples one at a time with the scalar routines. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}
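
/*
The Rice reconstruction above, in both its SIMD and scalar forms, can be checked against a small standalone sketch. This is an
illustration only: rice_reconstruct() is a hypothetical helper, not part of dr_flac. It assumes the unary quotient and the k low
bits have already been read from the bitstream, as the drflac__read_rice_parts_x1() calls above provide, and that k is at most 30.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: rebuild one Rice-coded residual from its unary quotient
// (zeroCount) and remainder bits (lowBits), then undo the zig-zag folding that
// maps 0, -1, 1, -2, 2, ... onto 0, 1, 2, 3, 4, ...
static int32_t rice_reconstruct(uint32_t zeroCount, uint32_t lowBits, unsigned k)
{
    uint32_t mask = (1u << k) - 1u;                         // like riceParamMask (k <= 30)
    uint32_t folded = (lowBits & mask) | (zeroCount << k);  // quotient goes above the k low bits

    // -(folded & 1) computed as ~(folded & 1) + 1, the same trick as the SIMD paths.
    // It is 0 for even values and all ones for odd values.
    uint32_t signMask = ~(folded & 1u) + 1u;

    return (int32_t)((folded >> 1) ^ signMask);
}
```

The scalar tail loop gets the same sign mask from the lookup table t[], which holds 0x00000000 and 0xFFFFFFFF.
*/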

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
    if (order > 0 && order <= 12) {
        if (bitsPerSample+shift > 32) {
            return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    }
}
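
/*
Every SIMD path in this section computes the same linear prediction that the scalar tail loops get from
drflac__calculate_prediction_32() and drflac__calculate_prediction_64(). The sketch below is a hypothetical scalar reference for
that quantity, written for illustration and assuming coefficients[j] multiplies the sample j+1 positions back and that shift is
non-negative. It also mirrors the bitsPerSample+shift > 32 test used above to pick the 64-bit path. lpc_predict_64() and
needs_64bit_path() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar reference: sum of coefficients[j] * pSamples[-j-1] for
// j < order, accumulated in 64 bits and then shifted right by shift.
// Right-shifting a negative value is implementation-defined in C; mainstream
// compilers perform an arithmetic shift, which is what this sketch relies on.
static int32_t lpc_predict_64(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pSamples)
{
    int64_t sum = 0;
    uint32_t j;

    for (j = 0; j < order; ++j) {
        sum += (int64_t)coefficients[j] * pSamples[-(int32_t)j - 1];
    }

    return (int32_t)(sum >> shift);
}

// Mirrors the dispatch test above: a non-zero result selects the 64-bit path.
static int needs_64bit_path(uint32_t bitsPerSample, int32_t shift)
{
    return bitsPerSample + (uint32_t)shift > 32;
}
```

For example, with 24-bit samples and a coefficient of 16384 (2^14), a single product can reach about 2^37 before a shift of 14
brings it back into 24-bit range. A 32-bit accumulator would overflow there, and since 24 + 14 > 32 the 64-bit path is chosen.
*/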
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
{
    vst1q_s32(p+0, x.val[0]);
    vst1q_s32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
{
    vst1q_u32(p+0, x.val[0]);
    vst1q_u32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
{
    vst1q_f32(p+0, x.val[0]);
    vst1q_f32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
{
    vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
}

static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
{
    vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
}
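
/*
Despite their names, the drflac__vst2q_*() helpers above do not behave like the NEON vst2q_*() intrinsics. vst2q_s32() stores two
registers interleaved (x.val[0][0], x.val[1][0], x.val[0][1], ...), whereas these helpers write x.val[0] and then x.val[1] back to
back. Presumably the callers interleave their data beforehand (for example with vzipq_s32()), so a plain contiguous store is what
is needed. The hypothetical scalar sketch below contrasts the two layouts for four 32-bit lanes per register; the function names
are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of drflac__vst2q_s32(): val[0] then val[1], back to back.
static void store_pair_contiguous(int32_t* p, const int32_t v0[4], const int32_t v1[4])
{
    int k;
    for (k = 0; k < 4; ++k) {
        p[k]     = v0[k];
        p[k + 4] = v1[k];
    }
}

// Hypothetical model of the NEON intrinsic vst2q_s32(): lanes are interleaved.
static void store_pair_interleaved(int32_t* p, const int32_t v0[4], const int32_t v1[4])
{
    int k;
    for (k = 0; k < 4; ++k) {
        p[2*k]     = v0[k];
        p[2*k + 1] = v1[k];
    }
}
```

With v0 = {1, 2, 3, 4} and v1 = {5, 6, 7, 8}, the contiguous store writes 1 to 8 in order, while the interleaving store writes
1, 5, 2, 6, 3, 7, 4, 8.
*/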

static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
{
    drflac_int32 x[4];
    x[3] = x3;
    x[2] = x2;
    x[1] = x1;
    x[0] = x0;
    return vld1q_s32(x);
}

static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_s32(b, a, 1);
}

static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_u32(b, a, 1);
}
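
/*
drflac__valignrq_s32_1() and drflac__valignrq_u32_1() are what turn the sample registers into a sliding window of past samples.
The hypothetical scalar sketch below models one step on plain arrays, with lane 0 being the lowest lane, and shows why the
decoders update the oldest register first: each register hands its lowest (oldest) lane to the next-older register before it is
overwritten. alignr_lanes_1() is an illustrative name only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of _mm_alignr_epi8(a, b, 4) and vextq_s32(b, a, 1):
// out = { b[1], b[2], b[3], a[0] }. A temporary keeps it safe when out aliases a or b.
static void alignr_lanes_1(int32_t out[4], const int32_t a[4], const int32_t b[4])
{
    int32_t tmp[4];
    int k;

    tmp[0] = b[1];
    tmp[1] = b[2];
    tmp[2] = b[3];
    tmp[3] = a[0];

    for (k = 0; k < 4; ++k) {
        out[k] = tmp[k];
    }
}
```

For example, with an older register {1, 2, 3, 4} and a newer register {5, 6, 7, 8} (oldest first), pushing a new sample 9 with
alignr_lanes_1(older, newer, older) followed by alignr_lanes_1(newer, prediction, newer) leaves {2, 3, 4, 5} and {6, 7, 8, 9}.
*/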

static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
{
    /* The sum must end up in position 0. */

    /* Reference */
    /*return vdup_n_s32(
        vgetq_lane_s32(x, 3) +
        vgetq_lane_s32(x, 2) +
        vgetq_lane_s32(x, 1) +
        vgetq_lane_s32(x, 0)
    );*/

    int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
    return vpadd_s32(r, r);
}
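
/*
The two-step reduction in drflac__vhaddq_s32() can be modelled on plain integers. The sketch below is hypothetical (hadd_lanes() is
an illustrative name) and uses unsigned arithmetic so that overflow wraps, as NEON lane arithmetic does, instead of being undefined
behaviour as it would be with signed int in C.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of drflac__vhaddq_s32(): returns the value that ends up in lane 0.
static int32_t hadd_lanes(const int32_t x[4])
{
    // vadd_s32(vget_high_s32(x), vget_low_s32(x)) gives r = { x[2] + x[0], x[3] + x[1] }.
    uint32_t r0 = (uint32_t)x[2] + (uint32_t)x[0];
    uint32_t r1 = (uint32_t)x[3] + (uint32_t)x[1];

    // vpadd_s32(r, r) gives { r0 + r1, r0 + r1 }, so lane 0 holds the full sum.
    return (int32_t)(r0 + r1);
}
```
*/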

static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
{
    return vadd_s64(vget_high_s64(x), vget_low_s64(x));
}

static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
{
    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(x, 0),
        vgetq_lane_s32(x, 1),
        vgetq_lane_s32(x, 2),
        vgetq_lane_s32(x, 3)
    );*/

    return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
}
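
/*
The decoders reverse each coefficient register with drflac__vrevq_s32() while loading the sample history oldest-first. The
hypothetical sketch below checks that this layout yields the usual prediction sum: with history lanes {s[-4], s[-3], s[-2], s[-1]}
and reversed coefficients {c[3], c[2], c[1], c[0]}, a lane-wise multiply pairs every c[j] with s[-j-1]. lanewise_sum() and
direct_sum() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical: the lane-wise form used by the SIMD paths (order 4, one register).
static int64_t lanewise_sum(const int32_t coefficients[4], const int32_t* pSamples)
{
    int64_t sum = 0;
    int k;
    for (k = 0; k < 4; ++k) {
        int32_t coefficientLane = coefficients[3 - k];  // drflac__vrevq_s32()
        int32_t sampleLane      = pSamples[k - 4];       // vld1q_s32(pSamplesOut - 4)
        sum += (int64_t)coefficientLane * sampleLane;    // multiply, then horizontal add
    }
    return sum;
}

// Hypothetical: the textbook form, sum of c[j] * s[-j-1].
static int64_t direct_sum(const int32_t coefficients[4], const int32_t* pSamples)
{
    int64_t sum = 0;
    int j;
    for (j = 0; j < 4; ++j) {
        sum += (int64_t)coefficients[j] * pSamples[-j - 1];
    }
    return sum;
}
```

With the fixed order-4 predictor {4, -6, 4, -1} and the history 1, 2, 3, 4, both forms give 5, the next value of the ramp.
*/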

static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
{
    return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
}

static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
{
    return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int32x2_t shift64;
    uint32x4_t one128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = ~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(). */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to write straight into the vector lanes with a fall-through
    switch, but that results in strict aliasing warnings with GCC. To work around this I'm copying the available values into
    zero-initialized temporary arrays and loading those with vld1q_s32(). The temporaries are not cleared between groups, but
    any stale values only reach groups that are unused for the current order. This feels a bit convoluted so I think there's
    opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0 = vld1q_s32(pSamplesOut - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4 = vld1q_s32(pSamplesOut - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8 = vld1q_s32(pSamplesOut - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* Each iteration reads four residuals, then reconstructs the four samples one at a time, since every prediction depends on the sample decoded just before it. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        int32x4_t prediction128;
        int32x2_t prediction64;
        uint32x4_t zeroCountPart128;
        uint32x4_t riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_8, samples128_8);
                prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        }

        /* After four iterations samples128_0 holds the four new samples, oldest first, so they can be stored in one go. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Decode the remaining (count % 4) samples one at a time with the scalar routines. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}
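
/*
The pre-load step in the NEON (and SSE) decoders spreads up to 12 coefficients and history samples over three 4-lane groups and
leaves unused lanes at zero, so a fixed number of lane-wise multiplies gives the correct sum for any order from 1 to 12. The
hypothetical scalar sketch below builds the same layout without ever reading a sample more than <order> positions back, which is
the constraint the pre-load code is working around. preload_groups() and grouped_dot() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of the pre-load: group g covers coefficients 4g to 4g+3. Coefficient
// lanes are stored reversed and history lanes oldest-first, so lane k of group g pairs
// coefficients[j] with pSamples[-j-1] where j = 4g + (3 - k). Lanes with j >= order stay zero.
static void preload_groups(uint32_t order, const int32_t* coefficients, const int32_t* pSamples,
                           int32_t coef[3][4], int32_t hist[3][4])
{
    uint32_t g, k;
    for (g = 0; g < 3; ++g) {
        for (k = 0; k < 4; ++k) {
            uint32_t j = 4*g + (3 - k);
            coef[g][k] = (j < order) ? coefficients[j] : 0;
            hist[g][k] = (j < order) ? pSamples[-(int32_t)j - 1] : 0;
        }
    }
}

// Hypothetical: lane-wise multiply of all three groups followed by a full horizontal add.
static int64_t grouped_dot(int32_t coef[3][4], int32_t hist[3][4])
{
    int64_t sum = 0;
    uint32_t g, k;
    for (g = 0; g < 3; ++g) {
        for (k = 0; k < 4; ++k) {
            sum += (int64_t)coef[g][k] * hist[g][k];
        }
    }
    return sum;
}
```

For order 6, for instance, group 0 is fully populated, group 1 only uses its top two lanes, and group 2 is all zeros.
*/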

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int64x1_t shift64;
    uint32x4_t one128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = ~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(). */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to write straight into the vector lanes with a fall-through
    switch, but that results in strict aliasing warnings with GCC. To work around this I'm copying the available values into
    zero-initialized temporary arrays and loading those with vld1q_s32(). The temporaries are not cleared between groups, but
    any stale values only reach groups that are unused for the current order. This feels a bit convoluted so I think there's
    opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0 = vld1q_s32(pSamplesOut - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4 = vld1q_s32(pSamplesOut - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8 = vld1q_s32(pSamplesOut - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* Each iteration reads four residuals, then reconstructs the four samples one at a time, since every prediction depends on the sample decoded just before it. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        int64x2_t prediction128;
        uint32x4_t zeroCountPart128;
        uint32x4_t riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        for (i = 0; i < 4; i += 1) {
            int64x1_t prediction64;

            prediction128 = vdupq_n_s64(0); /* Reset to 0 without reading the previous (possibly uninitialized) value. */
            switch (order)
            {
                case 12:
                case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
                case 10:
                case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
                case 8:
                case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
                case 6:
                case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
                case 4:
                case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
                case 2:
                case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
            }

            /* Horizontal add and shift. */
            prediction64 = drflac__vhaddq_s64(prediction128);
            prediction64 = vshl_s64(prediction64, shift64);
            prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));

            /*
            The new sample is in the low 32 bits of prediction64. Shift it into the sample history: the oldest lane of each
            register moves into the newest lane of the next-older register, and the new sample enters the top lane of samples128_0.
            */
            samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
            samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
            samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
        }

        /* After four iterations samples128_0 holds the four new samples, oldest first, so they can be stored in one go. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Decode the remaining (count % 4) samples one at a time with the scalar routines. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}
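
/*
In the 64-bit NEON path the residual is added with vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)), which zero-extends the 32-bit
residual rather than sign-extending it. This is harmless because only the low 32 bits of the sum are kept as the decoded sample,
and addition modulo 2^64 agrees with addition modulo 2^32 in those bits. The hypothetical sketch below models one reconstruction
step (two 64-bit accumulator lanes, horizontal add, shift, residual add, truncation) so this can be checked. reconstruct_step_64()
is an illustrative name, and the sketch assumes an arithmetic right shift for negative values, as mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of one iteration of the 64-bit NEON loop for n terms.
// c and s hold coefficient and sample lanes that are already paired up.
static int32_t reconstruct_step_64(int n, const int32_t* c, const int32_t* s, int32_t shift, uint32_t residualBits)
{
    int64_t lanes[2] = {0, 0};  // the two 64-bit lanes of prediction128
    int64_t prediction;
    uint64_t total;
    int i;

    for (i = 0; i < n; ++i) {
        lanes[i & 1] += (int64_t)c[i] * s[i];  // vmull_s32() products accumulated with vaddq_s64()
    }

    prediction = (lanes[0] + lanes[1]) >> shift;  // drflac__vhaddq_s64(), then vshl_s64() by -shift

    // Zero-extend the residual bits like vdup_n_s64(vgetq_lane_u32(...)), add modulo 2^64,
    // then keep the low 32 bits, which is what lane 0 of vreinterpret_s32_s64() holds on little-endian targets.
    total = (uint64_t)prediction + (uint64_t)residualBits;
    return (int32_t)(uint32_t)total;
}
```

For example, a prediction of 8388607 with the residual bits 0xFFFFFFFF (that is, -1) yields 8388606, the same as a sign-extended add.
*/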

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
    if (order > 0 && order <= 12) {
        if (bitsPerSample+shift > 32) {
            return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    }
}
#endif

static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
#if defined(DRFLAC_SUPPORT_SSE41)
    if (drflac__gIsSSE41Supported) {
        return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported) {
        return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    } else
#endif
    {
        /* Scalar fallback. */
    #if 0
        return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    #else
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
    #endif
    }
}

/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);

    for (i = 0; i < count; ++i) {
        if (!drflac__seek_rice_parts(bs, riceParam)) {
            return DRFLAC_FALSE;
        }
    }

    return DRFLAC_TRUE;
}
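
/*
Skipping Rice codes still requires walking every bit of them, since the length of each code depends on its unary quotient. The
hypothetical sketch below shows that walk with a minimal MSB-first bit reader over a byte buffer; bit_reader, read_rice_parts()
and skip_rice_codes() are illustrative names, not dr_flac's drflac_bs API. A Rice code here is a run of 0 bits terminated by a
1 bit (the quotient), followed by k raw bits (the remainder).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical MSB-first bit reader.
typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t pos;
} bit_reader;

static int read_bit(bit_reader* br, uint32_t* bit)
{
    if (br->pos >= br->sizeInBits) {
        return 0;  // out of data
    }
    *bit = (uint32_t)(br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos += 1;
    return 1;
}

// Reads one code: the count of leading 0 bits, the terminating 1, then k remainder bits.
static int read_rice_parts(bit_reader* br, unsigned k, uint32_t* zeroCount, uint32_t* lowBits)
{
    uint32_t bit = 0;
    unsigned i;

    *zeroCount = 0;
    for (;;) {
        if (!read_bit(br, &bit)) {
            return 0;
        }
        if (bit) {
            break;
        }
        *zeroCount += 1;
    }

    *lowBits = 0;
    for (i = 0; i < k; ++i) {
        if (!read_bit(br, &bit)) {
            return 0;
        }
        *lowBits = (*lowBits << 1) | bit;
    }
    return 1;
}

// Skipping is the same walk with the values discarded.
static int skip_rice_codes(bit_reader* br, uint32_t count, unsigned k)
{
    uint32_t i, zeroCount, lowBits;
    for (i = 0; i < count; ++i) {
        if (!read_rice_parts(br, k, &zeroCount, &lowBits)) {
            return 0;
        }
    }
    return 1;
}
```

For example, the byte 0x30 (00110000) read with k = 2 gives a quotient of 2 and a remainder of 2 and leaves the reader 5 bits in;
the folded value (2 << 2) | 2 = 10 zig-zag decodes to 5.
*/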

static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
    DRFLAC_ASSERT(pSamplesOut != NULL);

    for (i = 0; i < count; ++i) {
        if (unencodedBitsPerSample > 0) {
            if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
                return DRFLAC_FALSE;
            }
        } else {
            pSamplesOut[i] = 0;
        }

        if (bitsPerSample >= 24) {
            pSamplesOut[i] += drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
        } else {
            pSamplesOut[i] += drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
        }
    }

    return DRFLAC_TRUE;
}
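
/*
In an escaped partition each residual is stored verbatim as an <unencodedBitsPerSample>-bit two's complement integer, and a width
of 0 means every residual in the partition is zero, which is why the loop above writes 0 in that case. The hypothetical helper
below shows the sign extension such a read has to perform; sign_extend_bits() is an illustrative name, and this is not how
drflac__read_int32() is implemented.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: interpret the low n bits of raw as a signed n-bit value (0 <= n <= 31).
static int32_t sign_extend_bits(uint32_t raw, unsigned n)
{
    uint32_t signBit;

    if (n == 0) {
        return 0;  // zero-width escaped partition: every residual is 0
    }

    raw &= (1u << n) - 1u;  // keep only the n stored bits
    signBit = 1u << (n - 1);
    return (int32_t)((raw ^ signBit) - signBit);  // flip the sign bit, then subtract it to propagate the sign
}
```

For example, the 5-bit pattern 11111 decodes to -1 and 01111 decodes to 15.
*/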
4794
4795
4796/*
4797Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4798when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4799<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4800*/
4801static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4802{
4803 drflac_uint8 residualMethod;
4804 drflac_uint8 partitionOrder;
4805 drflac_uint32 samplesInPartition;
4806 drflac_uint32 partitionsRemaining;
4807
4808 DRFLAC_ASSERT(bs != NULL);
4809 DRFLAC_ASSERT(blockSize != 0);
4810 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4811
4812 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4813 return DRFLAC_FALSE;
4814 }
4815
4816 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4817 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4818 }
4819
4820 /* Ignore the first <order> values. */
4821 pDecodedSamples += order;
4822
4823 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4824 return DRFLAC_FALSE;
4825 }
4826
4827 /*
4828 From the FLAC spec:
4829 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4830 */
4831 if (partitionOrder > 8) {
4832 return DRFLAC_FALSE;
4833 }
4834
4835 /* Validation check. */
4836 if ((blockSize / (1 << partitionOrder)) < order) {
4837 return DRFLAC_FALSE;
4838 }
4839
4840 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4841 partitionsRemaining = (1 << partitionOrder);
4842 for (;;) {
4843 drflac_uint8 riceParam = 0;
4844 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4845 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4846 return DRFLAC_FALSE;
4847 }
4848 if (riceParam == 15) {
4849 riceParam = 0xFF;
4850 }
4851 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4852 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4853 return DRFLAC_FALSE;
4854 }
4855 if (riceParam == 31) {
4856 riceParam = 0xFF;
4857 }
4858 }
4859
4860 if (riceParam != 0xFF) {
4861 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, order, shift, coefficients, pDecodedSamples)) {
4862 return DRFLAC_FALSE;
4863 }
4864 } else {
4865 drflac_uint8 unencodedBitsPerSample = 0;
4866 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4867 return DRFLAC_FALSE;
4868 }
4869
4870 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, order, shift, coefficients, pDecodedSamples)) {
4871 return DRFLAC_FALSE;
4872 }
4873 }
4874
4875 pDecodedSamples += samplesInPartition;
4876
4877 if (partitionsRemaining == 1) {
4878 break;
4879 }
4880
4881 partitionsRemaining -= 1;
4882
4883 if (partitionOrder != 0) {
4884 samplesInPartition = blockSize / (1 << partitionOrder);
4885 }
4886 }
4887
4888 return DRFLAC_TRUE;
4889}
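/*
Each Rice-coded residual is stored as a unary quotient (a run of 0 bits terminated by a 1 bit) followed by <riceParam>
low-order bits, and the combined value is a zigzag-folded signed integer (0, 1, 2, 3, 4 map to 0, -1, 1, -2, 2). The
sketch below decodes one residual from an MSB-first byte buffer so the format can be seen without the cache machinery of
drflac_bs. The example_bit_reader type and the example_* functions are hypothetical and not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical MSB-first bit reader over a byte buffer (not dr_flac's drflac_bs).
typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t pos;             // Current bit position.
} example_bit_reader;

// Reads one bit. Returns -1 if the buffer is exhausted.
static int example_read_bit(example_bit_reader* br)
{
    int bit;
    if (br->pos >= br->sizeInBits) {
        return -1;
    }
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos += 1;
    return bit;
}

// Decodes one Rice-coded residual with parameter k. Returns 0 on success, -1 on underflow.
static int example_read_rice(example_bit_reader* br, unsigned int k, int32_t* pValue)
{
    uint32_t quotient = 0;
    uint32_t remainder = 0;
    uint32_t folded;
    unsigned int i;
    int bit;

    // Unary part: count 0 bits until the terminating 1 bit.
    while ((bit = example_read_bit(br)) == 0) {
        quotient += 1;
    }
    if (bit < 0) {
        return -1;
    }

    // Binary part: k low-order bits, MSB first.
    for (i = 0; i < k; ++i) {
        bit = example_read_bit(br);
        if (bit < 0) {
            return -1;
        }
        remainder = (remainder << 1) | (uint32_t)bit;
    }

    // Undo the zigzag folding: 0,1,2,3,4 -> 0,-1,1,-2,2.
    folded = (quotient << k) | remainder;
    *pValue = (int32_t)(folded >> 1) ^ -(int32_t)(folded & 1);
    return 0;
}
```

With k = 2, the bits 1 01 | 001 10 | 1 00 decode to -1, 5 and 0.
*/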
4890
4891/*
Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. Nothing is decoded or written; the bit position is
simply advanced to the end of the residual. The <blockSize> and <order> parameters determine how many residual values
need to be skipped.
4895*/
4896static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4897{
4898 drflac_uint8 residualMethod;
4899 drflac_uint8 partitionOrder;
4900 drflac_uint32 samplesInPartition;
4901 drflac_uint32 partitionsRemaining;
4902
4903 DRFLAC_ASSERT(bs != NULL);
4904 DRFLAC_ASSERT(blockSize != 0);
4905
4906 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4907 return DRFLAC_FALSE;
4908 }
4909
4910 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4911 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4912 }
4913
4914 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4915 return DRFLAC_FALSE;
4916 }
4917
4918 /*
4919 From the FLAC spec:
4920 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4921 */
4922 if (partitionOrder > 8) {
4923 return DRFLAC_FALSE;
4924 }
4925
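    /* Validation check. Uses the same bound as drflac__decode_samples_with_residual() so that any residual that can be decoded can also be skipped. */
    if ((blockSize / (1 << partitionOrder)) < order) {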
4928 return DRFLAC_FALSE;
4929 }
4930
4931 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4932 partitionsRemaining = (1 << partitionOrder);
    for (;;) {
4935 drflac_uint8 riceParam = 0;
4936 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4937 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4938 return DRFLAC_FALSE;
4939 }
4940 if (riceParam == 15) {
4941 riceParam = 0xFF;
4942 }
4943 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4944 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4945 return DRFLAC_FALSE;
4946 }
4947 if (riceParam == 31) {
4948 riceParam = 0xFF;
4949 }
4950 }
4951
4952 if (riceParam != 0xFF) {
4953 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
4954 return DRFLAC_FALSE;
4955 }
4956 } else {
4957 drflac_uint8 unencodedBitsPerSample = 0;
4958 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4959 return DRFLAC_FALSE;
4960 }
4961
4962 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
4963 return DRFLAC_FALSE;
4964 }
4965 }
4966
4967
4968 if (partitionsRemaining == 1) {
4969 break;
4970 }
4971
4972 partitionsRemaining -= 1;
4973 samplesInPartition = blockSize / (1 << partitionOrder);
4974 }
4975
4976 return DRFLAC_TRUE;
4977}
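/*
Both residual walkers above use the same partition layout. With partition order p the block is split into 2^p
partitions of <blockSize> >> p residuals each, except that the first partition is shortened by the predictor order
because the warm-up samples are stored verbatim in front of the residual. The sketch below computes the per-partition
counts under that rule, treating a first partition smaller than the order as invalid, and shows the escape convention
(a raw parameter of 15 for the 4-bit RICE method, 31 for the 5-bit RICE2 method) that selects an unencoded partition.
The function names are hypothetical.

```c
#include <stdint.h>

// Returns the number of residuals stored in partition `index` (0-based),
// or -1 if the layout is invalid for this block size and predictor order.
static int64_t example_partition_residual_count(uint32_t blockSize, uint32_t partitionOrder, uint32_t predictorOrder, uint32_t index)
{
    uint32_t samplesPerPartition;

    if (partitionOrder > 8 || index >= (1u << partitionOrder)) {
        return -1;
    }

    samplesPerPartition = blockSize >> partitionOrder;
    if (index == 0) {
        if (samplesPerPartition < predictorOrder) {
            return -1;      // First partition cannot cover the warm-up samples.
        }
        return (int64_t)samplesPerPartition - predictorOrder;
    }
    return samplesPerPartition;
}

// Returns nonzero if a raw Rice parameter is the escape code for an unencoded
// partition. riceParamBits is 4 for RICE and 5 for RICE2.
static int example_is_rice_escape(uint32_t rawParam, uint32_t riceParamBits)
{
    return rawParam == (1u << riceParamBits) - 1;
}
```

Summed over all partitions the counts always come to <blockSize> - <order>.
*/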
4978
4979
4980static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4981{
4982 drflac_uint32 i;
4983
4984 /* Only a single sample needs to be decoded here. */
4985 drflac_int32 sample;
4986 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
4987 return DRFLAC_FALSE;
4988 }
4989
4990 /*
4991 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
4992 we'll want to look at a more efficient way.
4993 */
4994 for (i = 0; i < blockSize; ++i) {
4995 pDecodedSamples[i] = sample;
4996 }
4997
4998 return DRFLAC_TRUE;
4999}
5000
5001static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5002{
5003 drflac_uint32 i;
5004
5005 for (i = 0; i < blockSize; ++i) {
5006 drflac_int32 sample;
5007 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5008 return DRFLAC_FALSE;
5009 }
5010
5011 pDecodedSamples[i] = sample;
5012 }
5013
5014 return DRFLAC_TRUE;
5015}
5016
5017static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5018{
5019 drflac_uint32 i;
5020
5021 static drflac_int32 lpcCoefficientsTable[5][4] = {
5022 {0, 0, 0, 0},
5023 {1, 0, 0, 0},
5024 {2, -1, 0, 0},
5025 {3, -3, 1, 0},
5026 {4, -6, 4, -1}
5027 };
5028
    /* Warm-up samples. The coefficients come from the fixed predictor table above. */
5030 for (i = 0; i < lpcOrder; ++i) {
5031 drflac_int32 sample;
5032 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5033 return DRFLAC_FALSE;
5034 }
5035
5036 pDecodedSamples[i] = sample;
5037 }
5038
5039 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5040 return DRFLAC_FALSE;
5041 }
5042
5043 return DRFLAC_TRUE;
5044}
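/*
The table above holds the fixed polynomial predictors: order 1 predicts x[n-1], order 2 predicts 2x[n-1] - x[n-2],
order 3 predicts 3x[n-1] - 3x[n-2] + x[n-3], and order 4 continues the pattern. Each output sample is its residual plus
the prediction, with coefficient j applied to the sample j+1 positions back. The sketch below restores a block this way
with a plain loop and a 64-bit accumulator; it is an illustration with hypothetical names, not dr_flac's optimized
prediction path.

```c
#include <stddef.h>
#include <stdint.h>

static const int32_t example_fixed_coefficients[5][4] = {
    {0,  0, 0,  0},
    {1,  0, 0,  0},
    {2, -1, 0,  0},
    {3, -3, 1,  0},
    {4, -6, 4, -1}
};

// Restores `count` samples in place. samples[0..order-1] must hold the warm-up
// samples; samples[order..count-1] hold residuals on input, samples on output.
static void example_restore_fixed(int32_t* samples, size_t count, unsigned int order)
{
    size_t n;
    unsigned int j;

    for (n = order; n < count; ++n) {
        int64_t prediction = 0;
        for (j = 0; j < order; ++j) {
            prediction += (int64_t)example_fixed_coefficients[order][j] * samples[n - 1 - j];
        }
        samples[n] += (int32_t)prediction;
    }
}
```

A linear ramp is predicted exactly by order 2 and a quadratic by order 3, so their residuals are all zero.
*/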
5045
5046static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5047{
5048 drflac_uint8 i;
5049 drflac_uint8 lpcPrecision;
5050 drflac_int8 lpcShift;
5051 drflac_int32 coefficients[32];
5052
5053 /* Warm up samples. */
5054 for (i = 0; i < lpcOrder; ++i) {
5055 drflac_int32 sample;
5056 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5057 return DRFLAC_FALSE;
5058 }
5059
5060 pDecodedSamples[i] = sample;
5061 }
5062
5063 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5064 return DRFLAC_FALSE;
5065 }
5066 if (lpcPrecision == 15) {
5067 return DRFLAC_FALSE; /* Invalid. */
5068 }
5069 lpcPrecision += 1;
5070
5071 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5072 return DRFLAC_FALSE;
5073 }
5074
5075 /*
5076 From the FLAC specification:
5077
5078 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
5079
    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders supporting negative shifts. For now dr_flac
    does not support negative shifts because I don't have any reference files. If a reference file comes through I will consider adding support.
5082 */
5083 if (lpcShift < 0) {
5084 return DRFLAC_FALSE;
5085 }
5086
5087 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5088 for (i = 0; i < lpcOrder; ++i) {
5089 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5090 return DRFLAC_FALSE;
5091 }
5092 }
5093
5094 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, coefficients, pDecodedSamples)) {
5095 return DRFLAC_FALSE;
5096 }
5097
5098 return DRFLAC_TRUE;
5099}
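/*
For LPC subframes the prediction is the dot product of the quantized coefficients with the preceding samples, shifted
right by lpcShift; the shift field is a 5-bit two's-complement value, so it must be sign-extended before it can be
tested for negativity. The sketch below shows both steps with hypothetical names. Like the decoder above, it rejects
negative shifts instead of interpreting them, and it relies on >> of a negative int64_t being an arithmetic shift, which
is implementation-defined in C but holds on mainstream compilers.

```c
#include <stddef.h>
#include <stdint.h>

// Sign-extends the low `bits` bits of `raw` as a two's-complement value (bits < 32).
static int32_t example_sign_extend(uint32_t raw, unsigned int bits)
{
    uint32_t signBit = 1u << (bits - 1);
    raw &= (1u << bits) - 1;
    return (int32_t)(raw ^ signBit) - (int32_t)signBit;
}

// Restores samples[order..count-1] in place (residuals in, samples out).
// coefficients[j] multiplies the sample j+1 positions back.
// Returns -1 for a negative shift, mirroring the decoder above.
static int example_restore_lpc(int32_t* samples, size_t count, unsigned int order, const int32_t* coefficients, int shift)
{
    size_t n;
    unsigned int j;

    if (shift < 0) {
        return -1;
    }
    for (n = order; n < count; ++n) {
        int64_t sum = 0;
        for (j = 0; j < order; ++j) {
            sum += (int64_t)coefficients[j] * samples[n - 1 - j];
        }
        samples[n] += (int32_t)(sum >> shift);
    }
    return 0;
}
```

With coefficients {8, -4} and a shift of 2, the predictor is equivalent to the order-2 fixed predictor 2x[n-1] - x[n-2].
*/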
5100
5101
5102static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5103{
5104 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5105 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5106
5107 DRFLAC_ASSERT(bs != NULL);
5108 DRFLAC_ASSERT(header != NULL);
5109
5110 /* Keep looping until we find a valid sync code. */
5111 for (;;) {
5112 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5113 drflac_uint8 reserved = 0;
5114 drflac_uint8 blockingStrategy = 0;
5115 drflac_uint8 blockSize = 0;
5116 drflac_uint8 sampleRate = 0;
5117 drflac_uint8 channelAssignment = 0;
5118 drflac_uint8 bitsPerSample = 0;
5119 drflac_bool32 isVariableBlockSize;
5120
5121 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5122 return DRFLAC_FALSE;
5123 }
5124
5125 if (!drflac__read_uint8(bs, 1, &reserved)) {
5126 return DRFLAC_FALSE;
5127 }
5128 if (reserved == 1) {
5129 continue;
5130 }
5131 crc8 = drflac_crc8(crc8, reserved, 1);
5132
5133 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5134 return DRFLAC_FALSE;
5135 }
5136 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5137
5138 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5139 return DRFLAC_FALSE;
5140 }
5141 if (blockSize == 0) {
5142 continue;
5143 }
5144 crc8 = drflac_crc8(crc8, blockSize, 4);
5145
5146 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5147 return DRFLAC_FALSE;
5148 }
5149 crc8 = drflac_crc8(crc8, sampleRate, 4);
5150
5151 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5152 return DRFLAC_FALSE;
5153 }
5154 if (channelAssignment > 10) {
5155 continue;
5156 }
5157 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5158
5159 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5160 return DRFLAC_FALSE;
5161 }
5162 if (bitsPerSample == 3 || bitsPerSample == 7) {
5163 continue;
5164 }
5165 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5166
5167
5168 if (!drflac__read_uint8(bs, 1, &reserved)) {
5169 return DRFLAC_FALSE;
5170 }
5171 if (reserved == 1) {
5172 continue;
5173 }
5174 crc8 = drflac_crc8(crc8, reserved, 1);
5175
5176
5177 isVariableBlockSize = blockingStrategy == 1;
5178 if (isVariableBlockSize) {
5179 drflac_uint64 pcmFrameNumber;
5180 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5181 if (result != DRFLAC_SUCCESS) {
5182 if (result == DRFLAC_AT_END) {
5183 return DRFLAC_FALSE;
5184 } else {
5185 continue;
5186 }
5187 }
5188 header->flacFrameNumber = 0;
5189 header->pcmFrameNumber = pcmFrameNumber;
5190 } else {
5191 drflac_uint64 flacFrameNumber = 0;
5192 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5193 if (result != DRFLAC_SUCCESS) {
5194 if (result == DRFLAC_AT_END) {
5195 return DRFLAC_FALSE;
5196 } else {
5197 continue;
5198 }
5199 }
5200 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5201 header->pcmFrameNumber = 0;
5202 }
5203
5204
5205 DRFLAC_ASSERT(blockSize > 0);
5206 if (blockSize == 1) {
5207 header->blockSizeInPCMFrames = 192;
5208 } else if (blockSize <= 5) {
5209 DRFLAC_ASSERT(blockSize >= 2);
5210 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5211 } else if (blockSize == 6) {
5212 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5213 return DRFLAC_FALSE;
5214 }
5215 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5216 header->blockSizeInPCMFrames += 1;
5217 } else if (blockSize == 7) {
5218 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5219 return DRFLAC_FALSE;
5220 }
5221 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
5222 header->blockSizeInPCMFrames += 1;
5223 } else {
5224 DRFLAC_ASSERT(blockSize >= 8);
5225 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5226 }
5227
5228
5229 if (sampleRate <= 11) {
5230 header->sampleRate = sampleRateTable[sampleRate];
5231 } else if (sampleRate == 12) {
5232 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5233 return DRFLAC_FALSE;
5234 }
5235 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5236 header->sampleRate *= 1000;
5237 } else if (sampleRate == 13) {
5238 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5239 return DRFLAC_FALSE;
5240 }
5241 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5242 } else if (sampleRate == 14) {
5243 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5244 return DRFLAC_FALSE;
5245 }
5246 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5247 header->sampleRate *= 10;
5248 } else {
5249 continue; /* Invalid. Assume an invalid block. */
5250 }
5251
5252
5253 header->channelAssignment = channelAssignment;
5254
5255 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5256 if (header->bitsPerSample == 0) {
5257 header->bitsPerSample = streaminfoBitsPerSample;
5258 }
5259
5260 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5261 return DRFLAC_FALSE;
5262 }
5263
5264#ifndef DR_FLAC_NO_CRC
5265 if (header->crc8 != crc8) {
5266 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5267 }
5268#endif
5269 return DRFLAC_TRUE;
5270 }
5271}
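/*
Two parts of the header parser can be checked in isolation. The header CRC-8 uses the polynomial x^8 + x^2 + x + 1
(0x07) with a zero initial value, processed MSB first; running it over the 14-bit sync code 0x3FFE gives the 0xCE seed
used above. The 4-bit block size code maps either to a fixed size or to an 8-bit (code 6) or 16-bit (code 7) field
stored minus one. The bit-at-a-time CRC and the function names below are illustrative assumptions, not dr_flac's
drflac_crc8().

```c
#include <stdint.h>

// Bit-at-a-time CRC-8 (poly 0x07) over the low `count` bits of `data`, MSB first.
static uint8_t example_crc8_bits(uint8_t crc, uint32_t data, unsigned int count)
{
    while (count > 0) {
        unsigned int bit = (data >> (count - 1)) & 1;
        unsigned int top = ((crc >> 7) & 1) ^ bit;
        crc = (uint8_t)(crc << 1);
        if (top) {
            crc ^= 0x07;
        }
        count -= 1;
    }
    return crc;
}

// Maps the 4-bit block size code to a size in PCM frames. `extra` is the value of
// the optional 8-bit (code 6) or 16-bit (code 7) field. Returns 0 for reserved code 0.
static uint32_t example_block_size_from_code(uint8_t code, uint16_t extra)
{
    if (code == 1) {
        return 192;
    }
    if (code >= 2 && code <= 5) {
        return 576u << (code - 2);
    }
    if (code == 6 || code == 7) {
        return (uint32_t)extra + 1;     // Stored as size - 1.
    }
    if (code >= 8 && code <= 15) {
        return 256u << (code - 8);
    }
    return 0;
}
```
*/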
5272
5273static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5274{
5275 drflac_uint8 header;
5276 int type;
5277
5278 if (!drflac__read_uint8(bs, 8, &header)) {
5279 return DRFLAC_FALSE;
5280 }
5281
5282 /* First bit should always be 0. */
5283 if ((header & 0x80) != 0) {
5284 return DRFLAC_FALSE;
5285 }
5286
5287 type = (header & 0x7E) >> 1;
5288 if (type == 0) {
5289 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5290 } else if (type == 1) {
5291 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5292 } else {
5293 if ((type & 0x20) != 0) {
5294 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5295 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
        } else if ((type & 0x18) == 0x08) { /* 001xxx = FIXED. 01xxxx is reserved. */
5297 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5298 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5299 if (pSubframe->lpcOrder > 4) {
5300 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5301 pSubframe->lpcOrder = 0;
5302 }
5303 } else {
5304 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5305 }
5306 }
5307
5308 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5309 return DRFLAC_FALSE;
5310 }
5311
5312 /* Wasted bits per sample. */
5313 pSubframe->wastedBitsPerSample = 0;
5314 if ((header & 0x01) == 1) {
5315 unsigned int wastedBitsPerSample;
5316 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5317 return DRFLAC_FALSE;
5318 }
5319 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5320 }
5321
5322 return DRFLAC_TRUE;
5323}
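/*
The subframe header is one byte: a zero padding bit, a 6-bit type and a wasted-bits flag. Per the FLAC format, type
000000 is CONSTANT, 000001 is VERBATIM, 001xxx is FIXED with order xxx (orders above 4 are reserved), 1xxxxx is LPC with
order xxxxx + 1, and all other values are reserved. The sketch below classifies a header byte under those rules; the
enum and function names are hypothetical. When the flag is set, the number of wasted bits follows as a unary value,
which is why the code above adds 1 to the count of zero bits.

```c
#include <stdint.h>

typedef enum {
    EXAMPLE_SUBFRAME_CONSTANT,
    EXAMPLE_SUBFRAME_VERBATIM,
    EXAMPLE_SUBFRAME_FIXED,
    EXAMPLE_SUBFRAME_LPC,
    EXAMPLE_SUBFRAME_RESERVED
} example_subframe_type;

// Classifies a subframe header byte, writing the predictor order and the wasted-bits flag.
static example_subframe_type example_classify_subframe(uint8_t header, unsigned int* pOrder, int* pHasWastedBits)
{
    unsigned int type = (header >> 1) & 0x3F;

    *pOrder = 0;
    *pHasWastedBits = header & 0x01;

    if ((header & 0x80) != 0) {
        return EXAMPLE_SUBFRAME_RESERVED;   // Padding bit must be zero.
    }
    if (type == 0) {
        return EXAMPLE_SUBFRAME_CONSTANT;
    }
    if (type == 1) {
        return EXAMPLE_SUBFRAME_VERBATIM;
    }
    if ((type & 0x20) != 0) {               // 1xxxxx
        *pOrder = (type & 0x1F) + 1;
        return EXAMPLE_SUBFRAME_LPC;
    }
    if ((type & 0x38) == 0x08 && (type & 0x07) <= 4) {   // 001xxx, order 0..4
        *pOrder = type & 0x07;
        return EXAMPLE_SUBFRAME_FIXED;
    }
    return EXAMPLE_SUBFRAME_RESERVED;
}
```
*/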
5324
5325static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5326{
5327 drflac_subframe* pSubframe;
5328 drflac_uint32 subframeBitsPerSample;
5329
5330 DRFLAC_ASSERT(bs != NULL);
5331 DRFLAC_ASSERT(frame != NULL);
5332
5333 pSubframe = frame->subframes + subframeIndex;
5334 if (!drflac__read_subframe_header(bs, pSubframe)) {
5335 return DRFLAC_FALSE;
5336 }
5337
5338 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5339 subframeBitsPerSample = frame->header.bitsPerSample;
5340 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5341 subframeBitsPerSample += 1;
5342 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5343 subframeBitsPerSample += 1;
5344 }
5345
5346 /* Need to handle wasted bits per sample. */
5347 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5348 return DRFLAC_FALSE;
5349 }
5350 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5351
5352 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5353
5354 switch (pSubframe->subframeType)
5355 {
5356 case DRFLAC_SUBFRAME_CONSTANT:
5357 {
5358 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5359 } break;
5360
5361 case DRFLAC_SUBFRAME_VERBATIM:
5362 {
5363 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5364 } break;
5365
5366 case DRFLAC_SUBFRAME_FIXED:
5367 {
5368 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5369 } break;
5370
5371 case DRFLAC_SUBFRAME_LPC:
5372 {
5373 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5374 } break;
5375
5376 default: return DRFLAC_FALSE;
5377 }
5378
5379 return DRFLAC_TRUE;
5380}
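/*
The extra bit for side channels comes from stereo decorrelation: the side channel stores the difference between the two
original channels, and the difference of two n-bit signed values can need n + 1 bits (for 16-bit audio,
32767 - (-32768) = 65535). The sketch below computes the effective bit depth the way drflac__decode_subframe() does,
adding one bit for the side subframe and then removing the wasted bits. The channel assignment values 8, 9 and 10 follow
the FLAC frame header encoding; the macro and function names are hypothetical.

```c
#include <stdint.h>

// Channel assignment codes from the FLAC frame header (0-7 are independent channels).
#define EXAMPLE_LEFT_SIDE   8
#define EXAMPLE_RIGHT_SIDE  9
#define EXAMPLE_MID_SIDE    10

// Returns the bit depth used to read a subframe, or 0 if the wasted-bits count is invalid.
static uint32_t example_subframe_bits(uint32_t frameBits, uint8_t channelAssignment, int subframeIndex, uint32_t wastedBits)
{
    uint32_t bits = frameBits;

    // The side channel is subframe 1 for left/side and mid/side, subframe 0 for right/side.
    if ((channelAssignment == EXAMPLE_LEFT_SIDE || channelAssignment == EXAMPLE_MID_SIDE) && subframeIndex == 1) {
        bits += 1;
    } else if (channelAssignment == EXAMPLE_RIGHT_SIDE && subframeIndex == 0) {
        bits += 1;
    }

    if (wastedBits >= bits) {
        return 0;
    }
    return bits - wastedBits;
}
```
*/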
5381
5382static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5383{
5384 drflac_subframe* pSubframe;
5385 drflac_uint32 subframeBitsPerSample;
5386
5387 DRFLAC_ASSERT(bs != NULL);
5388 DRFLAC_ASSERT(frame != NULL);
5389
5390 pSubframe = frame->subframes + subframeIndex;
5391 if (!drflac__read_subframe_header(bs, pSubframe)) {
5392 return DRFLAC_FALSE;
5393 }
5394
5395 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5396 subframeBitsPerSample = frame->header.bitsPerSample;
5397 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5398 subframeBitsPerSample += 1;
5399 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5400 subframeBitsPerSample += 1;
5401 }
5402
5403 /* Need to handle wasted bits per sample. */
5404 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5405 return DRFLAC_FALSE;
5406 }
5407 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5408
5409 pSubframe->pSamplesS32 = NULL;
5410
5411 switch (pSubframe->subframeType)
5412 {
5413 case DRFLAC_SUBFRAME_CONSTANT:
5414 {
5415 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5416 return DRFLAC_FALSE;
5417 }
5418 } break;
5419
5420 case DRFLAC_SUBFRAME_VERBATIM:
5421 {
5422 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5423 if (!drflac__seek_bits(bs, bitsToSeek)) {
5424 return DRFLAC_FALSE;
5425 }
5426 } break;
5427
5428 case DRFLAC_SUBFRAME_FIXED:
5429 {
5430 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5431 if (!drflac__seek_bits(bs, bitsToSeek)) {
5432 return DRFLAC_FALSE;
5433 }
5434
5435 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5436 return DRFLAC_FALSE;
5437 }
5438 } break;
5439
5440 case DRFLAC_SUBFRAME_LPC:
5441 {
5442 drflac_uint8 lpcPrecision;
5443
5444 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5445 if (!drflac__seek_bits(bs, bitsToSeek)) {
5446 return DRFLAC_FALSE;
5447 }
5448
5449 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5450 return DRFLAC_FALSE;
5451 }
5452 if (lpcPrecision == 15) {
5453 return DRFLAC_FALSE; /* Invalid. */
5454 }
5455 lpcPrecision += 1;
5456
5457
5458 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5459 if (!drflac__seek_bits(bs, bitsToSeek)) {
5460 return DRFLAC_FALSE;
5461 }
5462
5463 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5464 return DRFLAC_FALSE;
5465 }
5466 } break;
5467
5468 default: return DRFLAC_FALSE;
5469 }
5470
5471 return DRFLAC_TRUE;
5472}
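/*
drflac__seek_subframe() only advances the bit position. Everything in front of the residual has a size that follows
directly from the header: <order> warm-up samples of subframeBitsPerSample bits each, plus, for LPC, a 4-bit precision
field, a 5-bit shift field and <order> coefficients of lpcPrecision bits each. The residual itself still has to be
walked partition by partition because Rice codes are variable length. The sketch below totals the fixed-size parts; the
function names are hypothetical.

```c
#include <stdint.h>

// Bits occupied by a FIXED subframe before its residual.
static uint32_t example_fixed_header_bits(uint32_t order, uint32_t bitsPerSample)
{
    return order * bitsPerSample;       // Warm-up samples only.
}

// Bits occupied by an LPC subframe before its residual.
// lpcPrecision is the decoded precision (raw 4-bit field + 1).
static uint32_t example_lpc_header_bits(uint32_t order, uint32_t bitsPerSample, uint32_t lpcPrecision)
{
    return order * bitsPerSample        // Warm-up samples.
         + 4                            // Coefficient precision field.
         + 5                            // Shift field.
         + order * lpcPrecision;        // Quantized coefficients.
}

// Bits occupied by a whole VERBATIM subframe after its subframe header.
static uint32_t example_verbatim_bits(uint32_t blockSize, uint32_t bitsPerSample)
{
    return blockSize * bitsPerSample;
}
```
*/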
5473
5474
5475static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5476{
5477 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5478
5479 DRFLAC_ASSERT(channelAssignment <= 10);
5480 return lookup[channelAssignment];
5481}
5482
5483static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5484{
5485 int channelCount;
5486 int i;
5487 drflac_uint8 paddingSizeInBits;
5488 drflac_uint16 desiredCRC16;
5489#ifndef DR_FLAC_NO_CRC
5490 drflac_uint16 actualCRC16;
5491#endif
5492
5493 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5494 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5495
5496 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5497 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5498 return DRFLAC_ERROR;
5499 }
5500
5501 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5502 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5503 if (channelCount != (int)pFlac->channels) {
5504 return DRFLAC_ERROR;
5505 }
5506
5507 for (i = 0; i < channelCount; ++i) {
5508 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5509 return DRFLAC_ERROR;
5510 }
5511 }
5512
5513 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5514 if (paddingSizeInBits > 0) {
5515 drflac_uint8 padding = 0;
5516 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5517 return DRFLAC_AT_END;
5518 }
5519 }
5520
5521#ifndef DR_FLAC_NO_CRC
5522 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5523#endif
5524 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5525 return DRFLAC_AT_END;
5526 }
5527
5528#ifndef DR_FLAC_NO_CRC
5529 if (actualCRC16 != desiredCRC16) {
5530 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5531 }
5532#endif
5533
5534 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5535
5536 return DRFLAC_SUCCESS;
5537}
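/*
The frame footer is a CRC-16 over every byte of the frame, header included, using the polynomial x^16 + x^15 + x^2 + 1
(0x8005) with a zero initial value, processed MSB first. Before reading it, the decoder consumes the zero padding that
brings the bit position to a byte boundary. dr_flac obtains the running CRC from the bitstream through
drflac__flush_crc16(); the standalone bytewise version and the padding helper below are sketches with hypothetical
names.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-16 with polynomial 0x8005, initial value 0, no reflection, no final XOR.
static uint16_t example_crc16(const uint8_t* data, size_t size)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < size; ++i) {
        crc ^= (uint16_t)(data[i] << 8);
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x8005);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

// Number of padding bits needed to reach the next byte boundary.
static unsigned int example_padding_bits(uint64_t bitPosition)
{
    return (unsigned int)((8 - (bitPosition & 7)) & 7);
}
```
*/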
5538
5539static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5540{
5541 int channelCount;
5542 int i;
5543 drflac_uint16 desiredCRC16;
5544#ifndef DR_FLAC_NO_CRC
5545 drflac_uint16 actualCRC16;
5546#endif
5547
5548 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5549 for (i = 0; i < channelCount; ++i) {
5550 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5551 return DRFLAC_ERROR;
5552 }
5553 }
5554
5555 /* Padding. */
5556 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5557 return DRFLAC_ERROR;
5558 }
5559
5560 /* CRC. */
5561#ifndef DR_FLAC_NO_CRC
5562 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5563#endif
5564 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5565 return DRFLAC_AT_END;
5566 }
5567
5568#ifndef DR_FLAC_NO_CRC
5569 if (actualCRC16 != desiredCRC16) {
5570 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5571 }
5572#endif
5573
5574 return DRFLAC_SUCCESS;
5575}
5576
5577static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5578{
5579 DRFLAC_ASSERT(pFlac != NULL);
5580
5581 for (;;) {
5582 drflac_result result;
5583
5584 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5585 return DRFLAC_FALSE;
5586 }
5587
5588 result = drflac__decode_flac_frame(pFlac);
5589 if (result != DRFLAC_SUCCESS) {
5590 if (result == DRFLAC_CRC_MISMATCH) {
5591 continue; /* CRC mismatch. Skip to the next frame. */
5592 } else {
5593 return DRFLAC_FALSE;
5594 }
5595 }
5596
5597 return DRFLAC_TRUE;
5598 }
5599}
5600
5601static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5602{
5603 drflac_uint64 firstPCMFrame;
5604 drflac_uint64 lastPCMFrame;
5605
5606 DRFLAC_ASSERT(pFlac != NULL);
5607
5608 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5609 if (firstPCMFrame == 0) {
5610 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5611 }
5612
5613 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5614 if (lastPCMFrame > 0) {
5615 lastPCMFrame -= 1; /* Needs to be zero based. */
5616 }
5617
5618 if (pFirstPCMFrame) {
5619 *pFirstPCMFrame = firstPCMFrame;
5620 }
5621 if (pLastPCMFrame) {
5622 *pLastPCMFrame = lastPCMFrame;
5623 }
5624}
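/*
In a fixed-block-size stream the frame header stores a FLAC frame number rather than a PCM frame number, so the first PCM
frame is recovered by multiplying it by the block size (the function above uses maxBlockSizeInPCMFrames);
variable-block-size streams store the PCM frame number directly. The last PCM frame is inclusive. The struct below is a
hypothetical stand-in for the relevant drflac_frame_header fields.

```c
#include <stdint.h>

// Hypothetical stand-in for the relevant frame header fields.
typedef struct {
    uint64_t pcmFrameNumber;    // Set for variable block size streams, else 0.
    uint32_t flacFrameNumber;   // Set for fixed block size streams, else 0.
    uint32_t blockSize;         // PCM frames in this FLAC frame (never 0).
} example_frame_header;

static void example_pcm_range(const example_frame_header* h, uint32_t maxBlockSize, uint64_t* pFirst, uint64_t* pLast)
{
    uint64_t first = h->pcmFrameNumber;
    if (first == 0) {
        first = (uint64_t)h->flacFrameNumber * maxBlockSize;
    }
    *pFirst = first;
    *pLast  = first + h->blockSize - 1;    // Inclusive.
}
```
*/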
5625
5626static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5627{
5628 drflac_bool32 result;
5629
5630 DRFLAC_ASSERT(pFlac != NULL);
5631
5632 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5633
5634 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5635 pFlac->currentPCMFrame = 0;
5636
5637 return result;
5638}
5639
5640static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5641{
5642 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5643 DRFLAC_ASSERT(pFlac != NULL);
5644 return drflac__seek_flac_frame(pFlac);
5645}
5646
5647
5648static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5649{
5650 drflac_uint64 pcmFramesRead = 0;
5651 while (pcmFramesToSeek > 0) {
5652 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5653 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5654 break; /* Couldn't read the next frame, so just break from the loop and return. */
5655 }
5656 } else {
5657 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5658 pcmFramesRead += pcmFramesToSeek;
                pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek;   /* <-- Safe cast. pcmFramesToSeek < currentFLACFrame.pcmFramesRemaining <= 65536. */
5660 pcmFramesToSeek = 0;
5661 } else {
5662 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5663 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5664 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5665 }
5666 }
5667 }
5668
5669 pFlac->currentPCMFrame += pcmFramesRead;
5670 return pcmFramesRead;
5671}
5672
5673
5674static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5675{
5676 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5677 drflac_uint64 runningPCMFrameCount;
5678
5679 DRFLAC_ASSERT(pFlac != NULL);
5680
5681 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5682 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5683 /* Seeking forward. Need to seek from the current position. */
5684 runningPCMFrameCount = pFlac->currentPCMFrame;
5685
5686 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5687 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5688 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5689 return DRFLAC_FALSE;
5690 }
5691 } else {
5692 isMidFrame = DRFLAC_TRUE;
5693 }
5694 } else {
5695 /* Seeking backwards. Need to seek from the start of the file. */
5696 runningPCMFrameCount = 0;
5697
5698 /* Move back to the start. */
5699 if (!drflac__seek_to_first_frame(pFlac)) {
5700 return DRFLAC_FALSE;
5701 }
5702
5703 /* Decode the first frame in preparation for sample-exact seeking below. */
5704 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5705 return DRFLAC_FALSE;
5706 }
5707 }
5708
5709 /*
    We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect
    its header. If the header tells us that the frame contains the sample, we do a full decode of that frame.
5712 */
5713 for (;;) {
5714 drflac_uint64 pcmFrameCountInThisFLACFrame;
5715 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5716 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5717
5718 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5719
5720 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5721 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5722 /*
5723 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5724 it never existed and keep iterating.
5725 */
5726 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5727
5728 if (!isMidFrame) {
5729 drflac_result result = drflac__decode_flac_frame(pFlac);
5730 if (result == DRFLAC_SUCCESS) {
5731 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5732 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5733 } else {
5734 if (result == DRFLAC_CRC_MISMATCH) {
5735 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5736 } else {
5737 return DRFLAC_FALSE;
5738 }
5739 }
5740 } else {
5741 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5742 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5743 }
5744 } else {
5745 /*
5746 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5747 frame never existed and leave the running sample count untouched.
5748 */
5749 if (!isMidFrame) {
5750 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5751 if (result == DRFLAC_SUCCESS) {
5752 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5753 } else {
5754 if (result == DRFLAC_CRC_MISMATCH) {
5755 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5756 } else {
5757 return DRFLAC_FALSE;
5758 }
5759 }
5760 } else {
5761 /*
5762 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5763 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5764 */
5765 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5766 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5767 isMidFrame = DRFLAC_FALSE;
5768 }
5769
5770 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5771 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5772 return DRFLAC_TRUE;
5773 }
5774 }
5775
5776 next_iteration:
5777 /* Grab the next frame in preparation for the next iteration. */
5778 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5779 return DRFLAC_FALSE;
5780 }
5781 }
5782}
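/*
Stripped of bitstream handling and CRC recovery, the brute-force seek is a linear scan: keep a running PCM frame count,
skip whole frames until the target falls inside one, then discard target - runningCount PCM frames from that frame. The
sketch below performs the scan over a known array of frame sizes, whereas the real code learns each size by reading the
next frame header. The names are hypothetical, and unlike the code above it reports a target equal to the total frame
count as not found instead of treating it as a seek to the end.

```c
#include <stddef.h>
#include <stdint.h>

// Finds the frame containing PCM frame `target`. Returns the frame index and writes
// the number of PCM frames to discard within it, or returns -1 if the target is past the end.
static long example_find_frame(const uint32_t* frameSizes, size_t frameCount, uint64_t target, uint64_t* pOffsetInFrame)
{
    uint64_t running = 0;
    size_t i;

    for (i = 0; i < frameCount; ++i) {
        if (target < running + frameSizes[i]) {
            *pOffsetInFrame = target - running;
            return (long)i;
        }
        running += frameSizes[i];   // Skip the whole frame without decoding it.
    }
    return -1;
}
```
*/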
5783
5784
5785#if !defined(DR_FLAC_NO_CRC)
5786/*
We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
uncompressed counterparts, so we use this as a basis. I'm going to split the difference and use a factor of 0.6 to determine the starting
location.
5790*/
5791#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
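/*
A rough illustration of how such a ratio can turn a PCM frame target into a byte guess: the raw size is
frames * channels * bitsPerSample / 8, scaled by about 0.6 and offset from the first FLAC frame. This formula is an
assumption for illustration, not necessarily the one dr_flac applies; it uses the integer ratio 3/5 to avoid
floating-point rounding. The second helper mirrors the halving step in the search loop below, which moves the target to
the midpoint of [rangeLo, rangeHi] and lowers rangeHi whenever a seek or frame decode fails. Both names are hypothetical.

```c
#include <stdint.h>

// Estimates the byte offset of PCM frame `pcmFrame`, assuming the compressed
// stream is about 60% (3/5) of the raw PCM size.
static uint64_t example_estimate_byte(uint64_t firstFrameByte, uint64_t pcmFrame, uint32_t channels, uint32_t bitsPerSample)
{
    uint64_t rawBytes = pcmFrame * channels * bitsPerSample / 8;
    return firstFrameByte + rawBytes * 3 / 5;
}

// One fallback step: move the target to the midpoint and shrink the upper bound.
static uint64_t example_halve(uint64_t rangeLo, uint64_t* pRangeHi)
{
    uint64_t target = rangeLo + (*pRangeHi - rangeLo) / 2;
    *pRangeHi = target;
    return target;
}
```
*/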
5792
5793static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5794{
5795 DRFLAC_ASSERT(pFlac != NULL);
5796 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5797 DRFLAC_ASSERT(targetByte >= rangeLo);
5798 DRFLAC_ASSERT(targetByte <= rangeHi);
5799
5800 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5801
5802 for (;;) {
5803 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5804 drflac_uint64 lastTargetByte = targetByte;
5805
5806 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we halve the search range on each failed attempt. */
5807 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5808 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5809 if (targetByte == 0) {
5810 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5811 return DRFLAC_FALSE;
5812 }
5813
5814 /* Halve the byte location and continue. */
5815 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5816 rangeHi = targetByte;
5817 } else {
5818 /* Getting here should mean that we have seeked to an appropriate byte. */
5819
5820 /* Clear the details of the FLAC frame so we don't misreport data. */
5821 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5822
5823 /*
5824 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5825 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5826 so it needs to stay this way for now.
5827 */
5828#if 1
5829 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5830 /* Halve the byte location and continue. */
5831 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5832 rangeHi = targetByte;
5833 } else {
5834 break;
5835 }
5836#else
5837 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5838 /* Halve the byte location and continue. */
5839 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5840 rangeHi = targetByte;
5841 } else {
5842 break;
5843 }
5844#endif
5845 }
5846
5847 /* We already tried this byte and there are no more to try, break out. */
5848 if(targetByte == lastTargetByte) {
5849 return DRFLAC_FALSE;
5850 }
5851 }
5852
5853 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5854 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5855
5856 DRFLAC_ASSERT(targetByte <= rangeHi);
5857
5858 *pLastSuccessfulSeekOffset = targetByte;
5859 return DRFLAC_TRUE;
5860}
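/*
Illustration only: a simplified model of the retry loop in drflac__seek_to_approximate_flac_frame_to_byte(). Here a seek "succeeds"
whenever the target lies inside a stream of streamSize bytes; streamSize and find_seekable_byte() are hypothetical stand-ins for the real
seek-and-decode test. The real function also special-cases a failure at byte 0 by returning to the first frame, which this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

// On failure the target is pulled halfway back toward rangeLo and rangeHi
// shrinks to match. If the target stops moving there is nothing left to try.
static bool find_seekable_byte(uint64_t targetByte, uint64_t rangeLo, uint64_t rangeHi,
                               uint64_t streamSize, uint64_t* pResult)
{
    for (;;) {
        uint64_t lastTargetByte = targetByte;

        if (targetByte < streamSize) {
            *pResult = targetByte;  // Seek (and decode) succeeded.
            return true;
        }

        targetByte = rangeLo + (rangeHi - rangeLo) / 2;
        rangeHi = targetByte;

        if (targetByte == lastTargetByte) {
            return false;           // Same byte as last time: give up.
        }
    }
}
```

Starting at byte 1000 in the range [0, 1000] of a 300-byte stream, the target moves 1000 -> 500 -> 250 and succeeds at 250. Because
rangeHi only ever shrinks toward rangeLo, the loop is guaranteed to terminate.
*/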
5861
5862static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5863{
5864 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5865#if 0
5866 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5867 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5868 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5869 return DRFLAC_FALSE;
5870 }
5871 }
5872#endif
5873
5874 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5875}
5876
5877
5878static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5879{
5880 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5881
5882 drflac_uint64 targetByte;
5883 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5884 drflac_uint64 pcmRangeHi = 0;
5885 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
5886 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5887 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5888
5889 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5890 if (targetByte > byteRangeHi) {
5891 targetByte = byteRangeHi;
5892 }
5893
5894 for (;;) {
5895 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5896 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
5897 drflac_uint64 newPCMRangeLo;
5898 drflac_uint64 newPCMRangeHi;
5899 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
5900
5901 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
5902 if (pcmRangeLo == newPCMRangeLo) {
5903 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
5904 break; /* Failed to seek to closest frame. */
5905 }
5906
5907 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5908 return DRFLAC_TRUE;
5909 } else {
5910 break; /* Failed to seek forward. */
5911 }
5912 }
5913
5914 pcmRangeLo = newPCMRangeLo;
5915 pcmRangeHi = newPCMRangeHi;
5916
5917 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
5918 /* The target PCM frame is in this FLAC frame. */
5919 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
5920 return DRFLAC_TRUE;
5921 } else {
5922 break; /* Failed to seek to FLAC frame. */
5923 }
5924 } else {
5925 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5926
5927 if (pcmRangeLo > pcmFrameIndex) {
5928 /* We seeked too far forward. We need to move our target byte backward and try again. */
5929 byteRangeHi = lastSuccessfulSeekOffset;
5930 if (byteRangeLo > byteRangeHi) {
5931 byteRangeLo = byteRangeHi;
5932 }
5933
5934 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
5935 if (targetByte < byteRangeLo) {
5936 targetByte = byteRangeLo;
5937 }
5938 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
5939 /* We didn't seek far enough. We need to move our target byte forward and try again. */
5940
5941 /* If we're close enough we can just seek forward. */
5942 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
5943 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5944 return DRFLAC_TRUE;
5945 } else {
5946 break; /* Failed to seek to FLAC frame. */
5947 }
5948 } else {
5949 byteRangeLo = lastSuccessfulSeekOffset;
5950 if (byteRangeHi < byteRangeLo) {
5951 byteRangeHi = byteRangeLo;
5952 }
5953
5954 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
5955 if (targetByte > byteRangeHi) {
5956 targetByte = byteRangeHi;
5957 }
5958
5959 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
5960 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
5961 }
5962 }
5963 }
5964 }
5965 } else {
5966 /* Getting here is really bad. We just recover as best we can by moving to the first frame in the stream, and then abort. */
5967 break;
5968 }
5969 }
5970
5971 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
5972 return DRFLAC_FALSE;
5973}
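/*
Illustration only: a sketch of how the "we didn't seek far enough" branch above refines its guess. The compression ratio actually
observed so far (compressed bytes consumed divided by the raw size of the PCM frames they decoded to) is used to extrapolate the remaining
distance. refine_target_byte() is hypothetical; unlike the inline code, it falls back to the fixed 0.6 guess when pcmRangeLo is 0 so the
ratio never divides by zero. It assumes pcmFrameIndex >= pcmRangeLo, which holds in that branch.

```c
#include <stdint.h>

static uint64_t refine_target_byte(uint64_t firstFramePos, uint64_t lastSeekOffset,
                                   uint64_t pcmRangeLo, uint64_t pcmFrameIndex,
                                   uint32_t channels, uint32_t bitsPerSample,
                                   uint64_t byteRangeHi)
{
    // Raw (uncompressed) size of everything before the frame we landed on.
    double rawBytesSoFar = (double)pcmRangeLo * channels * bitsPerSample / 8.0;
    double ratio;
    uint64_t target;

    // No PCM frames behind us means nothing to measure: use the default guess.
    if (rawBytesSoFar > 0.0) {
        ratio = (double)(lastSeekOffset - firstFramePos) / rawBytesSoFar;
    } else {
        ratio = 0.6;
    }

    // Scale the raw size of the remaining distance by the observed ratio.
    target = lastSeekOffset + (uint64_t)((double)(pcmFrameIndex - pcmRangeLo) * channels * bitsPerSample / 8.0 * ratio);
    return (target > byteRangeHi) ? byteRangeHi : target;
}
```

If 25000 stereo 16-bit frames (100000 raw bytes) took 50000 compressed bytes, the observed ratio is 0.5, so seeking another 25000
frames forward from byte 50000 targets byte 100000.
*/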
5974
5975static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5976{
5977 drflac_uint64 byteRangeLo;
5978 drflac_uint64 byteRangeHi;
5979 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5980
5981 /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
5982 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
5983 return DRFLAC_FALSE;
5984 }
5985
5986 /* If we're close enough to the start, just move to the start and seek forward. */
5987 if (pcmFrameIndex < seekForwardThreshold) {
5988 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
5989 }
5990
5991 /*
5992 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
5993 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
5994 */
5995 byteRangeLo = pFlac->firstFLACFramePosInBytes;
5996 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5997
5998 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
5999}
6000#endif /* !DR_FLAC_NO_CRC */
6001
6002static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6003{
6004 drflac_uint32 iClosestSeekpoint = 0;
6005 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6006 drflac_uint64 runningPCMFrameCount;
6007 drflac_uint32 iSeekpoint;
6008
6009
6010 DRFLAC_ASSERT(pFlac != NULL);
6011
6012 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6013 return DRFLAC_FALSE;
6014 }
6015
6016 /* Do not use the seektable if pcmFrameIndex is not covered by it. */
6017 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6018 return DRFLAC_FALSE;
6019 }
6020
6021 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6022 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6023 break;
6024 }
6025
6026 iClosestSeekpoint = iSeekpoint;
6027 }
6028
6029 /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6030 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6031 return DRFLAC_FALSE;
6032 }
6033 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6034 return DRFLAC_FALSE;
6035 }
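/*
Illustration only: the seekpoint selection and validation above, reduced to a standalone sketch. seekpoint_sketch and
closest_seekpoint() are hypothetical; the real drflac_seekpoint also carries flacFrameOffset, omitted here. Note that, like the loop
above, the ">=" comparison keeps the previous seekpoint when the target lands exactly on a seekpoint's first PCM frame.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t firstPCMFrame;
    uint16_t pcmFrameCount;
} seekpoint_sketch;

// Returns the index of the seekpoint to start from, or -1 if the table
// should not be used for this target.
static long closest_seekpoint(const seekpoint_sketch* pSeekpoints, size_t count,
                              uint64_t pcmFrameIndex, uint16_t maxBlockSize)
{
    size_t i;
    size_t iClosest = 0;

    if (pSeekpoints == NULL || count == 0 || pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
        return -1;  // Table missing or target before the first seekpoint.
    }

    for (i = 0; i < count; ++i) {
        if (pSeekpoints[i].firstPCMFrame >= pcmFrameIndex) {
            break;
        }
        iClosest = i;
    }

    // Reject obviously bogus entries, such as those in an all-zero table.
    if (pSeekpoints[iClosest].pcmFrameCount == 0 || pSeekpoints[iClosest].pcmFrameCount > maxBlockSize) {
        return -1;
    }

    return (long)iClosest;
}
```
*/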
6036
6037#if !defined(DR_FLAC_NO_CRC)
6038 /* At this point we should know the closest seek point. From here we can narrow down the target with a binary search, but that requires the total PCM frame count to be known. */
6039 if (pFlac->totalPCMFrameCount > 0) {
6040 drflac_uint64 byteRangeLo;
6041 drflac_uint64 byteRangeHi;
6042
6043 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6044 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6045
6046 /*
6047 If our closest seek point is not the last one, we only need to search between it and the next one. The section below clamps byteRangeHi to the
6048 last byte before the next seekpoint's FLAC frame.
6049
6050 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6051 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6052 */
6053 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6054 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6055
6056 /* Basic validation on the seekpoints to ensure they're usable. */
6057 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6058 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6059 }
6060
6061 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6062 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi is inclusive, so stop one byte before the next seekpoint's frame. */
6063 }
6064 }
6065
6066 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6067 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6068 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6069
6070 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6071 return DRFLAC_TRUE;
6072 }
6073 }
6074 }
6075 }
6076#endif /* !DR_FLAC_NO_CRC */
6077
6078 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6079
6080 /*
6081 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6082 from the seekpoint's first sample.
6083 */
6084 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6085 /* Optimized case. Just seek forward from where we are. */
6086 runningPCMFrameCount = pFlac->currentPCMFrame;
6087
6088 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6089 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6090 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6091 return DRFLAC_FALSE;
6092 }
6093 } else {
6094 isMidFrame = DRFLAC_TRUE;
6095 }
6096 } else {
6097 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6098 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6099
6100 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6101 return DRFLAC_FALSE;
6102 }
6103
6104 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6105 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6106 return DRFLAC_FALSE;
6107 }
6108 }
6109
6110 for (;;) {
6111 drflac_uint64 pcmFrameCountInThisFLACFrame;
6112 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6113 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6114
6115 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6116
6117 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6118 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6119 /*
6120 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6121 it never existed and keep iterating.
6122 */
6123 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6124
6125 if (!isMidFrame) {
6126 drflac_result result = drflac__decode_flac_frame(pFlac);
6127 if (result == DRFLAC_SUCCESS) {
6128 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6129 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6130 } else {
6131 if (result == DRFLAC_CRC_MISMATCH) {
6132 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6133 } else {
6134 return DRFLAC_FALSE;
6135 }
6136 }
6137 } else {
6138 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6139 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6140 }
6141 } else {
6142 /*
6143 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6144 frame never existed and leave the running sample count untouched.
6145 */
6146 if (!isMidFrame) {
6147 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6148 if (result == DRFLAC_SUCCESS) {
6149 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6150 } else {
6151 if (result == DRFLAC_CRC_MISMATCH) {
6152 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6153 } else {
6154 return DRFLAC_FALSE;
6155 }
6156 }
6157 } else {
6158 /*
6159 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6160 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6161 */
6162 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6163 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6164 isMidFrame = DRFLAC_FALSE;
6165 }
6166
6167 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6168 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6169 return DRFLAC_TRUE;
6170 }
6171 }
6172
6173 next_iteration:
6174 /* Grab the next frame in preparation for the next iteration. */
6175 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6176 return DRFLAC_FALSE;
6177 }
6178 }
6179}
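/*
Illustration only: a simplified model of the sample-exact linear scan at the end of drflac__seek_to_pcm_frame__seek_table(). Each FLAC
frame is reduced to its PCM frame count and a hypothetical crcOk flag. As in the loop above, a frame that fails its CRC is treated as if it
never existed, so its samples are not added to the running count. frame_sketch and linear_seek() are hypothetical names, and the
end-of-stream special case is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t pcmFrameCount;
    bool crcOk;
} frame_sketch;

// On success, *pFrame is the frame to decode and *pOffset is the number of
// PCM frames to skip inside it to land exactly on pcmFrameIndex.
static bool linear_seek(const frame_sketch* pFrames, size_t frameCount, uint64_t runningPCMFrameCount,
                        uint64_t pcmFrameIndex, size_t* pFrame, uint64_t* pOffset)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        if (!pFrames[i].crcOk) {
            continue;  // CRC mismatch: pretend this frame never existed.
        }
        if (pcmFrameIndex < runningPCMFrameCount + pFrames[i].pcmFrameCount) {
            *pFrame  = i;
            *pOffset = pcmFrameIndex - runningPCMFrameCount;
            return true;
        }
        runningPCMFrameCount += pFrames[i].pcmFrameCount;
    }
    return false;  // Ran out of frames.
}
```

With frames of 4096, 4096 (corrupt) and 4096 PCM frames, seeking to PCM frame 5000 lands in the third frame at offset 904, because the
corrupt frame's samples are dropped from the timeline.
*/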
6180
6181
6182#ifndef DR_FLAC_NO_OGG
6183typedef struct
6184{
6185 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6186 drflac_uint8 structureVersion; /* Always 0. */
6187 drflac_uint8 headerType;
6188 drflac_uint64 granulePosition;
6189 drflac_uint32 serialNumber;
6190 drflac_uint32 sequenceNumber;
6191 drflac_uint32 checksum;
6192 drflac_uint8 segmentCount;
6193 drflac_uint8 segmentTable[255];
6194} drflac_ogg_page_header;
6195#endif
6196
6197typedef struct
6198{
6199 drflac_read_proc onRead;
6200 drflac_seek_proc onSeek;
6201 drflac_meta_proc onMeta;
6202 drflac_container container;
6203 void* pUserData;
6204 void* pUserDataMD;
6205 drflac_uint32 sampleRate;
6206 drflac_uint8 channels;
6207 drflac_uint8 bitsPerSample;
6208 drflac_uint64 totalPCMFrameCount;
6209 drflac_uint16 maxBlockSizeInPCMFrames;
6210 drflac_uint64 runningFilePos;
6211 drflac_bool32 hasStreamInfoBlock;
6212 drflac_bool32 hasMetadataBlocks;
6213 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6214 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block. */
6215
6216#ifndef DR_FLAC_NO_OGG
6217 drflac_uint32 oggSerial;
6218 drflac_uint64 oggFirstBytePos;
6219 drflac_ogg_page_header oggBosHeader;
6220#endif
6221} drflac_init_info;
6222
6223static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6224{
6225 blockHeader = drflac__be2host_32(blockHeader);
6226 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6227 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6228 *blockSize = (blockHeader & 0x00FFFFFFUL);
6229}
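/*
Illustration only: the metadata block header is 32 big-endian bits laid out as a 1-bit last-block flag, a 7-bit block type and a 24-bit
block length. decode_block_header_bytes() is a hypothetical variant of drflac__decode_block_header() that reads the four raw bytes
directly, which makes the layout independent of host byte order.

```c
#include <stdint.h>

static void decode_block_header_bytes(const uint8_t b[4], uint8_t* isLastBlock,
                                      uint8_t* blockType, uint32_t* blockSize)
{
    *isLastBlock = (uint8_t)(b[0] >> 7);    // Top bit of the first byte.
    *blockType   = (uint8_t)(b[0] & 0x7F);  // Remaining 7 bits.
    *blockSize   = ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

For example, the bytes 00 00 00 22 describe a STREAMINFO block (type 0) of 34 bytes that is not the last block, and 84 00 00 28
describe a final VORBIS_COMMENT block (type 4) of 40 bytes.
*/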
6230
6231static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6232{
6233 drflac_uint32 blockHeader;
6234
6235 *blockSize = 0;
6236 if (onRead(pUserData, &blockHeader, 4) != 4) {
6237 return DRFLAC_FALSE;
6238 }
6239
6240 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6241 return DRFLAC_TRUE;
6242}
6243
6244static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6245{
6246 drflac_uint32 blockSizes;
6247 drflac_uint64 frameSizes = 0;
6248 drflac_uint64 importantProps;
6249 drflac_uint8 md5[16];
6250
6251 /* min/max block size. */
6252 if (onRead(pUserData, &blockSizes, 4) != 4) {
6253 return DRFLAC_FALSE;
6254 }
6255
6256 /* min/max frame size. */
6257 if (onRead(pUserData, &frameSizes, 6) != 6) {
6258 return DRFLAC_FALSE;
6259 }
6260
6261 /* Sample rate, channels, bits per sample and total sample count. */
6262 if (onRead(pUserData, &importantProps, 8) != 8) {
6263 return DRFLAC_FALSE;
6264 }
6265
6266 /* MD5 */
6267 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6268 return DRFLAC_FALSE;
6269 }
6270
6271 blockSizes = drflac__be2host_32(blockSizes);
6272 frameSizes = drflac__be2host_64(frameSizes);
6273 importantProps = drflac__be2host_64(importantProps);
6274
6275 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6276 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6277 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6278 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6279 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6280 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6281 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6282 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6283 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6284
6285 return DRFLAC_TRUE;
6286}
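/*
Illustration only: the masks in drflac__read_streaminfo() unpack a single big-endian 64-bit field holding the sample rate (20 bits),
channels minus one (3 bits), bits per sample minus one (5 bits) and the total PCM frame count (36 bits). The sketch below expresses the same
unpacking as plain shifts; read_be64() and unpack_streaminfo_props() are hypothetical helpers, with read_be64() standing in for
drflac__be2host_64().

```c
#include <stdint.h>

static uint64_t read_be64(const uint8_t b[8])
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; ++i) {
        v = (v << 8) | b[i];  // Most significant byte first.
    }
    return v;
}

static void unpack_streaminfo_props(uint64_t props, uint32_t* sampleRate, uint8_t* channels,
                                    uint8_t* bitsPerSample, uint64_t* totalPCMFrameCount)
{
    *sampleRate         = (uint32_t)(props >> 44);                  // Top 20 bits.
    *channels           = (uint8_t)(((props >> 41) & 0x07) + 1);    // Stored as count - 1.
    *bitsPerSample      = (uint8_t)(((props >> 36) & 0x1F) + 1);    // Stored as bits - 1.
    *totalPCMFrameCount = props & 0xFFFFFFFFFULL;                   // Low 36 bits.
}
```

The bytes 0A C4 42 F0 00 01 00 00 decode to 44100 Hz, 2 channels, 16 bits per sample and 65536 PCM frames.
*/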
6287
6288
6289static void* drflac__malloc_default(size_t sz, void* pUserData)
6290{
6291 (void)pUserData;
6292 return DRFLAC_MALLOC(sz);
6293}
6294
6295static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6296{
6297 (void)pUserData;
6298 return DRFLAC_REALLOC(p, sz);
6299}
6300
6301static void drflac__free_default(void* p, void* pUserData)
6302{
6303 (void)pUserData;
6304 DRFLAC_FREE(p);
6305}
6306
6307
6308static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6309{
6310 if (pAllocationCallbacks == NULL) {
6311 return NULL;
6312 }
6313
6314 if (pAllocationCallbacks->onMalloc != NULL) {
6315 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6316 }
6317
6318 /* Try using realloc(). */
6319 if (pAllocationCallbacks->onRealloc != NULL) {
6320 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6321 }
6322
6323 return NULL;
6324}
6325
6326static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6327{
6328 if (pAllocationCallbacks == NULL) {
6329 return NULL;
6330 }
6331
6332 if (pAllocationCallbacks->onRealloc != NULL) {
6333 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6334 }
6335
6336 /* Try emulating realloc() in terms of malloc()/free(). */
6337 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6338 void* p2;
6339
6340 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6341 if (p2 == NULL) {
6342 return NULL;
6343 }
6344
6345 if (p != NULL) {
6346 DRFLAC_COPY_MEMORY(p2, p, (szOld < szNew) ? szOld : szNew); /* Only copy what fits in the new allocation in case it's smaller. */
6347 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6348 }
6349
6350 return p2;
6351 }
6352
6353 return NULL;
6354}
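/*
Illustration only: a sketch of emulating realloc() with only malloc/free callbacks, together with a pair of callbacks that use pUserData to
count live allocations. alloc_sketch, realloc_emulated(), counting_malloc() and counting_free() are hypothetical names. The copy is limited
to the smaller of the two sizes so that shrinking never writes past the new block, and on failure the original block is left untouched,
matching realloc() semantics.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void* (*onMalloc)(size_t sz, void* pUserData);
    void  (*onFree)(void* p, void* pUserData);
    void* pUserData;
} alloc_sketch;

static void* realloc_emulated(void* p, size_t szNew, size_t szOld, const alloc_sketch* a)
{
    void* p2 = a->onMalloc(szNew, a->pUserData);
    if (p2 == NULL) {
        return NULL;  // Old block is still valid and still owned by the caller.
    }
    if (p != NULL) {
        memcpy(p2, p, (szOld < szNew) ? szOld : szNew);
        a->onFree(p, a->pUserData);
    }
    return p2;
}

// Example callbacks: pUserData points at an int holding the live allocation count.
static void* counting_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        *(int*)pUserData += 1;
    }
    return p;
}

static void counting_free(void* p, void* pUserData)
{
    *(int*)pUserData -= 1;
    free(p);
}
```
*/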
6355
6356static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6357{
6358 if (p == NULL || pAllocationCallbacks == NULL) {
6359 return;
6360 }
6361
6362 if (pAllocationCallbacks->onFree != NULL) {
6363 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6364 }
6365}
6366
6367
6368static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeektableSize, drflac_allocation_callbacks* pAllocationCallbacks)
6369{
6370 /*
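6371 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6372 we'll be sitting on byte 42: the 4-byte "fLaC" marker, the 4-byte STREAMINFO block header and the 34-byte STREAMINFO block have already been read.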
6373 */
6374 drflac_uint64 runningFilePos = 42;
6375 drflac_uint64 seektablePos = 0;
6376 drflac_uint32 seektableSize = 0;
6377
6378 for (;;) {
6379 drflac_metadata metadata;
6380 drflac_uint8 isLastBlock = 0;
6381 drflac_uint8 blockType;
6382 drflac_uint32 blockSize;
6383 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6384 return DRFLAC_FALSE;
6385 }
6386 runningFilePos += 4;
6387
6388 metadata.type = blockType;
6389 metadata.pRawData = NULL;
6390 metadata.rawDataSize = 0;
6391
6392 switch (blockType)
6393 {
6394 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6395 {
6396 if (blockSize < 4) {
6397 return DRFLAC_FALSE;
6398 }
6399
6400 if (onMeta) {
6401 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6402 if (pRawData == NULL) {
6403 return DRFLAC_FALSE;
6404 }
6405
6406 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6407 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6408 return DRFLAC_FALSE;
6409 }
6410
6411 metadata.pRawData = pRawData;
6412 metadata.rawDataSize = blockSize;
6413 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6414 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6415 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6416 onMeta(pUserDataMD, &metadata);
6417
6418 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6419 }
6420 } break;
6421
6422 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6423 {
6424 seektablePos = runningFilePos;
6425 seektableSize = blockSize;
6426
6427 if (onMeta) {
6428 drflac_uint32 iSeekpoint;
6429 void* pRawData;
6430
6431 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6432 if (pRawData == NULL) {
6433 return DRFLAC_FALSE;
6434 }
6435
6436 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6437 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6438 return DRFLAC_FALSE;
6439 }
6440
6441 metadata.pRawData = pRawData;
6442 metadata.rawDataSize = blockSize;
6443 metadata.data.seektable.seekpointCount = blockSize/sizeof(drflac_seekpoint);
6444 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6445
6446 /* Endian swap. */
6447 for (iSeekpoint = 0; iSeekpoint < metadata.data.seektable.seekpointCount; ++iSeekpoint) {
6448 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6449 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6450 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6451 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6452 }
6453
6454 onMeta(pUserDataMD, &metadata);
6455
6456 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6457 }
6458 } break;
6459
6460 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6461 {
6462 if (blockSize < 8) {
6463 return DRFLAC_FALSE;
6464 }
6465
6466 if (onMeta) {
6467 void* pRawData;
6468 const char* pRunningData;
6469 const char* pRunningDataEnd;
6470 drflac_uint32 i;
6471
6472 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6473 if (pRawData == NULL) {
6474 return DRFLAC_FALSE;
6475 }
6476
6477 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6478 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6479 return DRFLAC_FALSE;
6480 }
6481
6482 metadata.pRawData = pRawData;
6483 metadata.rawDataSize = blockSize;
6484
6485 pRunningData = (const char*)pRawData;
6486 pRunningDataEnd = (const char*)pRawData + blockSize;
6487
6488 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6489
6490 /* Need space for the vendor string plus the 4-byte comment count. */
6491 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6492 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6493 return DRFLAC_FALSE;
6494 }
6495 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6496 metadata.data.vorbis_comment.commentCount = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6497
6498 /* Need space for 'commentCount' comments after the vendor string, each of which needs at least a drflac_uint32 length field. */
6499 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6500 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6501 return DRFLAC_FALSE;
6502 }
6503 metadata.data.vorbis_comment.pComments = pRunningData;
6504
6505 /* Check that the comments section is valid before passing it to the callback */
6506 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6507 drflac_uint32 commentLength;
6508
6509 if (pRunningDataEnd - pRunningData < 4) {
6510 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6511 return DRFLAC_FALSE;
6512 }
6513
6514 commentLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6515 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6516 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6517 return DRFLAC_FALSE;
6518 }
6519 pRunningData += commentLength;
6520 }
6521
6522 onMeta(pUserDataMD, &metadata);
6523
6524 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6525 }
6526 } break;
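/*
Illustration only: the VORBIS_COMMENT bounds checks above, extracted into a standalone validator. Every length field is little-endian
(unlike the rest of the FLAC metadata), and every comparison is written as "remaining bytes < needed bytes" so that no addition can
overflow into a value that looks valid. read_le32() and validate_vorbis_comment() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns the comment count, or -1 if any length field runs past the end.
static int64_t validate_vorbis_comment(const unsigned char* data, size_t size)
{
    const unsigned char* p = data;
    const unsigned char* end = data + size;
    uint32_t vendorLength, commentCount, i;

    if (size < 8) {
        return -1;  // Vendor length plus comment count at minimum.
    }

    vendorLength = read_le32(p); p += 4;
    if ((size_t)(end - p) - 4 < vendorLength) {
        return -1;  // Vendor string plus the 4-byte comment count must fit.
    }
    p += vendorLength;

    commentCount = read_le32(p); p += 4;
    if ((size_t)(end - p) / 4 < commentCount) {
        return -1;  // Each comment needs at least its 4-byte length.
    }

    for (i = 0; i < commentCount; ++i) {
        uint32_t length;
        if ((size_t)(end - p) < 4) {
            return -1;
        }
        length = read_le32(p); p += 4;
        if ((size_t)(end - p) < length) {
            return -1;
        }
        p += length;
    }

    return (int64_t)commentCount;
}
```

A 25-byte block with vendor "ab" and the comments "X=1" and "Y=22" validates with a count of 2; dropping its final byte makes the second
comment overrun the block and the validator returns -1.
*/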
6527
6528 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6529 {
6530 if (blockSize < 396) {
6531 return DRFLAC_FALSE;
6532 }
6533
6534 if (onMeta) {
6535 void* pRawData;
6536 const char* pRunningData;
6537 const char* pRunningDataEnd;
6538 drflac_uint8 iTrack;
6539 drflac_uint8 iIndex;
6540
6541 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6542 if (pRawData == NULL) {
6543 return DRFLAC_FALSE;
6544 }
6545
6546 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6547 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6548 return DRFLAC_FALSE;
6549 }
6550
6551 metadata.pRawData = pRawData;
6552 metadata.rawDataSize = blockSize;
6553
6554 pRunningData = (const char*)pRawData;
6555 pRunningDataEnd = (const char*)pRawData + blockSize;
6556
6557 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6558 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6559 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6560 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
6561 metadata.data.cuesheet.pTrackData = pRunningData;
6562
6563 /* Check that the cuesheet tracks are valid before passing them to the callback */
6564 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6565 drflac_uint8 indexCount;
6566 drflac_uint32 indexPointSize;
6567
6568 if (pRunningDataEnd - pRunningData < 36) {
6569 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6570 return DRFLAC_FALSE;
6571 }
6572
6573 /* Skip to the index point count */
6574 pRunningData += 35;
6575 indexCount = pRunningData[0]; pRunningData += 1;
6576 indexPointSize = indexCount * sizeof(drflac_cuesheet_track_index);
6577 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6578 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6579 return DRFLAC_FALSE;
6580 }
6581
6582 /* Endian swap. */
6583 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6584 drflac_cuesheet_track_index* pTrack = (drflac_cuesheet_track_index*)pRunningData;
6585 pRunningData += sizeof(drflac_cuesheet_track_index);
6586 pTrack->offset = drflac__be2host_64(pTrack->offset);
6587 }
6588 }
6589
6590 onMeta(pUserDataMD, &metadata);
6591
6592 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6593 }
6594 } break;
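/*
The CUESHEET case above walks a fixed on-disk layout. It starts with a 396-byte header: a 128-byte catalog number, an 8-byte
lead-in sample count, 259 bytes holding the CD flag and reserved bits, and a 1-byte track count. Each track then takes 36 bytes
up to and including its index point count, and each index point takes 12 bytes. The sketch below is a hypothetical standalone
helper, not part of dr_flac. It applies the same bounds checks to a block that has already been read into memory and simply
counts the index points.

```c
#include <stddef.h>
#include <stdint.h>

// Sizes from the FLAC CUESHEET layout (hypothetical macro names).
#define CUESHEET_HEADER_SIZE       396  // catalog(128) + lead-in(8) + flags and reserved(259) + track count(1)
#define CUESHEET_TRACK_SIZE         36  // offset(8) + number(1) + ISRC(12) + flags and reserved(14) + index count(1)
#define CUESHEET_INDEX_POINT_SIZE   12  // offset(8) + index number(1) + reserved(3)

// Hypothetical helper: returns the total number of index points in a CUESHEET block body,
// or -1 if the block is too short for the tracks and index points it claims to contain.
long cuesheet_count_index_points(const uint8_t* data, size_t size)
{
    const uint8_t* p = data;
    const uint8_t* end = data + size;
    unsigned trackCount;
    unsigned iTrack;
    long total = 0;

    if (size < CUESHEET_HEADER_SIZE) {
        return -1;
    }
    trackCount = p[CUESHEET_HEADER_SIZE - 1];  // Last byte of the fixed header.
    p += CUESHEET_HEADER_SIZE;

    for (iTrack = 0; iTrack < trackCount; ++iTrack) {
        size_t indexCount;

        if ((size_t)(end - p) < CUESHEET_TRACK_SIZE) {
            return -1;  // The fixed part of the track record does not fit.
        }
        indexCount = p[CUESHEET_TRACK_SIZE - 1];  // Last byte of the track record.
        p += CUESHEET_TRACK_SIZE;

        if ((size_t)(end - p) < indexCount * CUESHEET_INDEX_POINT_SIZE) {
            return -1;  // The index points run past the end of the block.
        }
        p += indexCount * CUESHEET_INDEX_POINT_SIZE;
        total += (long)indexCount;
    }

    return total;
}
```

The helper returns -1 under the same conditions in which the code above frees the raw data and returns DRFLAC_FALSE: a track
record, or the index points it announces, would extend past the end of the block.
*/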
6595
6596 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6597 {
6598 if (blockSize < 32) {
6599 return DRFLAC_FALSE;
6600 }
6601
6602 if (onMeta) {
6603 void* pRawData;
6604 const char* pRunningData;
6605 const char* pRunningDataEnd;
6606
6607 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6608 if (pRawData == NULL) {
6609 return DRFLAC_FALSE;
6610 }
6611
6612 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6613 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6614 return DRFLAC_FALSE;
6615 }
6616
6617 metadata.pRawData = pRawData;
6618 metadata.rawDataSize = blockSize;
6619
6620 pRunningData = (const char*)pRawData;
6621 pRunningDataEnd = (const char*)pRawData + blockSize;
6622
6623 metadata.data.picture.type = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6624 metadata.data.picture.mimeLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6625
6626 /* Need space for the rest of the block */
6627 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6628 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6629 return DRFLAC_FALSE;
6630 }
6631 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6632 metadata.data.picture.descriptionLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6633
6634 /* Need space for the rest of the block */
6635 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6636 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6637 return DRFLAC_FALSE;
6638 }
6639 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6640 metadata.data.picture.width = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6641 metadata.data.picture.height = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6642 metadata.data.picture.colorDepth = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6643 metadata.data.picture.indexColorCount = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6644 metadata.data.picture.pictureDataSize = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6645 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6646
6647 /* Need space for the picture data in what remains of the block */
6648 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6649 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6650 return DRFLAC_FALSE;
6651 }
6652
6653 onMeta(pUserDataMD, &metadata);
6654
6655 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6656 }
6657 } break;
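/*
The PICTURE case guards each variable-length field with a check of the form "remaining - fixedTrailingBytes < length",
evaluated in signed 64-bit arithmetic. The fixed trailing size is subtracted from the remaining byte count instead of being
added to the untrusted length, so a huge length read from the file cannot wrap around to a small value that passes the check.
The sketch below applies that pattern to a single big-endian, length-prefixed field. The helper names are hypothetical. The
bytes are assembled one at a time rather than through a pointer cast, so the sketch does not rely on unaligned 32-bit loads.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: big-endian 32-bit read, one byte at a time.
static uint32_t read_be32(const uint8_t* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Hypothetical helper: reads a 4-byte length and the field that follows it, requiring at least
// `trailing` further bytes after the field. On success it returns a pointer to the field, stores
// the length through pLength and advances pp past the field. Returns NULL if the block is too short.
const uint8_t* read_length_prefixed(const uint8_t** pp, const uint8_t* end, uint32_t trailing, uint32_t* pLength)
{
    const uint8_t* p = *pp;
    uint32_t length;

    if (end - p < 4) {
        return NULL;
    }
    length = read_be32(p);
    p += 4;

    // Subtract the fixed trailing size from what remains instead of adding it to the untrusted
    // length, and compare as signed 64-bit values, so wrap-around cannot defeat the check.
    if ((int64_t)(end - p) - (int64_t)trailing < (int64_t)length) {
        return NULL;
    }

    *pLength = length;
    *pp = p + length;
    return p;
}
```

For the MIME type above the trailing size is 24: the 4-byte description length plus the five 4-byte fields that follow the
description. For the description itself it is 20.
*/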
6658
6659 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6660 {
6661 if (onMeta) {
6662 metadata.data.padding.unused = 0;
6663
6664 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6665 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6666 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6667 } else {
6668 onMeta(pUserDataMD, &metadata);
6669 }
6670 }
6671 } break;
6672
6673 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6674 {
6675 /* Invalid chunk. Just skip over this one. */
6676 if (onMeta) {
6677 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6678 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6679 }
6680 }
6681 } break;
6682
6683 default:
6684 {
6685 /*
6686 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6687 can at the very least report the chunk to the application and let it look at the raw data.
6688 */
6689 if (onMeta) {
6690 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6691 if (pRawData == NULL) {
6692 return DRFLAC_FALSE;
6693 }
6694
6695 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6696 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6697 return DRFLAC_FALSE;
6698 }
6699
6700 metadata.pRawData = pRawData;
6701 metadata.rawDataSize = blockSize;
6702 onMeta(pUserDataMD, &metadata);
6703
6704 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6705 }
6706 } break;
6707 }
6708
6709 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6710 if (onMeta == NULL && blockSize > 0) {
6711 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6712 isLastBlock = DRFLAC_TRUE;
6713 }
6714 }
6715
6716 runningFilePos += blockSize;
6717 if (isLastBlock) {
6718 break;
6719 }
6720 }
6721
6722 *pSeektablePos = seektablePos;
6723 *pSeektableSize = seektableSize;
6724 *pFirstFramePos = runningFilePos;
6725
6726 return DRFLAC_TRUE;
6727}
6728
6729static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6730{
6731 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6732
6733 drflac_uint8 isLastBlock;
6734 drflac_uint8 blockType;
6735 drflac_uint32 blockSize;
6736
6737 (void)onSeek;
6738
6739 pInit->container = drflac_container_native;
6740
6741 /* The first metadata block should be the STREAMINFO block. */
6742 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6743 return DRFLAC_FALSE;
6744 }
6745
6746 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6747 if (!relaxed) {
6748 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6749 return DRFLAC_FALSE;
6750 } else {
6751 /*
6752 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6753 for that frame.
6754 */
6755 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6756 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6757
6758 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6759 return DRFLAC_FALSE; /* Couldn't find a frame. */
6760 }
6761
6762 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6763 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6764 }
6765
6766 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6767 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6768 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6769 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6770 return DRFLAC_TRUE;
6771 }
6772 } else {
6773 drflac_streaminfo streaminfo;
6774 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6775 return DRFLAC_FALSE;
6776 }
6777
6778 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6779 pInit->sampleRate = streaminfo.sampleRate;
6780 pInit->channels = streaminfo.channels;
6781 pInit->bitsPerSample = streaminfo.bitsPerSample;
6782 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6783 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6784 pInit->hasMetadataBlocks = !isLastBlock;
6785
6786 if (onMeta) {
6787 drflac_metadata metadata;
6788 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6789 metadata.pRawData = NULL;
6790 metadata.rawDataSize = 0;
6791 metadata.data.streaminfo = streaminfo;
6792 onMeta(pUserDataMD, &metadata);
6793 }
6794
6795 return DRFLAC_TRUE;
6796 }
6797}
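/*
drflac__read_and_decode_block_header() is not shown in this part of the file. Per the FLAC format, however, every metadata
block starts with a 4-byte header: a 1-bit "last metadata block" flag, a 7-bit block type and a 24-bit big-endian block length.
The sketch below decodes such a header and repeats the strict-mode STREAMINFO test used above (type 0, length 34). The struct
and function names are hypothetical.

```c
#include <stdint.h>

// Hypothetical decoded form of a FLAC metadata block header.
typedef struct {
    int isLastBlock;
    uint8_t blockType;
    uint32_t blockSize;
} block_header;

// Splits the 4 header bytes into flag, type and 24-bit big-endian length.
block_header decode_block_header(const uint8_t bytes[4])
{
    block_header h;
    h.isLastBlock = (bytes[0] & 0x80) != 0;
    h.blockType   = (uint8_t)(bytes[0] & 0x7F);
    h.blockSize   = ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
    return h;
}

// Mirrors the strict-mode check: STREAMINFO is block type 0 and is exactly 34 bytes long.
int is_valid_streaminfo_header(block_header h)
{
    return h.blockType == 0 && h.blockSize == 34;
}
```
*/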
6798
6799#ifndef DR_FLAC_NO_OGG
6800#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6801#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6802
6803typedef enum
6804{
6805 drflac_ogg_recover_on_crc_mismatch,
6806 drflac_ogg_fail_on_crc_mismatch
6807} drflac_ogg_crc_mismatch_recovery;
6808
6809#ifndef DR_FLAC_NO_CRC
6810static drflac_uint32 drflac__crc32_table[] = {
6811 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6812 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6813 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6814 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6815 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6816 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6817 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6818 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6819 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6820 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6821 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
6822 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
6823 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
6824 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
6825 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
6826 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
6827 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
6828 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
6829 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
6830 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
6831 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
6832 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
6833 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
6834 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
6835 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
6836 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
6837 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
6838 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
6839 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
6840 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
6841 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
6842 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
6843 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
6844 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
6845 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
6846 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
6847 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
6848 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
6849 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
6850 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
6851 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
6852 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
6853 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
6854 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
6855 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
6856 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
6857 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
6858 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
6859 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
6860 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
6861 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
6862 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
6863 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
6864 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
6865 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
6866 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
6867 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
6868 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
6869 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
6870 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
6871 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
6872 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
6873 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
6874 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
6875};
6876#endif
6877
6878static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
6879{
6880#ifndef DR_FLAC_NO_CRC
6881 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
6882#else
6883 (void)data;
6884 return crc32;
6885#endif
6886}
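/*
drflac__crc32_table holds the CRC-32 used by Ogg. It uses polynomial 0x04C11DB7, processed most-significant bit first, with an
initial value of 0 and no final XOR. The sketch below is a hypothetical bitwise equivalent of drflac_crc32_byte(). It is slower
than the table lookup, but handy for checking the table or DRFLAC_OGG_CAPTURE_PATTERN_CRC32, which is the CRC of the four bytes
"OggS" starting from 0.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical bitwise form of one drflac_crc32_byte() step: feed the byte into the top
// 8 bits, then shift out 8 bits, folding in the polynomial whenever a 1 falls off the top.
uint32_t crc32_byte_bitwise(uint32_t crc, uint8_t data)
{
    int bit;
    crc ^= (uint32_t)data << 24;
    for (bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}

// Hypothetical buffer version, equivalent to drflac_crc32_buffer().
uint32_t crc32_buffer_bitwise(uint32_t crc, const uint8_t* data, size_t size)
{
    size_t i;
    for (i = 0; i < size; ++i) {
        crc = crc32_byte_bitwise(crc, data[i]);
    }
    return crc;
}
```

For example, crc32_byte_bitwise(0, 1) gives 0x04C11DB7, which matches drflac__crc32_table[1].
*/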
6887
6888#if 0
6889static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
6890{
6891 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
6892 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
6893 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
6894 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
6895 return crc32;
6896}
6897
6898static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
6899{
6900 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
6901 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
6902 return crc32;
6903}
6904#endif
6905
6906static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
6907{
6908 /* This can be optimized. */
6909 drflac_uint32 i;
6910 for (i = 0; i < dataSize; ++i) {
6911 crc32 = drflac_crc32_byte(crc32, pData[i]);
6912 }
6913 return crc32;
6914}
6915
6916
6917static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
6918{
6919 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
6920}
6921
6922static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
6923{
6924 return 27 + pHeader->segmentCount;
6925}
6926
6927static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
6928{
6929 drflac_uint32 pageBodySize = 0;
6930 int i;
6931
6932 for (i = 0; i < pHeader->segmentCount; ++i) {
6933 pageBodySize += pHeader->segmentTable[i];
6934 }
6935
6936 return pageBodySize;
6937}
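/*
An Ogg page header is 27 fixed bytes followed by one lacing value per segment, and the page body is the sum of the lacing
values. With at most 255 segments of at most 255 bytes each, the largest page is (27 + 255) + 255 x 255 = 65307 bytes, which
matches DRFLAC_OGG_MAX_PAGE_SIZE. The sketch below mirrors the two helpers above, using hypothetical names that take the
segment table directly.

```c
#include <stdint.h>

// Hypothetical equivalent of drflac_ogg__get_page_header_size(): 27 fixed bytes plus the segment table.
uint32_t ogg_page_header_size(uint8_t segmentCount)
{
    return 27u + segmentCount;
}

// Hypothetical equivalent of drflac_ogg__get_page_body_size(): the sum of the lacing values.
uint32_t ogg_page_body_size(const uint8_t* segmentTable, uint8_t segmentCount)
{
    uint32_t size = 0;
    int i;

    for (i = 0; i < segmentCount; ++i) {
        size += segmentTable[i];
    }
    return size;
}
```

A page with lacing values {255, 255, 10}, for instance, has a 30-byte header and a 520-byte body.
*/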
6938
6939static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6940{
6941 drflac_uint8 data[23];
6942 drflac_uint32 i;
6943
6944 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
6945
6946 if (onRead(pUserData, data, 23) != 23) {
6947 return DRFLAC_AT_END;
6948 }
6949 *pBytesRead += 23;
6950
6951 /*
6952 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
6953 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
6954 like to have it map to the structure of the underlying data.
6955 */
6956 pHeader->capturePattern[0] = 'O';
6957 pHeader->capturePattern[1] = 'g';
6958 pHeader->capturePattern[2] = 'g';
6959 pHeader->capturePattern[3] = 'S';
6960
6961 pHeader->structureVersion = data[0];
6962 pHeader->headerType = data[1];
6963 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
6964 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
6965 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
6966 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
6967 pHeader->segmentCount = data[22];
6968
6969 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
6970 data[18] = 0;
6971 data[19] = 0;
6972 data[20] = 0;
6973 data[21] = 0;
6974
6975 for (i = 0; i < 23; ++i) {
6976 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
6977 }
6978
6979
6980 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
6981 return DRFLAC_AT_END;
6982 }
6983 *pBytesRead += pHeader->segmentCount;
6984
6985 for (i = 0; i < pHeader->segmentCount; ++i) {
6986 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
6987 }
6988
6989 return DRFLAC_SUCCESS;
6990}
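/*
The 23 bytes read after "OggS" are, in order: stream structure version (1), header type flags (1), granule position (8),
bitstream serial number (4), page sequence number (4), CRC checksum (4) and segment count (1). Ogg stores the multi-byte fields
little-endian, so copying the raw bytes, as the function above does, yields the stored value directly on a little-endian host.
The sketch below is a hypothetical decoder that assembles each field byte by byte instead, which gives the same values
regardless of host byte order.

```c
#include <stdint.h>

// Hypothetical decoded form of the fields that follow the capture pattern.
typedef struct {
    uint8_t  structureVersion;
    uint8_t  headerType;       // 0x01 = continued packet, 0x02 = first page, 0x04 = last page.
    uint64_t granulePosition;
    uint32_t serialNumber;
    uint32_t sequenceNumber;
    uint32_t checksum;
    uint8_t  segmentCount;
} ogg_page_fields;

// Assembles a little-endian value of byteCount bytes, starting from the most significant byte.
static uint64_t read_le(const uint8_t* p, int byteCount)
{
    uint64_t value = 0;
    int i;
    for (i = byteCount - 1; i >= 0; --i) {
        value = (value << 8) | p[i];
    }
    return value;
}

ogg_page_fields decode_ogg_page_fields(const uint8_t data[23])
{
    ogg_page_fields f;
    f.structureVersion = data[0];
    f.headerType       = data[1];
    f.granulePosition  = read_le(&data[2], 8);
    f.serialNumber     = (uint32_t)read_le(&data[10], 4);
    f.sequenceNumber   = (uint32_t)read_le(&data[14], 4);
    f.checksum         = (uint32_t)read_le(&data[18], 4);
    f.segmentCount     = data[22];
    return f;
}
```
*/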
6991
6992static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6993{
6994 drflac_uint8 id[4];
6995
6996 *pBytesRead = 0;
6997
6998 if (onRead(pUserData, id, 4) != 4) {
6999 return DRFLAC_AT_END;
7000 }
7001 *pBytesRead += 4;
7002
7003 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7004 for (;;) {
7005 if (drflac_ogg__is_capture_pattern(id)) {
7006 drflac_result result;
7007
7008 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7009
7010 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7011 if (result == DRFLAC_SUCCESS) {
7012 return DRFLAC_SUCCESS;
7013 } else {
7014 if (result == DRFLAC_CRC_MISMATCH) {
7015 continue;
7016 } else {
7017 return result;
7018 }
7019 }
7020 } else {
7021 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7022 id[0] = id[1];
7023 id[1] = id[2];
7024 id[2] = id[3];
7025 if (onRead(pUserData, &id[3], 1) != 1) {
7026 return DRFLAC_AT_END;
7027 }
7028 *pBytesRead += 1;
7029 }
7030 }
7031}
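/*
drflac_ogg__read_page_header() resynchronizes by sliding a 4-byte window over the input one byte at a time until the window
spells "OggS". The sketch below is a hypothetical in-memory version of the same loop. It returns the offset of the first
capture pattern, or -1 if the buffer ends first.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: keeps a 4-byte window; when it does not spell "OggS", shifts it left
// by one byte and pulls in the next input byte, just like the read loop above.
long find_capture_pattern(const uint8_t* buf, size_t size)
{
    uint8_t id[4];
    size_t next;

    if (size < 4) {
        return -1;
    }
    id[0] = buf[0];
    id[1] = buf[1];
    id[2] = buf[2];
    id[3] = buf[3];
    next = 4;

    for (;;) {
        if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
            return (long)(next - 4);  // Offset of the first byte of the window.
        }
        if (next == size) {
            return -1;  // Ran out of input before finding the pattern.
        }
        id[0] = id[1];
        id[1] = id[2];
        id[2] = id[3];
        id[3] = buf[next++];
    }
}
```

With the input "xOgOggS", the window passes over the partial match "OgO" and the pattern is found at offset 3.
*/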
7032
7033
7034/*
7035The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7036in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7037in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
7038 dr_flac supports, there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7039the physical Ogg bitstream are converted and delivered in native FLAC format.
7040*/
7041typedef struct
7042{
7043 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7044 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7045 void* pUserData; /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
7046 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7047 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7048 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7049 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7050 drflac_ogg_page_header currentPageHeader;
7051 drflac_uint32 bytesRemainingInPage;
7052 drflac_uint32 pageDataSize;
7053 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7054} drflac_oggbs; /* oggbs = Ogg Bitstream */
7055
7056static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7057{
7058 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7059 oggbs->currentBytePos += bytesActuallyRead;
7060
7061 return bytesActuallyRead;
7062}
7063
7064static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7065{
7066 if (origin == drflac_seek_origin_start) {
7067 if (offset <= 0x7FFFFFFF) {
7068 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7069 return DRFLAC_FALSE;
7070 }
7071 oggbs->currentBytePos = offset;
7072
7073 return DRFLAC_TRUE;
7074 } else {
7075 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
7076 return DRFLAC_FALSE;
7077 }
7078 oggbs->currentBytePos = 0x7FFFFFFF; /* The recursive call below adds the remaining (offset - 0x7FFFFFFF) bytes. */
7079
7080 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
7081 }
7082 } else {
7083 while (offset > 0x7FFFFFFF) {
7084 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
7085 return DRFLAC_FALSE;
7086 }
7087 oggbs->currentBytePos += 0x7FFFFFFF;
7088 offset -= 0x7FFFFFFF;
7089 }
7090
7091 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
7092 return DRFLAC_FALSE;
7093 }
7094 oggbs->currentBytePos += offset;
7095
7096 return DRFLAC_TRUE;
7097 }
7098}
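/*
drflac_seek_proc takes an int offset, so drflac_oggbs__seek_physical() splits a 64-bit offset into steps of at most 0x7FFFFFFF
bytes. The sketch below shows only the forward-seek splitting. It uses a hypothetical int-based callback type and is not the
dr_flac API.

```c
#include <stdint.h>

// Hypothetical seek callback that, like drflac_seek_proc, only accepts an int offset.
typedef int (*int_seek_fn)(void* pUserData, int offset);

// Seeks forward by a 64-bit offset using steps no larger than 0x7FFFFFFF.
// Returns 1 on success, 0 as soon as any step fails.
int seek_forward_u64(int_seek_fn onSeek, void* pUserData, uint64_t offset)
{
    while (offset > 0x7FFFFFFF) {
        if (!onSeek(pUserData, 0x7FFFFFFF)) {
            return 0;
        }
        offset -= 0x7FFFFFFF;
    }
    return onSeek(pUserData, (int)offset);  // Safe cast: offset <= 0x7FFFFFFF here.
}

// Example callback that just accumulates the requested offsets into a uint64_t.
int record_seek(void* pUserData, int offset)
{
    uint64_t* total = (uint64_t*)pUserData;
    *total += (uint64_t)offset;
    return 1;
}
```

Seeking 5000000000 bytes, for instance, becomes two steps of 0x7FFFFFFF followed by one step of 705032706.
*/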
7099
7100static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7101{
7102 drflac_ogg_page_header header;
7103 for (;;) {
7104 drflac_uint32 crc32 = 0;
7105 drflac_uint32 bytesRead;
7106 drflac_uint32 pageBodySize;
7107#ifndef DR_FLAC_NO_CRC
7108 drflac_uint32 actualCRC32;
7109#endif
7110
7111 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7112 return DRFLAC_FALSE;
7113 }
7114 oggbs->currentBytePos += bytesRead;
7115
7116 pageBodySize = drflac_ogg__get_page_body_size(&header);
7117 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7118 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7119 }
7120
7121 if (header.serialNumber != oggbs->serialNumber) {
7122 /* It's not a FLAC page. Skip it. */
7123 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7124 return DRFLAC_FALSE;
7125 }
7126 continue;
7127 }
7128
7129
7130 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7131 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7132 return DRFLAC_FALSE;
7133 }
7134 oggbs->pageDataSize = pageBodySize;
7135
7136#ifndef DR_FLAC_NO_CRC
7137 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7138 if (actualCRC32 != header.checksum) {
7139 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7140 continue; /* CRC mismatch. Skip this page. */
7141 } else {
7142 /*
7143 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7144 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7145 seek did not fully complete.
7146 */
7147 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7148 return DRFLAC_FALSE;
7149 }
7150 }
7151#else
7152 (void)recoveryMethod; /* <-- Silence a warning. */
7153#endif
7154
7155 oggbs->currentPageHeader = header;
7156 oggbs->bytesRemainingInPage = pageBodySize;
7157 return DRFLAC_TRUE;
7158 }
7159}
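/*
drflac_oggbs__goto_next_page() verifies a page by taking the header CRC computed by drflac_ogg__read_page_header(), which
hashes the header with the four checksum bytes set to zero, continuing it over the page body and comparing the result with the
stored checksum. The sketch below performs the same check in one pass over a complete page held in memory. The checksum sits at
byte offsets 22 to 25 from the start of "OggS" and is stored little-endian. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical bitwise form of drflac_crc32_byte(): polynomial 0x04C11DB7, MSB first.
uint32_t ogg_crc32_update(uint32_t crc, uint8_t data)
{
    int bit;
    crc ^= (uint32_t)data << 24;
    for (bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}

// Hypothetical whole-page check. Returns 1 if the stored checksum matches the CRC of the page
// computed with the checksum bytes treated as zero.
int ogg_page_crc_ok(const uint8_t* page, size_t pageSize)
{
    uint32_t crc = 0;
    uint32_t stored;
    size_t i;

    if (pageSize < 27) {
        return 0;  // Shorter than the fixed part of a page header.
    }
    stored = (uint32_t)page[22] | ((uint32_t)page[23] << 8) | ((uint32_t)page[24] << 16) | ((uint32_t)page[25] << 24);

    for (i = 0; i < pageSize; ++i) {
        crc = ogg_crc32_update(crc, (i >= 22 && i < 26) ? 0 : page[i]);
    }
    return crc == stored;
}
```
*/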
7160
7161/* The functions below are unused at the moment, but I might re-add them later. */
7162#if 0
7163static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7164{
7165 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7166 drflac_uint8 iSeg = 0;
7167 drflac_uint32 iByte = 0;
7168 while (iByte < bytesConsumedInPage) {
7169 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7170 if (iByte + segmentSize > bytesConsumedInPage) {
7171 break;
7172 } else {
7173 iSeg += 1;
7174 iByte += segmentSize;
7175 }
7176 }
7177
7178 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7179 return iSeg;
7180}
7181
7182static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7183{
7184 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7185 for (;;) {
7186 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7187
7188 drflac_uint8 bytesRemainingInSeg;
7189 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7190
7191 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7192 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7193 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7194 if (segmentSize < 255) {
7195 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7196 atEndOfPage = DRFLAC_TRUE;
7197 }
7198
7199 break;
7200 }
7201
7202 bytesToEndOfPacketOrPage += segmentSize;
7203 }
7204
7205 /*
7206 At this point we will have found either the end of the packet or the end of the page. If we're at the end of the page we'll
7207 want to load the next page and keep searching for the end of the packet.
7208 */
7209 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7210 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7211
7212 if (atEndOfPage) {
7213 /*
7214 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7215 straddle pages.
7216 */
7217 if (!drflac_oggbs__goto_next_page(oggbs)) {
7218 return DRFLAC_FALSE;
7219 }
7220
7221 /* If it's a fresh packet it most likely means we're at the next packet. */
7222 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7223 return DRFLAC_TRUE;
7224 }
7225 } else {
7226 /* We're at the next packet. */
7227 return DRFLAC_TRUE;
7228 }
7229 }
7230}
7231
7232static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7233{
7234 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7235
7236 /* What we're actually doing here is seeking to the start of the next packet. */
7237 return drflac_oggbs__seek_to_next_packet(oggbs);
7238}
7239#endif
7240
7241static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7242{
7243 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7244 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7245 size_t bytesRead = 0;
7246
7247 DRFLAC_ASSERT(oggbs != NULL);
7248 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7249
7250 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7251 while (bytesRead < bytesToRead) {
7252 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7253
7254 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7255 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7256 bytesRead += bytesRemainingToRead;
7257 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7258 break;
7259 }
7260
7261 /* If we get here it means some of the requested data is contained in the next pages. */
7262 if (oggbs->bytesRemainingInPage > 0) {
7263 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7264 bytesRead += oggbs->bytesRemainingInPage;
7265 pRunningBufferOut += oggbs->bytesRemainingInPage;
7266 oggbs->bytesRemainingInPage = 0;
7267 }
7268
7269 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7270 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7271 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7272 }
7273 }
7274
7275 return bytesRead;
7276}
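/*
drflac__on_read_ogg() presents the bodies of consecutive FLAC pages as one contiguous native FLAC stream. It copies whatever is
left of the current page and, when a read runs past the end of it, loads the next page and keeps going. The sketch below
reproduces that loop over a hypothetical in-memory list of page bodies, with page loading reduced to advancing an index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for drflac_oggbs: a fixed list of page bodies.
typedef struct {
    const uint8_t* const* pages;
    const size_t* pageSizes;
    size_t pageCount;
    size_t currentPage;           // Index of the page being consumed.
    size_t bytesRemainingInPage;  // Unread bytes left in the current page.
} page_reader;

// Hypothetical equivalent of drflac_oggbs__goto_next_page(). Returns 0 when no pages are left.
static int page_reader_next_page(page_reader* r)
{
    if (r->currentPage + 1 >= r->pageCount) {
        return 0;
    }
    r->currentPage += 1;
    r->bytesRemainingInPage = r->pageSizes[r->currentPage];
    return 1;
}

// Copies from the current page and moves to the next page whenever it runs dry.
// Returns the number of bytes actually copied, which is short only at the end of the stream.
size_t page_reader_read(page_reader* r, void* bufferOut, size_t bytesToRead)
{
    uint8_t* out = (uint8_t*)bufferOut;
    size_t bytesRead = 0;

    while (bytesRead < bytesToRead) {
        size_t wanted = bytesToRead - bytesRead;
        size_t available = r->bytesRemainingInPage;
        size_t n = (available < wanted) ? available : wanted;
        const uint8_t* src = r->pages[r->currentPage] + (r->pageSizes[r->currentPage] - available);

        memcpy(out + bytesRead, src, n);
        bytesRead += n;
        r->bytesRemainingInPage -= n;

        if (bytesRead < bytesToRead && !page_reader_next_page(r)) {
            break;  // Out of pages: report a short read, as drflac__on_read_ogg() does.
        }
    }
    return bytesRead;
}
```

Initialize the reader with currentPage = 0 and bytesRemainingInPage = pageSizes[0]. With pages {1, 2, 3}, {4, 5} and {6}, a
2-byte read returns 1 and 2, and a following 3-byte read returns 3, 4 and 5 by crossing into the second page.
*/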
7277
7278static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7279{
7280 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7281 int bytesSeeked = 0;
7282
7283 DRFLAC_ASSERT(oggbs != NULL);
7284 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7285
7286 /* Seeking is always forward which makes things a lot simpler. */
7287 if (origin == drflac_seek_origin_start) {
7288 if (!drflac_oggbs__seek_physical(oggbs, oggbs->firstBytePos, drflac_seek_origin_start)) {
7289 return DRFLAC_FALSE;
7290 }
7291
7292 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7293 return DRFLAC_FALSE;
7294 }
7295
7296 return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7297 }
7298
7299 DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7300
7301 while (bytesSeeked < offset) {
7302 int bytesRemainingToSeek = offset - bytesSeeked;
7303 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7304
7305 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7306 bytesSeeked += bytesRemainingToSeek;
7307 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7308 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7309 break;
7310 }
7311
7312 /* If we get here it means some of the requested data is contained in the next pages. */
7313 if (oggbs->bytesRemainingInPage > 0) {
7314 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7315 oggbs->bytesRemainingInPage = 0;
7316 }
7317
7318 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7319 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7320 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7321 return DRFLAC_FALSE;
7322 }
7323 }
7324
7325 return DRFLAC_TRUE;
7326}
7327
7328
7329static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7330{
7331 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7332 drflac_uint64 originalBytePos;
7333 drflac_uint64 runningGranulePosition;
7334 drflac_uint64 runningFrameBytePos;
7335 drflac_uint64 runningPCMFrameCount;
7336
7337 DRFLAC_ASSERT(oggbs != NULL);
7338
7339 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7340
7341 /* First seek to the first frame. */
7342 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7343 return DRFLAC_FALSE;
7344 }
7345 oggbs->bytesRemainingInPage = 0;
7346
7347 runningGranulePosition = 0;
7348 for (;;) {
7349 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7350 drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7351 return DRFLAC_FALSE; /* Never did find that sample... */
7352 }
7353
7354 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7355 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7356 break; /* The sample is somewhere in the previous page. */
7357 }
7358
7359 /*
7360 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7361 disregard any pages that do not begin a fresh packet.
7362 */
7363 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Does the page begin a fresh packet? */
7364 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7365 drflac_uint8 firstBytesInPage[2];
7366 firstBytesInPage[0] = oggbs->pageData[0];
7367 firstBytesInPage[1] = oggbs->pageData[1];
7368
7369 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7370 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7371 }
7372
7373 continue;
7374 }
7375 }
7376 }
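/*
The loop above only records the granule position of pages that can serve as a starting point. Such a page has its
continuation flag (bit 0x01 of the header type) clear, a first segment of at least 2 bytes, and a body that begins with the
14-bit FLAC frame sync code 0b11111111111110. The sketch below restates that test as a hypothetical standalone predicate.

```c
#include <stdint.h>

// Hypothetical predicate mirroring the checks in the loop above.
int page_starts_flac_frame(uint8_t headerType, uint8_t firstLacingValue, const uint8_t* body)
{
    if ((headerType & 0x01) != 0) {
        return 0;  // The page continues a packet from the previous page.
    }
    if (firstLacingValue < 2) {
        return 0;  // The first segment is too short to hold the two sync bytes.
    }
    // Masking with 0xFC keeps the top 6 bits of the second byte, i.e. the last 6 bits of the
    // sync code, and ignores the reserved bit and the blocking strategy bit.
    return body[0] == 0xFF && (body[1] & 0xFC) == 0xF8;
}
```

Because the mask ignores the two low bits, both 0xFF 0xF8 (fixed block size) and 0xFF 0xF9 (variable block size) are
accepted.
*/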
7377
7378 /*
7379 We found the page that is closest to the sample, so now we need to find the sample within it. The first thing to do is seek to the
7380 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7381 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7382 we find the one containing the target sample.
7383 */
7384 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7385 return DRFLAC_FALSE;
7386 }
7387 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7388 return DRFLAC_FALSE;
7389 }
7390
7391 /*
7392 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7393 looping over these frames until we find the one containing the sample we're after.
7394 */
7395 runningPCMFrameCount = runningGranulePosition;
7396 for (;;) {
7397 /*
7398 There are two ways to find the sample and seek past irrelevant frames:
7399 1) Use the native FLAC decoder.
7400 2) Use Ogg's framing system.
7401
7402 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7403 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7404 duplication for the decoding of frame headers.
7405
7406 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7407 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7408 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7409 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7410 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7411 avoid the use of the drflac_bs object.
7412
7413 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7414 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7415 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7416 3) Simplicity.
7417 */
7418 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7419 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7420 drflac_uint64 pcmFrameCountInThisFrame;
7421
7422 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7423 return DRFLAC_FALSE;
7424 }
7425
7426 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7427
7428 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7429
7430 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7431 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7432 drflac_result result = drflac__decode_flac_frame(pFlac);
7433 if (result == DRFLAC_SUCCESS) {
7434 pFlac->currentPCMFrame = pcmFrameIndex;
7435 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7436 return DRFLAC_TRUE;
7437 } else {
7438 return DRFLAC_FALSE;
7439 }
7440 }
7441
7442 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7443 /*
7444 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7445 it never existed and keep iterating.
7446 */
7447 drflac_result result = drflac__decode_flac_frame(pFlac);
7448 if (result == DRFLAC_SUCCESS) {
7449 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7450 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7451 if (pcmFramesToDecode == 0) {
7452 return DRFLAC_TRUE;
7453 }
7454
7455 pFlac->currentPCMFrame = runningPCMFrameCount;
7456
7457 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7458 } else {
7459 if (result == DRFLAC_CRC_MISMATCH) {
7460 continue; /* CRC mismatch. Pretend this frame never existed. */
7461 } else {
7462 return DRFLAC_FALSE;
7463 }
7464 }
7465 } else {
7466 /*
7467 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7468 frame never existed and leave the running sample count untouched.
7469 */
7470 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7471 if (result == DRFLAC_SUCCESS) {
7472 runningPCMFrameCount += pcmFrameCountInThisFrame;
7473 } else {
7474 if (result == DRFLAC_CRC_MISMATCH) {
7475 continue; /* CRC mismatch. Pretend this frame never existed. */
7476 } else {
7477 return DRFLAC_FALSE;
7478 }
7479 }
7480 }
7481 }
7482}
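/*
The frame loop in the function above can be modelled without any I/O. The sketch below is a simplified, hypothetical model
(the frame_info type and find_frame_for_pcm_frame() are not part of dr_flac): it walks frames from a known starting PCM
position, treats frames that fail their CRC as if they never existed, and reports both the frame holding the target and how
many PCM frames must be discarded inside it to make the seek sample-exact. It leaves out the end-of-stream special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified description of one FLAC frame, for illustration only.
typedef struct {
    uint32_t pcmFrameCount; // Number of PCM frames decoded from this FLAC frame.
    int      crcOk;         // Nonzero if the frame's CRC check passed.
} frame_info;

// Returns the index of the frame containing targetPCMFrame, or -1 if the target lies past the
// last frame. Frames with a CRC mismatch neither match the target nor advance the running count.
// On success, *pFramesToSkip receives the number of PCM frames to discard inside the found frame.
static long find_frame_for_pcm_frame(const frame_info* pFrames, size_t frameCount,
                                     uint64_t runningPCMFrameCount, uint64_t targetPCMFrame,
                                     uint64_t* pFramesToSkip)
{
    size_t i;
    assert(pFrames != NULL || frameCount == 0);
    assert(pFramesToSkip != NULL);
    assert(targetPCMFrame >= runningPCMFrameCount); // The scan can only move forward.

    for (i = 0; i < frameCount; ++i) {
        if (!pFrames[i].crcOk) {
            continue; // Pretend this frame never existed.
        }
        if (targetPCMFrame < runningPCMFrameCount + pFrames[i].pcmFrameCount) {
            *pFramesToSkip = targetPCMFrame - runningPCMFrameCount;
            return (long)i;
        }
        runningPCMFrameCount += pFrames[i].pcmFrameCount;
    }
    return -1;
}
```

Skipping a corrupt frame without advancing the running count mirrors the "pretend this frame never existed" branches above.
*/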



static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
{
    drflac_ogg_page_header header;
    drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
    drflac_uint32 bytesRead = 0;

    /* Precondition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
    (void)relaxed;

    pInit->container = drflac_container_ogg;
    pInit->oggFirstBytePos = 0;

    /*
    We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
    stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream pages and check if any of
    them match the FLAC specification. Keep in mind that the stream may be multiplexed.
    */
    if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
        return DRFLAC_FALSE;
    }
    pInit->runningFilePos += bytesRead;

    for (;;) {
        int pageBodySize;

        /* Fail if we've moved past the beginning-of-stream pages without finding a FLAC stream. */
        if ((header.headerType & 0x02) == 0) {
            return DRFLAC_FALSE;
        }

        /* Check if it's a FLAC header. */
        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize == 51) { /* 51 = size of the Ogg FLAC identification packet: 0x7F, "FLAC", 2-byte mapping version, 2-byte header count, "fLaC", 4-byte block header and 34-byte STREAMINFO. */
            /* It could be a FLAC page... */
            drflac_uint32 bytesRemainingInPage = pageBodySize;
            drflac_uint8 packetType;

            if (onRead(pUserData, &packetType, 1) != 1) {
                return DRFLAC_FALSE;
            }

            bytesRemainingInPage -= 1;
            if (packetType == 0x7F) {
                /* Increasingly more likely to be a FLAC page... */
                drflac_uint8 sig[4];
                if (onRead(pUserData, sig, 4) != 4) {
                    return DRFLAC_FALSE;
                }

                bytesRemainingInPage -= 4;
                if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
                    /* Almost certainly a FLAC page... */
                    drflac_uint8 mappingVersion[2];
                    if (onRead(pUserData, mappingVersion, 2) != 2) {
                        return DRFLAC_FALSE;
                    }

                    if (mappingVersion[0] != 1) {
                        return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
                    }

                    /*
                    The next 2 bytes hold the number of non-audio (header) packets that follow, not including this one. We don't
                    care about this because we're going to be handling it in a generic way based on the serial number and packet types.
                    */
                    if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }

                    /* Expecting the native FLAC signature "fLaC". */
                    if (onRead(pUserData, sig, 4) != 4) {
                        return DRFLAC_FALSE;
                    }

                    if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
                        /* The remaining data in the page should be the STREAMINFO block. */
                        drflac_streaminfo streaminfo;
                        drflac_uint8 isLastBlock;
                        drflac_uint8 blockType;
                        drflac_uint32 blockSize;
                        if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
                            return DRFLAC_FALSE;
                        }

                        if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
                            return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
                        }

                        if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
                            /* Success! */
                            pInit->hasStreamInfoBlock = DRFLAC_TRUE;
                            pInit->sampleRate = streaminfo.sampleRate;
                            pInit->channels = streaminfo.channels;
                            pInit->bitsPerSample = streaminfo.bitsPerSample;
                            pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
                            pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
                            pInit->hasMetadataBlocks = !isLastBlock;

                            if (onMeta) {
                                drflac_metadata metadata;
                                metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
                                metadata.pRawData = NULL;
                                metadata.rawDataSize = 0;
                                metadata.data.streaminfo = streaminfo;
                                onMeta(pUserDataMD, &metadata);
                            }

                            pInit->runningFilePos += pageBodySize;
                            pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* 79 = 27-byte fixed page header + 1 lacing value + 51-byte body, so this places us right on top of the "OggS" identifier of the FLAC BOS page. */
                            pInit->oggSerial = header.serialNumber;
                            pInit->oggBosHeader = header;
                            break;
                        } else {
                            /* Failed to read STREAMINFO block. Aww, so close... */
                            return DRFLAC_FALSE;
                        }
                    } else {
                        /* Invalid file. */
                        return DRFLAC_FALSE;
                    }
                } else {
                    /* Not a FLAC header. Skip it. */
                    if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /* Not a FLAC header. Seek past the rest of the page and move on to the next. */
                if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
        }

        pInit->runningFilePos += pageBodySize;


        /* Read the header of the next page. */
        if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        pInit->runningFilePos += bytesRead;
    }

    /*
    If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
    packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
    Ogg bitstream object.
    */
    pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- There is always at least a VORBIS_COMMENT metadata block. */
    return DRFLAC_TRUE;
}
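/*
The two magic numbers in the function above, 51 and 79, come from the layout of the first packet of an Ogg FLAC stream under
mapping version 1.x. A minimal sketch of a stand-alone validator for that packet follows; is_ogg_flac_id_packet() is a
hypothetical helper that works on an in-memory buffer rather than through the onRead/onSeek callbacks. Because the 51-byte
packet fits in a single segment, the whole BOS page is 27 (fixed header) + 1 (lacing value) + 51 (body) = 79 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Layout of the Ogg FLAC identification packet (mapping version 1.x):
//   offset  0: 1 byte   packet type 0x7F
//   offset  1: 4 bytes  "FLAC"
//   offset  5: 1 byte   major mapping version (must be 1)
//   offset  6: 1 byte   minor mapping version
//   offset  7: 2 bytes  number of header packets that follow (big-endian)
//   offset  9: 4 bytes  native signature "fLaC"
//   offset 13: 4 bytes  metadata block header (type 0 = STREAMINFO, 24-bit length 34)
//   offset 17: 34 bytes STREAMINFO
#define OGG_FLAC_ID_PACKET_SIZE 51 // 1 + 4 + 1 + 1 + 2 + 4 + 4 + 34
#define OGG_PAGE_HEADER_SIZE    27 // Fixed part of an Ogg page header.

// Returns nonzero if the buffer holds a valid Ogg FLAC identification packet.
static int is_ogg_flac_id_packet(const uint8_t* p, size_t size)
{
    uint32_t blockLength;
    assert(p != NULL);

    if (size != OGG_FLAC_ID_PACKET_SIZE) return 0;
    if (p[0] != 0x7F || memcmp(p + 1, "FLAC", 4) != 0) return 0;
    if (p[5] != 1) return 0;                      // Only mapping version 1.x.
    if (memcmp(p + 9, "fLaC", 4) != 0) return 0;
    if ((p[13] & 0x7F) != 0) return 0;            // Block type (low 7 bits) must be STREAMINFO.
    blockLength = ((uint32_t)p[14] << 16) | ((uint32_t)p[15] << 8) | p[16];
    return blockLength == 34;
}
```

The top bit of the block-header byte is the "last metadata block" flag, which is why only the low seven bits are compared.
*/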
#endif

static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
{
    drflac_bool32 relaxed;
    drflac_uint8 id[4];

    if (pInit == NULL || onRead == NULL || onSeek == NULL) {
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
    pInit->onRead = onRead;
    pInit->onSeek = onSeek;
    pInit->onMeta = onMeta;
    pInit->container = container;
    pInit->pUserData = pUserData;
    pInit->pUserDataMD = pUserDataMD;

    pInit->bs.onRead = onRead;
    pInit->bs.onSeek = onSeek;
    pInit->bs.pUserData = pUserData;
    drflac__reset_cache(&pInit->bs);


    /* If the container is explicitly defined then we can try opening in relaxed mode. */
    relaxed = container != drflac_container_unknown;

    /* Skip over any ID3 tags. */
    for (;;) {
        if (onRead(pUserData, id, 4) != 4) {
            return DRFLAC_FALSE; /* Ran out of data. */
        }
        pInit->runningFilePos += 4;

        if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
            drflac_uint8 header[6];
            drflac_uint8 flags;
            drflac_uint32 headerSize;

            if (onRead(pUserData, header, 6) != 6) {
                return DRFLAC_FALSE; /* Ran out of data. */
            }
            pInit->runningFilePos += 6;

            /* header[0] is the revision, header[1] the flags and header[2..5] the syncsafe tag size, which excludes the 10-byte header and any footer. */
            flags = header[1];

            DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
            headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
            if (flags & 0x10) {
                headerSize += 10; /* Flag 0x10: a 10-byte footer follows the tag. */
            }

            if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE; /* Failed to seek past the tag. */
            }
            pInit->runningFilePos += headerSize;
        } else {
            break;
        }
    }
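/*
The tag-skipping loop above reads the 10-byte ID3v2 header in two parts ("ID3" plus the major version, then six more bytes) and
decodes the tag size as a 28-bit "syncsafe" integer, in which each of the four bytes contributes only its low seven bits. A
minimal sketch of the same arithmetic on a complete in-memory header follows; both function names are hypothetical. The value
returned is the number of bytes after the 10-byte header, which is exactly how far the loop seeks, since it has already consumed
those 10 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Decodes a 4-byte big-endian syncsafe integer: 7 significant bits per byte, 28 bits in total.
static uint32_t id3v2_unsynchsafe(const uint8_t* p)
{
    assert(p != NULL);
    return ((uint32_t)(p[0] & 0x7F) << 21) |
           ((uint32_t)(p[1] & 0x7F) << 14) |
           ((uint32_t)(p[2] & 0x7F) <<  7) |
            (uint32_t)(p[3] & 0x7F);
}

// Given a full 10-byte ID3v2 header ("ID3", major, revision, flags, size[4]), returns how many
// bytes follow it: the tag body, plus a 10-byte footer when flag 0x10 is set.
static uint32_t id3v2_bytes_after_header(const uint8_t* pHeader)
{
    uint32_t size;
    assert(pHeader != NULL);
    size = id3v2_unsynchsafe(pHeader + 6);
    if (pHeader[5] & 0x10) {
        size += 10; // Footer present.
    }
    return size;
}
```
*/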

    if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
        return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#ifndef DR_FLAC_NO_OGG
    if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
        return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#endif

    /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
    if (relaxed) {
        if (container == drflac_container_native) {
            return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#ifndef DR_FLAC_NO_OGG
        if (container == drflac_container_ogg) {
            return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#endif
    }

    /* Unsupported container. */
    return DRFLAC_FALSE;
}

static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
{
    DRFLAC_ASSERT(pFlac != NULL);
    DRFLAC_ASSERT(pInit != NULL);

    DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
    pFlac->bs = pInit->bs;
    pFlac->onMeta = pInit->onMeta;
    pFlac->pUserDataMD = pInit->pUserDataMD;
    pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
    pFlac->sampleRate = pInit->sampleRate;
    pFlac->channels = (drflac_uint8)pInit->channels;
    pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
    pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
    pFlac->container = pInit->container;
}


static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac_init_info init;
    drflac_uint32 allocationSize;
    drflac_uint32 wholeSIMDVectorCountPerChannel;
    drflac_uint32 decodedSamplesAllocationSize;
#ifndef DR_FLAC_NO_OGG
    drflac_oggbs oggbs;
#endif
    drflac_uint64 firstFramePos;
    drflac_uint64 seektablePos;
    drflac_uint32 seektableSize;
    drflac_allocation_callbacks allocationCallbacks;
    drflac* pFlac;

    /* CPU support first. */
    drflac__init_cpu_caps();

    if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
        return NULL;
    }

    if (pAllocationCallbacks != NULL) {
        allocationCallbacks = *pAllocationCallbacks;
        if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
            return NULL; /* Invalid allocation callbacks. */
        }
    } else {
        allocationCallbacks.pUserData = NULL;
        allocationCallbacks.onMalloc = drflac__malloc_default;
        allocationCallbacks.onRealloc = drflac__realloc_default;
        allocationCallbacks.onFree = drflac__free_default;
    }


    /*
    The size of the allocation for the drflac object needs to be large enough to fit the following:
      1) The main members of the drflac structure
      2) A block of memory large enough to store the decoded samples of the largest frame in the stream
      3) If the container is Ogg, a drflac_oggbs object
      4) The seektable, if the stream has one (its size is only known once the metadata has been read)

    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
    the different SIMD instruction sets.
    */
    allocationSize = sizeof(drflac);

    /*
    The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
    we are supporting.
    */
    if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
    } else {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
    }

    decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;

    allocationSize += decodedSamplesAllocationSize;
    allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
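/*
The branch above is a ceiling division: each channel gets enough whole SIMD vectors to hold maxBlockSizeInPCMFrames 32-bit
samples. The extra DRFLAC_MAX_SIMD_VECTOR_SIZE bytes are there because rounding pExtraData up to a vector boundary skips at most
DRFLAC_MAX_SIMD_VECTOR_SIZE - 1 bytes. A minimal sketch of the same arithmetic follows; MAX_SIMD_VECTOR_SIZE = 64 is an assumed
value for illustration, and align_up() is a hypothetical stand-in for the drflac_align() call made after allocation.

```c
#include <assert.h>
#include <stdint.h>

// Assumed value for illustration: the widest SIMD register supported, in bytes.
#define MAX_SIMD_VECTOR_SIZE 64

// Bytes needed to store one block of decoded 32-bit samples for every channel, with each
// channel's buffer rounded up to a whole number of SIMD vectors.
static uint32_t decoded_samples_allocation_size(uint32_t maxBlockSizeInPCMFrames, uint32_t channels)
{
    const uint32_t samplesPerVector = (uint32_t)(MAX_SIMD_VECTOR_SIZE / sizeof(int32_t));
    // Ceiling division; equivalent to the modulo test with two branches.
    uint32_t vectorsPerChannel = (maxBlockSizeInPCMFrames + samplesPerVector - 1) / samplesPerVector;
    return vectorsPerChannel * MAX_SIMD_VECTOR_SIZE * channels;
}

// Rounds an address up to the next multiple of a power-of-two alignment.
static uintptr_t align_up(uintptr_t address, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address + alignment - 1) & ~(alignment - 1);
}
```
*/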

#ifndef DR_FLAC_NO_OGG
    /* There's additional data required for Ogg streams. */
    if (init.container == drflac_container_ogg) {
        allocationSize += sizeof(drflac_oggbs);
    }

    DRFLAC_ZERO_MEMORY(&oggbs, sizeof(oggbs));
    if (init.container == drflac_container_ogg) {
        oggbs.onRead = onRead;
        oggbs.onSeek = onSeek;
        oggbs.pUserData = pUserData;
        oggbs.currentBytePos = init.oggFirstBytePos;
        oggbs.firstBytePos = init.oggFirstBytePos;
        oggbs.serialNumber = init.oggSerial;
        oggbs.bosPageHeader = init.oggBosHeader;
        oggbs.bytesRemainingInPage = 0;
    }
#endif

    /*
    This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
    and decoding the metadata.
    */
    firstFramePos = 42; /* <-- We know we are at byte 42 at this point: 4-byte "fLaC" marker + 4-byte block header + 34-byte STREAMINFO. */
    seektablePos = 0;
    seektableSize = 0;
    if (init.hasMetadataBlocks) {
        drflac_read_proc onReadOverride = onRead;
        drflac_seek_proc onSeekOverride = onSeek;
        void* pUserDataOverride = pUserData;

#ifndef DR_FLAC_NO_OGG
        if (init.container == drflac_container_ogg) {
            onReadOverride = drflac__on_read_ogg;
            onSeekOverride = drflac__on_seek_ogg;
            pUserDataOverride = (void*)&oggbs;
        }
#endif

        if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seektableSize, &allocationCallbacks)) {
            return NULL;
        }

        allocationSize += seektableSize;
    }


    pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    drflac__init_from_info(pFlac, &init);
    pFlac->allocationCallbacks = allocationCallbacks;
    pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);

#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg) {
        drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + seektableSize);
        *pInternalOggbs = oggbs;

        /* The Ogg bitstream needs to be layered on top of the original bitstream. */
        pFlac->bs.onRead = drflac__on_read_ogg;
        pFlac->bs.onSeek = drflac__on_seek_ogg;
        pFlac->bs.pUserData = (void*)pInternalOggbs;
        pFlac->_oggbs = (void*)pInternalOggbs;
    }
#endif

    pFlac->firstFLACFramePosInBytes = firstFramePos;

    /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg)
    {
        pFlac->pSeekpoints = NULL;
        pFlac->seekpointCount = 0;
    }
    else
#endif
    {
        /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
        if (seektablePos != 0) {
            pFlac->seekpointCount = seektableSize / sizeof(*pFlac->pSeekpoints);
            pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);

            DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
            DRFLAC_ASSERT(pFlac->bs.onRead != NULL);

            /* Seek to the seektable, then just read directly into our seektable buffer. */
            if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
                if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints, seektableSize) == seektableSize) {
                    /* Endian swap. Seekpoints are stored big-endian in the stream. */
                    drflac_uint32 iSeekpoint;
                    for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
                        pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
                        pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
                        pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
                    }
                } else {
                    /* Failed to read the seektable. Pretend we don't have one. */
                    pFlac->pSeekpoints = NULL;
                    pFlac->seekpointCount = 0;
                }

                /* We need to seek back to where we were. If this fails it's a critical error. */
                if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            } else {
                /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
                pFlac->pSeekpoints = NULL;
                pFlac->seekpointCount = 0;
            }
        }
    }
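/*
The endian swap above converts each SEEKTABLE entry in place from its on-disk big-endian form. Each entry is 18 bytes: the
64-bit sample number of the first sample in the target frame, the 64-bit byte offset from the first frame header to the target
frame's header, and the 16-bit number of samples in the target frame. A minimal sketch of an alternative that decodes one entry
byte by byte follows; it does not depend on host endianness or on how the compiler lays out the struct. The seekpoint type,
read_be64() and parse_seekpoint() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One decoded SEEKTABLE entry.
typedef struct {
    uint64_t firstPCMFrame;   // Sample number of the first sample in the target frame.
    uint64_t flacFrameOffset; // Byte offset from the first frame header to the target frame header.
    uint16_t pcmFrameCount;   // Number of samples in the target frame.
} seekpoint;

// Reads a big-endian 64-bit integer.
static uint64_t read_be64(const uint8_t* p)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Decodes one 18-byte on-disk seekpoint.
static seekpoint parse_seekpoint(const uint8_t* p)
{
    seekpoint sp;
    assert(p != NULL);
    sp.firstPCMFrame   = read_be64(p);
    sp.flacFrameOffset = read_be64(p + 8);
    sp.pcmFrameCount   = (uint16_t)((p[16] << 8) | p[17]);
    return sp;
}
```
*/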


    /*
    If we get here without a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
    the first frame.
    */
    if (!init.hasStreamInfoBlock) {
        pFlac->currentFLACFrame.header = init.firstFrameHeader;
        for (;;) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                break;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                        drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                        return NULL;
                    }
                    continue;
                } else {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            }
        }
    }

    return pFlac;
}



#ifndef DR_FLAC_NO_STDIO
#include <stdio.h>
#include <wchar.h> /* For wcslen(), wcsrtombs() */

7962/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
7963#include <errno.h>
7964static drflac_result drflac_result_from_errno(int e)
7965{
7966 switch (e)
7967 {
7968 case 0: return DRFLAC_SUCCESS;
7969 #ifdef EPERM
7970 case EPERM: return DRFLAC_INVALID_OPERATION;
7971 #endif
7972 #ifdef ENOENT
7973 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
7974 #endif
7975 #ifdef ESRCH
7976 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
7977 #endif
7978 #ifdef EINTR
7979 case EINTR: return DRFLAC_INTERRUPT;
7980 #endif
7981 #ifdef EIO
7982 case EIO: return DRFLAC_IO_ERROR;
7983 #endif
7984 #ifdef ENXIO
7985 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
7986 #endif
7987 #ifdef E2BIG
7988 case E2BIG: return DRFLAC_INVALID_ARGS;
7989 #endif
7990 #ifdef ENOEXEC
7991 case ENOEXEC: return DRFLAC_INVALID_FILE;
7992 #endif
7993 #ifdef EBADF
7994 case EBADF: return DRFLAC_INVALID_FILE;
7995 #endif
7996 #ifdef ECHILD
7997 case ECHILD: return DRFLAC_ERROR;
7998 #endif
7999 #ifdef EAGAIN
8000 case EAGAIN: return DRFLAC_UNAVAILABLE;
8001 #endif
8002 #ifdef ENOMEM
8003 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8004 #endif
8005 #ifdef EACCES
8006 case EACCES: return DRFLAC_ACCESS_DENIED;
8007 #endif
8008 #ifdef EFAULT
8009 case EFAULT: return DRFLAC_BAD_ADDRESS;
8010 #endif
8011 #ifdef ENOTBLK
8012 case ENOTBLK: return DRFLAC_ERROR;
8013 #endif
8014 #ifdef EBUSY
8015 case EBUSY: return DRFLAC_BUSY;
8016 #endif
8017 #ifdef EEXIST
8018 case EEXIST: return DRFLAC_ALREADY_EXISTS;
8019 #endif
8020 #ifdef EXDEV
8021 case EXDEV: return DRFLAC_ERROR;
8022 #endif
8023 #ifdef ENODEV
8024 case ENODEV: return DRFLAC_DOES_NOT_EXIST;
8025 #endif
8026 #ifdef ENOTDIR
8027 case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
8028 #endif
8029 #ifdef EISDIR
8030 case EISDIR: return DRFLAC_IS_DIRECTORY;
8031 #endif
8032 #ifdef EINVAL
8033 case EINVAL: return DRFLAC_INVALID_ARGS;
8034 #endif
8035 #ifdef ENFILE
8036 case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8037 #endif
8038 #ifdef EMFILE
8039 case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8040 #endif
8041 #ifdef ENOTTY
8042 case ENOTTY: return DRFLAC_INVALID_OPERATION;
8043 #endif
8044 #ifdef ETXTBSY
8045 case ETXTBSY: return DRFLAC_BUSY;
8046 #endif
8047 #ifdef EFBIG
8048 case EFBIG: return DRFLAC_TOO_BIG;
8049 #endif
8050 #ifdef ENOSPC
8051 case ENOSPC: return DRFLAC_NO_SPACE;
8052 #endif
8053 #ifdef ESPIPE
8054 case ESPIPE: return DRFLAC_BAD_SEEK;
8055 #endif
8056 #ifdef EROFS
8057 case EROFS: return DRFLAC_ACCESS_DENIED;
8058 #endif
8059 #ifdef EMLINK
8060 case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8061 #endif
8062 #ifdef EPIPE
8063 case EPIPE: return DRFLAC_BAD_PIPE;
8064 #endif
8065 #ifdef EDOM
8066 case EDOM: return DRFLAC_OUT_OF_RANGE;
8067 #endif
8068 #ifdef ERANGE
8069 case ERANGE: return DRFLAC_OUT_OF_RANGE;
8070 #endif
8071 #ifdef EDEADLK
8072 case EDEADLK: return DRFLAC_DEADLOCK;
8073 #endif
8074 #ifdef ENAMETOOLONG
8075 case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8076 #endif
8077 #ifdef ENOLCK
8078 case ENOLCK: return DRFLAC_ERROR;
8079 #endif
8080 #ifdef ENOSYS
8081 case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8082 #endif
8083 #ifdef ENOTEMPTY
8084 case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8085 #endif
8086 #ifdef ELOOP
8087 case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8088 #endif
8089 #ifdef ENOMSG
8090 case ENOMSG: return DRFLAC_NO_MESSAGE;
8091 #endif
8092 #ifdef EIDRM
8093 case EIDRM: return DRFLAC_ERROR;
8094 #endif
8095 #ifdef ECHRNG
8096 case ECHRNG: return DRFLAC_ERROR;
8097 #endif
8098 #ifdef EL2NSYNC
8099 case EL2NSYNC: return DRFLAC_ERROR;
8100 #endif
8101 #ifdef EL3HLT
8102 case EL3HLT: return DRFLAC_ERROR;
8103 #endif
8104 #ifdef EL3RST
8105 case EL3RST: return DRFLAC_ERROR;
8106 #endif
8107 #ifdef ELNRNG
8108 case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8109 #endif
8110 #ifdef EUNATCH
8111 case EUNATCH: return DRFLAC_ERROR;
8112 #endif
8113 #ifdef ENOCSI
8114 case ENOCSI: return DRFLAC_ERROR;
8115 #endif
8116 #ifdef EL2HLT
8117 case EL2HLT: return DRFLAC_ERROR;
8118 #endif
8119 #ifdef EBADE
8120 case EBADE: return DRFLAC_ERROR;
8121 #endif
8122 #ifdef EBADR
8123 case EBADR: return DRFLAC_ERROR;
8124 #endif
8125 #ifdef EXFULL
8126 case EXFULL: return DRFLAC_ERROR;
8127 #endif
8128 #ifdef ENOANO
8129 case ENOANO: return DRFLAC_ERROR;
8130 #endif
8131 #ifdef EBADRQC
8132 case EBADRQC: return DRFLAC_ERROR;
8133 #endif
8134 #ifdef EBADSLT
8135 case EBADSLT: return DRFLAC_ERROR;
8136 #endif
8137 #ifdef EBFONT
8138 case EBFONT: return DRFLAC_INVALID_FILE;
8139 #endif
8140 #ifdef ENOSTR
8141 case ENOSTR: return DRFLAC_ERROR;
8142 #endif
8143 #ifdef ENODATA
8144 case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8145 #endif
8146 #ifdef ETIME
8147 case ETIME: return DRFLAC_TIMEOUT;
8148 #endif
8149 #ifdef ENOSR
8150 case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8151 #endif
8152 #ifdef ENONET
8153 case ENONET: return DRFLAC_NO_NETWORK;
8154 #endif
8155 #ifdef ENOPKG
8156 case ENOPKG: return DRFLAC_ERROR;
8157 #endif
8158 #ifdef EREMOTE
8159 case EREMOTE: return DRFLAC_ERROR;
8160 #endif
8161 #ifdef ENOLINK
8162 case ENOLINK: return DRFLAC_ERROR;
8163 #endif
8164 #ifdef EADV
8165 case EADV: return DRFLAC_ERROR;
8166 #endif
8167 #ifdef ESRMNT
8168 case ESRMNT: return DRFLAC_ERROR;
8169 #endif
8170 #ifdef ECOMM
8171 case ECOMM: return DRFLAC_ERROR;
8172 #endif
8173 #ifdef EPROTO
8174 case EPROTO: return DRFLAC_ERROR;
8175 #endif
8176 #ifdef EMULTIHOP
8177 case EMULTIHOP: return DRFLAC_ERROR;
8178 #endif
8179 #ifdef EDOTDOT
8180 case EDOTDOT: return DRFLAC_ERROR;
8181 #endif
8182 #ifdef EBADMSG
8183 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8184 #endif
8185 #ifdef EOVERFLOW
8186 case EOVERFLOW: return DRFLAC_TOO_BIG;
8187 #endif
8188 #ifdef ENOTUNIQ
8189 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8190 #endif
8191 #ifdef EBADFD
8192 case EBADFD: return DRFLAC_ERROR;
8193 #endif
8194 #ifdef EREMCHG
8195 case EREMCHG: return DRFLAC_ERROR;
8196 #endif
8197 #ifdef ELIBACC
8198 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8199 #endif
8200 #ifdef ELIBBAD
8201 case ELIBBAD: return DRFLAC_INVALID_FILE;
8202 #endif
8203 #ifdef ELIBSCN
8204 case ELIBSCN: return DRFLAC_INVALID_FILE;
8205 #endif
8206 #ifdef ELIBMAX
8207 case ELIBMAX: return DRFLAC_ERROR;
8208 #endif
8209 #ifdef ELIBEXEC
8210 case ELIBEXEC: return DRFLAC_ERROR;
8211 #endif
8212 #ifdef EILSEQ
8213 case EILSEQ: return DRFLAC_INVALID_DATA;
8214 #endif
8215 #ifdef ERESTART
8216 case ERESTART: return DRFLAC_ERROR;
8217 #endif
8218 #ifdef ESTRPIPE
8219 case ESTRPIPE: return DRFLAC_ERROR;
8220 #endif
8221 #ifdef EUSERS
8222 case EUSERS: return DRFLAC_ERROR;
8223 #endif
8224 #ifdef ENOTSOCK
8225 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8226 #endif
8227 #ifdef EDESTADDRREQ
8228 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8229 #endif
8230 #ifdef EMSGSIZE
8231 case EMSGSIZE: return DRFLAC_TOO_BIG;
8232 #endif
8233 #ifdef EPROTOTYPE
8234 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8235 #endif
8236 #ifdef ENOPROTOOPT
        case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
    #endif
    #ifdef EPROTONOSUPPORT
        case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
    #endif
    #ifdef ESOCKTNOSUPPORT
        case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
    #endif
    #ifdef EOPNOTSUPP
        case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef EPFNOSUPPORT
        case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EAFNOSUPPORT
        case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EADDRINUSE
        case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
    #endif
    #ifdef EADDRNOTAVAIL
        case EADDRNOTAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef ENETDOWN
        case ENETDOWN: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETUNREACH
        case ENETUNREACH: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETRESET
        case ENETRESET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNABORTED
        case ECONNABORTED: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNRESET
        case ECONNRESET: return DRFLAC_CONNECTION_RESET;
    #endif
    #ifdef ENOBUFS
        case ENOBUFS: return DRFLAC_NO_SPACE;
    #endif
    #ifdef EISCONN
        case EISCONN: return DRFLAC_ALREADY_CONNECTED;
    #endif
    #ifdef ENOTCONN
        case ENOTCONN: return DRFLAC_NOT_CONNECTED;
    #endif
    #ifdef ESHUTDOWN
        case ESHUTDOWN: return DRFLAC_ERROR;
    #endif
    #ifdef ETOOMANYREFS
        case ETOOMANYREFS: return DRFLAC_ERROR;
    #endif
    #ifdef ETIMEDOUT
        case ETIMEDOUT: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ECONNREFUSED
        case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
    #endif
    #ifdef EHOSTDOWN
        case EHOSTDOWN: return DRFLAC_NO_HOST;
    #endif
    #ifdef EHOSTUNREACH
        case EHOSTUNREACH: return DRFLAC_NO_HOST;
    #endif
    #ifdef EALREADY
        case EALREADY: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef EINPROGRESS
        case EINPROGRESS: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef ESTALE
        case ESTALE: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EUCLEAN
        case EUCLEAN: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTNAM
        case ENOTNAM: return DRFLAC_ERROR;
    #endif
    #ifdef ENAVAIL
        case ENAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef EISNAM
        case EISNAM: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTEIO
        case EREMOTEIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef EDQUOT
        case EDQUOT: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ENOMEDIUM
        case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EMEDIUMTYPE
        case EMEDIUMTYPE: return DRFLAC_ERROR;
    #endif
    #ifdef ECANCELED
        case ECANCELED: return DRFLAC_CANCELLED;
    #endif
    #ifdef ENOKEY
        case ENOKEY: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYEXPIRED
        case EKEYEXPIRED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREVOKED
        case EKEYREVOKED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREJECTED
        case EKEYREJECTED: return DRFLAC_ERROR;
    #endif
    #ifdef EOWNERDEAD
        case EOWNERDEAD: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTRECOVERABLE
        case ENOTRECOVERABLE: return DRFLAC_ERROR;
    #endif
    #ifdef ERFKILL
        case ERFKILL: return DRFLAC_ERROR;
    #endif
    #ifdef EHWPOISON
        case EHWPOISON: return DRFLAC_ERROR;
    #endif
        default: return DRFLAC_ERROR;
    }
}
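
/*
The mapping above is a plain switch over errno values in which every case is guarded by #ifdef, because not every platform
defines every errno constant; an unrecognised value simply falls through to the generic error. The sketch below shows the same
pattern in isolation. The ex_result codes and ex_result_from_errno() are hypothetical names for this example only (dr_flac's
real drflac_result set is much larger), and mapping 0 to success is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

// Hypothetical result codes for this sketch only.
typedef enum
{
    EX_SUCCESS        =  0,
    EX_ERROR          = -1,
    EX_DOES_NOT_EXIST = -2,
    EX_ACCESS_DENIED  = -3,
    EX_NO_SPACE       = -4
} ex_result;

// Map an errno value onto ex_result. Each case is wrapped in #ifdef because
// ISO C only guarantees EDOM, EILSEQ and ERANGE; the rest are POSIX or
// platform extensions. Anything unrecognised becomes the generic error.
static ex_result ex_result_from_errno(int e)
{
    switch (e)
    {
        case 0: return EX_SUCCESS;
    #ifdef ENOENT
        case ENOENT: return EX_DOES_NOT_EXIST;
    #endif
    #ifdef EACCES
        case EACCES: return EX_ACCESS_DENIED;
    #endif
    #ifdef ENOSPC
        case ENOSPC: return EX_NO_SPACE;
    #endif
        default: return EX_ERROR;
    }
}
```

Because 0 maps to success, a caller that reads errno after a failed call should still force an error result when the mapping
comes back as success, which is what drflac_fopen() does.
*/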

static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
    errno_t err;
#endif

    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(_MSC_VER) && _MSC_VER >= 1400
    err = fopen_s(ppFile, pFilePath, pOpenMode);
    if (err != 0) {
        return drflac_result_from_errno(err);
    }
#else
#if defined(_WIN32) || defined(__APPLE__)
    *ppFile = fopen(pFilePath, pOpenMode);
#else
    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
        *ppFile = fopen64(pFilePath, pOpenMode);
    #else
        *ppFile = fopen(pFilePath, pOpenMode);
    #endif
#endif
    if (*ppFile == NULL) {
        drflac_result result = drflac_result_from_errno(errno);
        if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR;  /* Safety check: never return success when *ppFile == NULL. */
        }

        return result;
    }
#endif

    return DRFLAC_SUCCESS;
}

/*
_wfopen() isn't available in every compilation environment:

    * Windows only.
    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
    * MinGW-64 (both 32- and 64-bit) seems to support it.
    * MinGW wraps it in !defined(__STRICT_ANSI__).
    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).

This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() rather than the wcsrtombs()
fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
*/
#if defined(_WIN32)
    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
        #define DRFLAC_HAS_WFOPEN
    #endif
#endif

static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(DRFLAC_HAS_WFOPEN)
    {
        /* Use _wfopen() on Windows. */
    #if defined(_MSC_VER) && _MSC_VER >= 1400
        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
        if (err != 0) {
            return drflac_result_from_errno(err);
        }
    #else
        *ppFile = _wfopen(pFilePath, pOpenMode);
        if (*ppFile == NULL) {
            drflac_result result = drflac_result_from_errno(errno);
            if (result == DRFLAC_SUCCESS) {
                result = DRFLAC_ERROR;  /* Safety check: never return success when *ppFile == NULL. */
            }

            return result;
        }
    #endif
        (void)pAllocationCallbacks;
    }
#else
    /*
    Use fopen() on anything other than Windows. This requires converting the wide-character path to a multibyte string, which is
    annoying because the conversion (and therefore fopen()) is locale specific. The only real way I can think of to do this is with
    wcsrtombs(). Note that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for maintaining
    state. I've checked this with -std=c89 and it works, but if somebody gets a compiler error I'll look into improving compatibility.
    */
    {
        mbstate_t mbs;
        size_t lenMB;
        const wchar_t* pFilePathTemp = pFilePath;
        char* pFilePathMB = NULL;
        char pOpenModeMB[32] = {0};

        /* Get the length first. */
        DRFLAC_ZERO_OBJECT(&mbs);
        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
        if (lenMB == (size_t)-1) {
            return drflac_result_from_errno(errno);
        }

        pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
        if (pFilePathMB == NULL) {
            return DRFLAC_OUT_OF_MEMORY;
        }

        pFilePathTemp = pFilePath;
        DRFLAC_ZERO_OBJECT(&mbs);
        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);

        /*
        The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. The copy is bounded
        so that an over-long mode string cannot overflow pOpenModeMB.
        */
        {
            size_t i = 0;
            for (;;) {
                if (pOpenMode[i] == 0 || i == sizeof(pOpenModeMB) - 1) {
                    pOpenModeMB[i] = '\0';
                    break;
                }

                pOpenModeMB[i] = (char)pOpenMode[i];
                i += 1;
            }
        }

        *ppFile = fopen(pFilePathMB, pOpenModeMB);

        drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
    }

    if (*ppFile == NULL) {
        return DRFLAC_ERROR;
    }
#endif

    return DRFLAC_SUCCESS;
}
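
/*
On non-Windows builds drflac_wfopen() converts the wide path with two wcsrtombs() calls: one with a NULL destination to measure
the multibyte length, then one to convert into a buffer of that length plus a terminator. The standalone sketch below isolates
that pattern. wide_to_multibyte() is a hypothetical helper for illustration; it uses malloc() where dr_flac uses its allocation
callbacks, and it converts according to the current C locale (the "C" locale unless setlocale() has been called).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Convert a wide string to a newly allocated multibyte string (caller frees).
// Returns NULL if a character cannot be represented (errno is then EILSEQ)
// or if allocation fails.
static char* wide_to_multibyte(const wchar_t* pWide)
{
    mbstate_t state;
    const wchar_t* pSrc = pWide;
    size_t len;
    char* pOut;

    // Pass 1: NULL destination, so only the byte count (excluding the
    // terminator) is computed.
    memset(&state, 0, sizeof(state));
    len = wcsrtombs(NULL, &pSrc, 0, &state);
    if (len == (size_t)-1) {
        return NULL;
    }

    pOut = (char*)malloc(len + 1);
    if (pOut == NULL) {
        return NULL;
    }

    // Pass 2: fresh conversion state and a reset source pointer (the
    // standard only updates *src when the destination is non-NULL, so the
    // reset is a precaution). len + 1 leaves room for the terminator.
    pSrc = pWide;
    memset(&state, 0, sizeof(state));
    wcsrtombs(pOut, &pSrc, len + 1, &state);
    return pOut;
}
```

Returning NULL on EILSEQ lets the caller translate errno into its own error code, which is how drflac_wfopen() reports a failed
length query.
*/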

static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
}

static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
{
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */

    return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
}
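
/*
The stdio callbacks simply forward to fread() and fseek(), with the FILE* carried through the opaque pUserData pointer. Because
the decoder only ever seeks forward, the seek callback needs nothing beyond SEEK_SET and SEEK_CUR. The sketch below reproduces
the pattern outside dr_flac; ex_seek_origin and the ex_ functions are hypothetical stand-ins for drflac_seek_origin and the
drflac__on_*_stdio() callbacks, and this version rejects a negative offset at run time instead of asserting.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical stand-in for drflac_seek_origin.
typedef enum { ex_seek_origin_start, ex_seek_origin_current } ex_seek_origin;

// Read callback: the FILE* travels through the opaque user-data pointer.
static size_t ex_on_read_stdio(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    return fread(pBufferOut, 1, bytesToRead, (FILE*)pUserData);
}

// Seek callback: forward-only, so only SEEK_SET and SEEK_CUR are used.
// Returns 1 on success and 0 on failure.
static int ex_on_seek_stdio(void* pUserData, int offset, ex_seek_origin origin)
{
    if (offset < 0) {
        return 0;
    }
    return fseek((FILE*)pUserData, offset, (origin == ex_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
}
```
*/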


DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}

DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}

DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}

DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}
#endif  /* DR_FLAC_NO_STDIO */

static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
    size_t bytesRemaining;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);

    bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
        memoryStream->currentReadPos += bytesToRead;
    }

    return bytesToRead;
}

static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */

    if (offset > (drflac_int64)memoryStream->dataSize) {
        return DRFLAC_FALSE;
    }

    if (origin == drflac_seek_origin_current) {
        if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos += offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    } else {
        if ((drflac_uint32)offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos = offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    }

    return DRFLAC_TRUE;
}
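
/*
drflac__on_read_memory() clamps each read to the bytes remaining, and drflac__on_seek_memory() allows forward seeks that land
anywhere up to and including the end of the buffer. The sketch below is a self-contained version of the same cursor logic,
assuming a hypothetical ex_memory_stream struct in place of drflac__memory_stream and taking the stream pointer directly rather
than through void* user data. It writes the bounds check as offset > dataSize - base, which cannot overflow, rather than
comparing base + offset against dataSize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical cursor over a read-only buffer. Invariant: currentReadPos <= dataSize.
typedef struct
{
    const unsigned char* data;
    size_t dataSize;
    size_t currentReadPos;
} ex_memory_stream;

typedef enum { ex_seek_origin_start, ex_seek_origin_current } ex_seek_origin;

// Copy up to bytesToRead bytes, clamping at the end of the buffer.
static size_t ex_memory_read(ex_memory_stream* s, void* pOut, size_t bytesToRead)
{
    size_t remaining = s->dataSize - s->currentReadPos;
    if (bytesToRead > remaining) {
        bytesToRead = remaining;
    }
    if (bytesToRead > 0) {
        memcpy(pOut, s->data + s->currentReadPos, bytesToRead);
        s->currentReadPos += bytesToRead;
    }
    return bytesToRead;
}

// Forward-only seek. Landing exactly on dataSize is allowed; going past it
// fails and leaves the cursor unchanged. Returns 1 on success, 0 on failure.
static int ex_memory_seek(ex_memory_stream* s, int offset, ex_seek_origin origin)
{
    size_t base;
    if (offset < 0) {
        return 0;
    }
    base = (origin == ex_seek_origin_current) ? s->currentReadPos : 0;
    if ((size_t)offset > s->dataSize - base) {
        return 0;
    }
    s->currentReadPos = base + (size_t)offset;
    return 1;
}
```
*/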

DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /*
    This is an awful hack: drflac_open() was given the address of the local memoryStream, which goes out of scope when this function
    returns. Now that the stream state has been copied into pFlac, re-point the read/seek user data at that copy. For Ogg streams the
    user data lives in the Ogg bitstream object rather than in pFlac->bs.
    */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}
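
/*
The "awful hack" in drflac_open_memory() exists because drflac_open() is handed the address of a local drflac__memory_stream.
Once the stream state has been copied into the heap-allocated decoder, the user-data pointer must be re-pointed at that copy or
it dangles as soon as the function returns. The sketch below reduces this to two structs. ex_decoder, ex_memory_stream and
ex_open_memory() are hypothetical, and the 4-byte advance merely simulates the decoder consuming the "fLaC" marker while it
initialises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical cursor over a read-only buffer.
typedef struct
{
    const unsigned char* data;
    size_t dataSize;
    size_t currentReadPos;
} ex_memory_stream;

// Hypothetical decoder: the read path only ever sees pUserData.
typedef struct
{
    void* pUserData;
    ex_memory_stream memoryStream;
} ex_decoder;

static ex_decoder* ex_open_memory(const void* pData, size_t dataSize)
{
    ex_memory_stream local;   // Lives on this function's stack frame.
    ex_memory_stream* pStream;
    ex_decoder* pDecoder;

    local.data = (const unsigned char*)pData;
    local.dataSize = dataSize;
    local.currentReadPos = 0;

    pDecoder = (ex_decoder*)malloc(sizeof(*pDecoder));
    if (pDecoder == NULL) {
        return NULL;
    }

    // Initialisation reads through the user-data pointer, which still
    // refers to the local. Simulate consuming a 4-byte stream marker.
    pDecoder->pUserData = &local;
    pStream = (ex_memory_stream*)pDecoder->pUserData;
    pStream->currentReadPos = (dataSize < 4) ? dataSize : 4;

    // Copy the state (including the read position) into the decoder and
    // re-point the user data at the copy so it outlives this function.
    pDecoder->memoryStream = local;
    pDecoder->pUserData = &pDecoder->memoryStream;
    return pDecoder;
}
```
*/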

DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /*
    This is an awful hack: the decoder was opened with the address of the local memoryStream, which goes out of scope when this
    function returns. Re-point the read/seek user data at the copy stored in pFlac. For Ogg streams the user data lives in the Ogg
    bitstream object rather than in pFlac->bs.
    */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}



DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API void drflac_close(drflac* pFlac)
{
    if (pFlac == NULL) {
        return;
    }

#ifndef DR_FLAC_NO_STDIO
    /*
    If the file was opened with drflac_open_file() or one of its variants, we own the file handle and need to close it. We can tell
    whether that was the case by checking whether the read callback is drflac__on_read_stdio.
    */
    if (pFlac->bs.onRead == drflac__on_read_stdio) {
        fclose((FILE*)pFlac->bs.pUserData);
    }

#ifndef DR_FLAC_NO_OGG
    /* Ogg streams are cleaned up differently because the bitstream is chained: the FILE* belongs to the inner Ogg reader, not to pFlac->bs. */
    if (pFlac->container == drflac_container_ogg) {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);

        if (oggbs->onRead == drflac__on_read_stdio) {
            fclose((FILE*)oggbs->pUserData);
        }
    }
#endif
#endif

    drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
}
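
/*
drflac_close() keeps no separate "owns the file" flag. It compares the stored read callback against drflac__on_read_stdio: a
match means the FILE* was opened by drflac_open_file() or a sibling, so it is closed here, while any other callback means the
caller owns the stream. The sketch below shows the idea with hypothetical ex_ names; the ex_filesClosed counter exists only so
the behaviour can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef size_t (*ex_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

// The library's own stdio reader; its address doubles as an ownership tag.
static size_t ex_on_read_stdio(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    return fread(pBufferOut, 1, bytesToRead, (FILE*)pUserData);
}

typedef struct
{
    ex_read_proc onRead;
    void* pUserData;
} ex_decoder;

static int ex_filesClosed = 0;   // Instrumentation for the example only.

// Close the FILE* only when the decoder is using the library's stdio reader;
// a caller-supplied callback means the caller owns the stream.
static void ex_close(ex_decoder* pDecoder)
{
    if (pDecoder == NULL) {
        return;
    }
    if (pDecoder->onRead == ex_on_read_stdio) {
        fclose((FILE*)pDecoder->pUserData);
        ex_filesClosed += 1;
    }
    free(pDecoder);
}
```
*/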


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
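
/*
In left/side stereo, channel 0 holds the left signal and channel 1 holds side = left - right, so the decoder recovers
right = left - side. For example, left = 100 and side = 60 give right = 40. The scalar routine above does this four frames at a
time after applying the wasted-bits and unused-bits shifts; the sketch below keeps only the channel arithmetic and the
interleaving, and ex_decode_left_side() is a hypothetical name. Working in uint32_t keeps any wraparound well defined, as in the
original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reconstruct right = left - side and write interleaved L R L R ... output.
// Shifts for wasted/unused bits are deliberately omitted in this sketch.
static void ex_decode_left_side(const int32_t* pLeft, const int32_t* pSide, size_t frameCount, int32_t* pOutInterleaved)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        uint32_t left  = (uint32_t)pLeft[i];
        uint32_t side  = (uint32_t)pSide[i];
        uint32_t right = left - side;   // Unsigned: wraps modulo 2^32.

        pOutInterleaved[i*2 + 0] = (int32_t)left;
        pOutInterleaved[i*2 + 1] = (int32_t)right;
    }
}
```
*/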

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif
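
/*
The SSE2 path processes four frames per iteration: one _mm_sub_epi32() computes four right samples, then _mm_unpacklo_epi32()
and _mm_unpackhi_epi32() interleave the left and right vectors into L0 R0 L1 R1 and L2 R2 L3 R3. The sketch below shows just
that interleave, without the shifts. It is gated on the compiler-defined __SSE2__ (or _M_X64 for MSVC) rather than dr_flac's
DRFLAC_SUPPORT_SSE2 and its run-time drflac__gIsSSE2Supported check, and ex_decode_left_side_x4() is a hypothetical name. A
scalar loop handles the leftover frames, or all of them when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
    #include <emmintrin.h>
    #define EX_HAS_SSE2 1
#endif

// Left/side decode, four frames at a time when SSE2 is available.
// Output is interleaved: frame i goes to pOut[i*2] (left) and pOut[i*2+1] (right).
static void ex_decode_left_side_x4(const int32_t* pLeft, const int32_t* pSide, size_t frameCount, int32_t* pOut)
{
    size_t i = 0;
#if defined(EX_HAS_SSE2)
    for (; i + 4 <= frameCount; i += 4) {
        __m128i left  = _mm_loadu_si128((const __m128i*)(pLeft + i));
        __m128i side  = _mm_loadu_si128((const __m128i*)(pSide + i));
        __m128i right = _mm_sub_epi32(left, side);   // Wraps modulo 2^32.

        _mm_storeu_si128((__m128i*)(pOut + i*2 + 0), _mm_unpacklo_epi32(left, right));   // L0 R0 L1 R1
        _mm_storeu_si128((__m128i*)(pOut + i*2 + 4), _mm_unpackhi_epi32(left, right));   // L2 R2 L3 R3
    }
#endif
    // Scalar tail (or the whole range when SSE2 is unavailable).
    for (; i < frameCount; ++i) {
        uint32_t right = (uint32_t)pLeft[i] - (uint32_t)pSide[i];
        pOut[i*2 + 0] = pLeft[i];
        pOut[i*2 + 1] = (int32_t)right;
    }
}
```
*/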

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
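
/*
Right/side stereo is the mirror image of left/side: channel 0 holds side = left - right and channel 1 holds right, so the decoder
recovers left = right + side. With right = 3 and side = 4, left = 7. The sketch below, using a hypothetical
ex_decode_right_side(), omits the wasted-bits and unused-bits shifts and shows only the reconstruction and interleaving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reconstruct left = right + side and write interleaved L R L R ... output.
// Unsigned arithmetic keeps any wraparound well defined.
static void ex_decode_right_side(const int32_t* pSide, const int32_t* pRight, size_t frameCount, int32_t* pOutInterleaved)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        uint32_t side  = (uint32_t)pSide[i];
        uint32_t right = (uint32_t)pRight[i];
        uint32_t left  = right + side;

        pOutInterleaved[i*2 + 0] = (int32_t)left;
        pOutInterleaved[i*2 + 1] = (int32_t)right;
    }
}
```
*/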

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left  = _mm_add_epi32(right, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left  = vaddq_u32(right, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;

    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
#endif
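
/*
In mid/side stereo the encoder stores side = left - right and mid = (left + right) >> 1, which drops the lowest bit of
left + right. Because left + right and left - right always have the same parity, the decoder restores that bit from side, as the
reference routine does with (mid << 1) | (side & 0x01), and then computes left = (mid + side) >> 1 and right = (mid - side) >> 1.
For example, left = 5 and right = 2 encode as mid = 3 and side = 3; the decoder rebuilds 7 = (3 << 1) | 1, then
left = (7 + 3) >> 1 = 5 and right = (7 - 3) >> 1 = 2. The round-trip sketch below uses hypothetical ex_ names, ignores the
wasted-bits and unused-bits shifts, assumes sample values small enough that left + right fits in int32_t, and relies on >> being
an arithmetic shift for negative values, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

// Encoder side, included only to make a round trip possible.
static void ex_encode_mid_side(int32_t left, int32_t right, int32_t* pMid, int32_t* pSide)
{
    *pSide = left - right;
    *pMid  = (left + right) >> 1;   // Drops the low bit of left + right.
}

// Decoder side: recover left + right by restoring the dropped low bit from side.
static void ex_decode_mid_side(int32_t mid, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    uint32_t m = ((uint32_t)mid << 1) | ((uint32_t)side & 0x01);   // m == left + right

    *pLeft  = (int32_t)(m + (uint32_t)side) >> 1;   // (2 * left)  >> 1
    *pRight = (int32_t)(m - (uint32_t)side) >> 1;   // (2 * right) >> 1
}
```
*/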

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9151 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9152 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9153 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9154
9155 mid0 = (mid0 << 1) | (side0 & 0x01);
9156 mid1 = (mid1 << 1) | (side1 & 0x01);
9157 mid2 = (mid2 << 1) | (side2 & 0x01);
9158 mid3 = (mid3 << 1) | (side3 & 0x01);
9159
9160 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
9161 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
9162 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
9163 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
9164
9165 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
9166 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
9167 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
9168 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
9169
9170 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9171 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9172 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9173 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9174 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9175 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9176 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9177 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9178 }
9179 }
9180
9181 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9182 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9183 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9184
9185 mid = (mid << 1) | (side & 0x01);
9186
9187 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9188 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9189 }
9190}
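/*
When unusedBitsPerSample is non-zero, the scalar path above skips the arithmetic >> 1 and shifts left by
unusedBitsPerSample - 1 instead. This is exact because, once the LSB is restored, mid + side is always even: halving loses
nothing, so the right shift by 1 and the left shift by unusedBitsPerSample collapse into one left shift. The same argument
covers mid - side. The sketch below (hypothetical helper names, not dr_flac code) checks that equivalence by brute force over
a small range; it assumes unusedBits > 0, as the folded path does.

```c
#include <assert.h>
#include <stdint.h>

// Straightforward form: halve with an arithmetic shift, then left-justify.
static int32_t slow_left(uint32_t mid, uint32_t side, uint32_t unusedBits)
{
    mid = (mid << 1) | (side & 0x01);
    return (int32_t)((uint32_t)((int32_t)(mid + side) >> 1) << unusedBits);
}

// Folded form (requires unusedBits > 0): one left shift by unusedBits - 1.
static int32_t fast_left(uint32_t mid, uint32_t side, uint32_t unusedBits)
{
    mid = (mid << 1) | (side & 0x01);
    return (int32_t)((mid + side) << (unusedBits - 1));
}

// Returns 1 if both forms agree for every mid/side pair in [-range, range].
static int folding_is_exact(int32_t range, uint32_t unusedBits)
{
    for (int32_t m = -range; m <= range; ++m) {
        for (int32_t s = -range; s <= range; ++s) {
            if (slow_left((uint32_t)m, (uint32_t)s, unusedBits) !=
                fast_left((uint32_t)m, (uint32_t)s, unusedBits)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Saving one shift per sample matters here because this loop runs once per decoded stereo frame.
*/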
9191
9192#if defined(DRFLAC_SUPPORT_SSE2)
9193static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9194{
9195 drflac_uint64 i;
9196 drflac_uint64 frameCount4 = frameCount >> 2;
9197 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9198 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9199 drflac_int32 shift = unusedBitsPerSample;
9200
9201 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9202
9203 if (shift == 0) {
9204 for (i = 0; i < frameCount4; ++i) {
9205 __m128i mid;
9206 __m128i side;
9207 __m128i left;
9208 __m128i right;
9209
9210 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9211 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9212
9213 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9214
9215 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
9216 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
9217
9218 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9219 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9220 }
9221
9222 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9223 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9224 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9225
9226 mid = (mid << 1) | (side & 0x01);
9227
9228 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9229 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9230 }
9231 } else {
9232 shift -= 1;
9233 for (i = 0; i < frameCount4; ++i) {
9234 __m128i mid;
9235 __m128i side;
9236 __m128i left;
9237 __m128i right;
9238
9239 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9240 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9241
9242 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9243
9244 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
9245 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
9246
9247 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9248 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9249 }
9250
9251 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9252 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9253 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9254
9255 mid = (mid << 1) | (side & 0x01);
9256
9257 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9258 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9259 }
9260 }
9261}
9262#endif
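/*
In the SSE2 path, `_mm_unpacklo_epi32(left, right)` yields {L0, R0, L1, R1} and `_mm_unpackhi_epi32(left, right)` yields
{L2, R2, L3, R3}, so the two unaligned stores write four interleaved stereo frames. The portable model below is only an
illustration of those two intrinsics; `unpacklo_epi32`, `unpackhi_epi32` and `interleave4` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

// Scalar model of _mm_unpacklo_epi32: interleave the low two lanes of a and b.
static void unpacklo_epi32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    out[0] = a[0]; out[1] = b[0];
    out[2] = a[1]; out[3] = b[1];
}

// Scalar model of _mm_unpackhi_epi32: interleave the high two lanes of a and b.
static void unpackhi_epi32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    out[0] = a[2]; out[1] = b[2];
    out[2] = a[3]; out[3] = b[3];
}

// Four left and four right samples in, L R L R L R L R out,
// the same layout the two _mm_storeu_si128 calls produce.
static void interleave4(const int32_t left[4], const int32_t right[4], int32_t out[8])
{
    unpacklo_epi32(left, right, out + 0);
    unpackhi_epi32(left, right, out + 4);
}
```
*/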
9263
9264#if defined(DRFLAC_SUPPORT_NEON)
9265static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9266{
9267 drflac_uint64 i;
9268 drflac_uint64 frameCount4 = frameCount >> 2;
9269 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9270 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9271 drflac_int32 shift = unusedBitsPerSample;
9272 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
9273 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
9274 uint32x4_t one4;
9275
9276 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9277
9278 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9279 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9280 one4 = vdupq_n_u32(1);
9281
9282 if (shift == 0) {
9283 for (i = 0; i < frameCount4; ++i) {
9284 uint32x4_t mid;
9285 uint32x4_t side;
9286 int32x4_t left;
9287 int32x4_t right;
9288
9289 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9290 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9291
9292 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9293
9294 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
9295 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
9296
9297 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9298 }
9299
9300 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9301 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9302 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9303
9304 mid = (mid << 1) | (side & 0x01);
9305
9306 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9307 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9308 }
9309 } else {
9310 int32x4_t shift4;
9311
9312 shift -= 1;
9313 shift4 = vdupq_n_s32(shift);
9314
9315 for (i = 0; i < frameCount4; ++i) {
9316 uint32x4_t mid;
9317 uint32x4_t side;
9318 int32x4_t left;
9319 int32x4_t right;
9320
9321 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9322 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9323
9324 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9325
9326 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
9327 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
9328
9329 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9330 }
9331
9332 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9333 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9334 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9335
9336 mid = (mid << 1) | (side & 0x01);
9337
9338 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9339 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9340 }
9341 }
9342}
9343#endif
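/*
The NEON paths broadcast each shift count with `vdupq_n_s32` and call `vshlq_u32`, because the immediate forms such as
`vshlq_n_u32` require a compile-time constant while the wasted-bits count is only known at run time. `vshlq_u32` shifts
every lane by the signed count in the matching lane of its second operand: positive counts shift left, negative counts shift
right. The scalar model below is an illustration only (hypothetical helper names) and assumes counts between -31 and 31;
out-of-range counts are not modeled.

```c
#include <assert.h>
#include <stdint.h>

// One lane of vshlq_u32(x, n), assuming -32 < n < 32:
// a non-negative n shifts left, a negative n shifts right logically.
static uint32_t vshl_lane_u32(uint32_t x, int32_t n)
{
    return (n >= 0) ? (x << n) : (x >> -n);
}

// Apply the per-lane model to a 4-lane vector.
static void vshlq_u32_model(const uint32_t x[4], const int32_t n[4], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = vshl_lane_u32(x[i], n[i]);
    }
}
```

In the decoder all counts are non-negative, so only the left-shift half of this behaviour is exercised.
*/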
9344
9345static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9346{
9347#if defined(DRFLAC_SUPPORT_SSE2)
9348 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9349 drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9350 } else
9351#elif defined(DRFLAC_SUPPORT_NEON)
9352 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9353 drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9354 } else
9355#endif
9356 {
9357 /* Scalar fallback. */
9358#if 0
9359 drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9360#else
9361 drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9362#endif
9363 }
9364}
9365
9366
9367#if 0
9368static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9369{
9370 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9371 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
9372 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
9373 }
9374}
9375#endif
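/*
For independent stereo there is no inter-channel transform. Each decoded sample is shifted left by its subframe's wasted bits
(restoring the trailing zero bits the encoder stripped) and by unusedBitsPerSample (32 - bitsPerSample), which leaves it
left-justified in a 32-bit word. A sketch with a hypothetical helper, assuming the decoded value is already sign-extended and
that converting an out-of-range uint32_t to int32_t wraps (true on mainstream compilers, implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

// Left-justify one decoded sample into a signed 32-bit word. The shift is
// done on an unsigned copy so that negative samples do not trigger
// undefined behaviour, mirroring the decoder.
static int32_t left_justify_s32(int32_t decoded, uint32_t wastedBits, uint32_t bitsPerSample)
{
    uint32_t unusedBits = 32 - bitsPerSample;
    return (int32_t)((uint32_t)decoded << (unusedBits + wastedBits));
}
```

For example, the 16-bit sample 0x1234 becomes 0x12340000, and a 24-bit sample 0x001200 stored as 0x12 with 8 wasted bits
becomes 0x00120000.
*/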
9376
9377static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9378{
9379 drflac_uint64 i;
9380 drflac_uint64 frameCount4 = frameCount >> 2;
9381 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9382 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9383 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9384 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9385
9386 for (i = 0; i < frameCount4; ++i) {
9387 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
9388 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
9389 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
9390 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
9391
9392 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
9393 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
9394 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
9395 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
9396
9397 pOutputSamples[i*8+0] = (drflac_int32)tempL0;
9398 pOutputSamples[i*8+1] = (drflac_int32)tempR0;
9399 pOutputSamples[i*8+2] = (drflac_int32)tempL1;
9400 pOutputSamples[i*8+3] = (drflac_int32)tempR1;
9401 pOutputSamples[i*8+4] = (drflac_int32)tempL2;
9402 pOutputSamples[i*8+5] = (drflac_int32)tempR2;
9403 pOutputSamples[i*8+6] = (drflac_int32)tempL3;
9404 pOutputSamples[i*8+7] = (drflac_int32)tempR3;
9405 }
9406
9407 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9408 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9409 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9410 }
9411}
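/*
The scalar paths process four frames per iteration (frameCount4 = frameCount >> 2) and then finish the remaining
frameCount % 4 frames in a tail loop that starts at frameCount4 << 2. A generic sketch of that structure for interleaving
two planar channels; `interleave_stereo` is a hypothetical name, and the four-frame body is written as a small inner loop here
where the decoder unrolls it by hand.

```c
#include <assert.h>
#include <stdint.h>

// Interleave two planar channels into L R L R ..., four frames at a time,
// then handle the 0-3 leftover frames one by one.
static void interleave_stereo(const int32_t* ch0, const int32_t* ch1, int32_t* out, uint64_t frameCount)
{
    uint64_t frameCount4 = frameCount >> 2;   // number of complete groups of 4
    uint64_t i;

    for (i = 0; i < frameCount4; ++i) {
        for (int k = 0; k < 4; ++k) {
            out[i*8 + k*2 + 0] = ch0[i*4 + k];
            out[i*8 + k*2 + 1] = ch1[i*4 + k];
        }
    }

    for (i = frameCount4 << 2; i < frameCount; ++i) {   // tail: frameCount % 4 frames
        out[i*2 + 0] = ch0[i];
        out[i*2 + 1] = ch1[i];
    }
}
```
*/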
9412
9413#if defined(DRFLAC_SUPPORT_SSE2)
9414static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9415{
9416 drflac_uint64 i;
9417 drflac_uint64 frameCount4 = frameCount >> 2;
9418 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9419 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9420 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9421 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9422
9423 for (i = 0; i < frameCount4; ++i) {
9424 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9425 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9426
9427 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9428 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9429 }
9430
9431 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9432 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9433 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9434 }
9435}
9436#endif
9437
9438#if defined(DRFLAC_SUPPORT_NEON)
9439static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9440{
9441 drflac_uint64 i;
9442 drflac_uint64 frameCount4 = frameCount >> 2;
9443 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9444 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9445 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9446 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9447
9448 int32x4_t shift4_0 = vdupq_n_s32(shift0);
9449 int32x4_t shift4_1 = vdupq_n_s32(shift1);
9450
9451 for (i = 0; i < frameCount4; ++i) {
9452 int32x4_t left;
9453 int32x4_t right;
9454
9455 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
9456 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
9457
9458 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9459 }
9460
9461 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9462 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9463 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9464 }
9465}
9466#endif
9467
9468static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9469{
9470#if defined(DRFLAC_SUPPORT_SSE2)
9471 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9472 drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9473 } else
9474#elif defined(DRFLAC_SUPPORT_NEON)
9475 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9476 drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9477 } else
9478#endif
9479 {
9480 /* Scalar fallback. */
9481#if 0
9482 drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9483#else
9484 drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9485#endif
9486 }
9487}
9488
9489
9490DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
9491{
9492 drflac_uint64 framesRead;
9493 drflac_uint32 unusedBitsPerSample;
9494
9495 if (pFlac == NULL || framesToRead == 0) {
9496 return 0;
9497 }
9498
9499 if (pBufferOut == NULL) { /* No output buffer: skip framesToRead frames without writing any samples. */
9500 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
9501 }
9502
9503 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
9504 unusedBitsPerSample = 32 - pFlac->bitsPerSample; /* Left shift that left-justifies each sample in a 32-bit word. */
9505
9506 framesRead = 0;
9507 while (framesToRead > 0) {
9508 /* If we've run out of samples in this frame, go to the next. */
9509 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
9510 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
9511 break; /* Couldn't read the next frame, so just break from the loop and return. */
9512 }
9513 } else {
9514 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
9515 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining; /* Index of the next unread PCM frame within the current FLAC frame. */
9516 drflac_uint64 frameCountThisIteration = framesToRead;
9517
9518 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
9519 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
9520 }
9521
9522 if (channelCount == 2) {
9523 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
9524 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
9525
9526 switch (pFlac->currentFLACFrame.header.channelAssignment)
9527 {
9528 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
9529 {
9530 drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9531 } break;
9532
9533 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
9534 {
9535 drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9536 } break;
9537
9538 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
9539 {
9540 drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9541 } break;
9542
9543 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
9544 default:
9545 {
9546 drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9547 } break;
9548 }
9549 } else {
9550 /* Generic interleaving for mono and for three or more channels. */
9551 drflac_uint64 i;
9552 for (i = 0; i < frameCountThisIteration; ++i) {
9553 unsigned int j;
9554 for (j = 0; j < channelCount; ++j) {
9555 pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
9556 }
9557 }
9558 }
9559
9560 framesRead += frameCountThisIteration;
9561 pBufferOut += frameCountThisIteration * channelCount;
9562 framesToRead -= frameCountThisIteration;
9563 pFlac->currentPCMFrame += frameCountThisIteration;
9564 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
9565 }
9566 }
9567
9568 return framesRead;
9569}
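/*
drflac_read_pcm_frames_s32() walks FLAC frames until framesToRead is satisfied (a NULL pBufferOut skips frames instead of
decoding them). Two-channel frames go through the stereo decorrelation helpers; any other channel count uses a generic loop
that left-justifies each sample by unusedBitsPerSample plus that subframe's wasted bits and writes the channels interleaved.
A standalone sketch of that generic loop over plain planar arrays; `interleave_planar` and its parameters are hypothetical and
not part of the dr_flac API.

```c
#include <assert.h>
#include <stdint.h>

// Interleave `channelCount` planar channels starting at `firstFrame`,
// left-justifying each sample by (unusedBits + wastedBits[channel]).
static void interleave_planar(const int32_t* const* planes, const uint32_t* wastedBits,
                              unsigned int channelCount, uint64_t firstFrame,
                              uint64_t frameCount, uint32_t unusedBits, int32_t* out)
{
    for (uint64_t i = 0; i < frameCount; ++i) {
        for (unsigned int j = 0; j < channelCount; ++j) {
            uint32_t s = (uint32_t)planes[j][firstFrame + i];
            out[i*channelCount + j] = (int32_t)(s << (unusedBits + wastedBits[j]));
        }
    }
}
```

The caller then advances its output pointer by frameCount * channelCount samples, as the read loop above does.
*/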
9570
9571
9572#if 0
9573static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9574{
9575 drflac_uint64 i;
9576 for (i = 0; i < frameCount; ++i) {
9577 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9578 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9579 drflac_uint32 right = left - side;
9580
9581 left >>= 16;
9582 right >>= 16;
9583
9584 pOutputSamples[i*2+0] = (drflac_int16)left;
9585 pOutputSamples[i*2+1] = (drflac_int16)right;
9586 }
9587}
9588#endif
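/*
In left/side stereo, channel 0 carries left and channel 1 carries side = left - right, so right is recovered as left - side.
The 16-bit readers work on the same left-justified 32-bit values as the 32-bit reader and then keep only the top 16 bits
(>> 16), discarding the low bits of higher-resolution streams. A sketch with a hypothetical helper; shift0 and shift1 stand for
unusedBitsPerSample plus each subframe's wasted bits, and the final narrowing assumes wrap-around conversion to int16_t.

```c
#include <assert.h>
#include <stdint.h>

// Decode one left/side frame to 16-bit output.
static void left_side_to_s16(int32_t left, int32_t side, uint32_t shift0, uint32_t shift1,
                             int16_t* outLeft, int16_t* outRight)
{
    uint32_t l = (uint32_t)left << shift0;   // left-justified left
    uint32_t s = (uint32_t)side << shift1;   // left-justified side
    uint32_t r = l - s;                      // right = left - side

    *outLeft  = (int16_t)(l >> 16);          // keep the top 16 bits
    *outRight = (int16_t)(r >> 16);
}
```
*/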
9589
9590static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9591{
9592 drflac_uint64 i;
9593 drflac_uint64 frameCount4 = frameCount >> 2;
9594 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9595 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9596 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9597 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9598
9599 for (i = 0; i < frameCount4; ++i) {
9600 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9601 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9602 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9603 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9604
9605 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9606 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9607 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9608 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9609
9610 drflac_uint32 right0 = left0 - side0;
9611 drflac_uint32 right1 = left1 - side1;
9612 drflac_uint32 right2 = left2 - side2;
9613 drflac_uint32 right3 = left3 - side3;
9614
9615 left0 >>= 16; /* Keep the top 16 bits of each left-justified 32-bit sample. */
9616 left1 >>= 16;
9617 left2 >>= 16;
9618 left3 >>= 16;
9619
9620 right0 >>= 16;
9621 right1 >>= 16;
9622 right2 >>= 16;
9623 right3 >>= 16;
9624
9625 pOutputSamples[i*8+0] = (drflac_int16)left0;
9626 pOutputSamples[i*8+1] = (drflac_int16)right0;
9627 pOutputSamples[i*8+2] = (drflac_int16)left1;
9628 pOutputSamples[i*8+3] = (drflac_int16)right1;
9629 pOutputSamples[i*8+4] = (drflac_int16)left2;
9630 pOutputSamples[i*8+5] = (drflac_int16)right2;
9631 pOutputSamples[i*8+6] = (drflac_int16)left3;
9632 pOutputSamples[i*8+7] = (drflac_int16)right3;
9633 }
9634
9635 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9636 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9637 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9638 drflac_uint32 right = left - side;
9639
9640 left >>= 16;
9641 right >>= 16;
9642
9643 pOutputSamples[i*2+0] = (drflac_int16)left;
9644 pOutputSamples[i*2+1] = (drflac_int16)right;
9645 }
9646}
9647
9648#if defined(DRFLAC_SUPPORT_SSE2)
9649static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9650{
9651 drflac_uint64 i;
9652 drflac_uint64 frameCount4 = frameCount >> 2;
9653 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9654 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9655 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9656 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9657
9658 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9659
9660 for (i = 0; i < frameCount4; ++i) {
9661 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9662 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9663 __m128i right = _mm_sub_epi32(left, side);
9664
9665 left = _mm_srai_epi32(left, 16); /* Every lane now fits in 16 bits, so the saturating pack below never clamps. */
9666 right = _mm_srai_epi32(right, 16);
9667
9668 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
9669 }
9670
9671 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9672 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9673 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9674 drflac_uint32 right = left - side;
9675
9676 left >>= 16;
9677 right >>= 16;
9678
9679 pOutputSamples[i*2+0] = (drflac_int16)left;
9680 pOutputSamples[i*2+1] = (drflac_int16)right;
9681 }
9682}
9683#endif
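/*
After `_mm_srai_epi32(x, 16)` every lane lies in [-32768, 32767]. Assuming drflac__mm_packs_interleaved_epi32 narrows with the
signed saturating `_mm_packs_epi32`, as its name suggests, saturation therefore never triggers and the SSE2 result matches the
scalar (drflac_int16)(x >> 16) bit for bit. The scalar model below (hypothetical helper names) checks this on edge values.

```c
#include <assert.h>
#include <stdint.h>

// One lane of _mm_packs_epi32: signed saturation to int16.
static int16_t packs_lane(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

// Arithmetic shift by 16, then saturating pack (SSE2 path).
static int16_t srai16_then_pack(int32_t x)
{
    return packs_lane(x >> 16);
}

// Plain truncation to the top 16 bits (scalar path).
static int16_t top16(int32_t x)
{
    return (int16_t)((uint32_t)x >> 16);
}
```
*/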
9684
9685#if defined(DRFLAC_SUPPORT_NEON)
9686static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9687{
9688 drflac_uint64 i;
9689 drflac_uint64 frameCount4 = frameCount >> 2;
9690 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9691 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9692 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9693 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9694 int32x4_t shift0_4;
9695 int32x4_t shift1_4;
9696
9697 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9698
9699 shift0_4 = vdupq_n_s32(shift0);
9700 shift1_4 = vdupq_n_s32(shift1);
9701
9702 for (i = 0; i < frameCount4; ++i) {
9703 uint32x4_t left;
9704 uint32x4_t side;
9705 uint32x4_t right;
9706
9707 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9708 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9709 right = vsubq_u32(left, side);
9710
9711 left = vshrq_n_u32(left, 16); /* A logical shift is fine here: vmovn_u32 below keeps only the low 16 bits of each lane. */
9712 right = vshrq_n_u32(right, 16);
9713
9714 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9715 }
9716
9717 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9718 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9719 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9720 drflac_uint32 right = left - side;
9721
9722 left >>= 16;
9723 right >>= 16;
9724
9725 pOutputSamples[i*2+0] = (drflac_int16)left;
9726 pOutputSamples[i*2+1] = (drflac_int16)right;
9727 }
9728}
9729#endif
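/*
The NEON 16-bit path shifts with `vshrq_n_u32` (logical, zero-filling) where the SSE2 path uses an arithmetic shift. The
difference is harmless because `vmovn_u32` then keeps only the low 16 bits of each lane, and those bits are identical either
way; only the bits above bit 15 differ. A scalar check of that claim with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

// Logical shift, then truncating narrow (vshrq_n_u32 + vmovn_u32, per lane).
static uint16_t logical_then_narrow(uint32_t x)
{
    return (uint16_t)(x >> 16);
}

// Arithmetic shift, then truncating narrow.
static uint16_t arithmetic_then_narrow(uint32_t x)
{
    return (uint16_t)((int32_t)x >> 16);
}
```
*/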
9730
9731static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9732{
9733#if defined(DRFLAC_SUPPORT_SSE2)
9734 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9735 drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9736 } else
9737#elif defined(DRFLAC_SUPPORT_NEON)
9738 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9739 drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9740 } else
9741#endif
9742 {
9743 /* Scalar fallback. */
9744#if 0
9745 drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9746#else
9747 drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9748#endif
9749 }
9750}
9751
9752
9753#if 0
9754static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9755{
9756 drflac_uint64 i;
9757 for (i = 0; i < frameCount; ++i) {
9758 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9759 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9760 drflac_uint32 left = right + side;
9761
9762 left >>= 16;
9763 right >>= 16;
9764
9765 pOutputSamples[i*2+0] = (drflac_int16)left;
9766 pOutputSamples[i*2+1] = (drflac_int16)right;
9767 }
9768}
9769#endif
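/*
Right/side stereo mirrors left/side: channel 0 carries side = left - right and channel 1 carries right, so left is recovered
as right + side. A sketch with a hypothetical helper; shift0 and shift1 stand for unusedBitsPerSample plus each subframe's
wasted bits, and the final narrowing assumes wrap-around conversion to int16_t.

```c
#include <assert.h>
#include <stdint.h>

// Decode one right/side frame to 16-bit output.
// Note the channel order: channel 0 is side, channel 1 is right.
static void right_side_to_s16(int32_t side, int32_t right, uint32_t shift0, uint32_t shift1,
                              int16_t* outLeft, int16_t* outRight)
{
    uint32_t s = (uint32_t)side  << shift0;   // left-justified side
    uint32_t r = (uint32_t)right << shift1;   // left-justified right
    uint32_t l = r + s;                       // left = right + side

    *outLeft  = (int16_t)(l >> 16);           // keep the top 16 bits
    *outRight = (int16_t)(r >> 16);
}
```
*/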

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
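
#if 0
/*
Illustrative sketch (not compiled). In right-side stereo, subframe 0 carries the side channel (left minus right)
and subframe 1 carries the right channel, so the functions above rebuild left as right + side. Each sample is
first shifted left by the unused bits plus the subframe's wasted bits, which moves its most significant bit to
bit 31, and then only the top 16 bits are kept. This standalone version uses <stdint.h> types instead of the
drflac_* typedefs, and example_right_side_to_s16() is a hypothetical name, not part of the dr_flac API.
*/
```c
#include <stdint.h>
#include <stddef.h>

/*
Hypothetical helper (not part of dr_flac). side[i] = left - right and right[i] = right.
shiftSide/shiftRight are (32 - bitsPerSample) plus the wasted bits of each subframe.
*/
static void example_right_side_to_s16(const int32_t* side, const int32_t* right, size_t frameCount,
                                      uint32_t shiftSide, uint32_t shiftRight, int16_t* pOut)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        /* Unsigned math so the shift and the addition wrap modulo 2^32 instead of overflowing. */
        uint32_t s = (uint32_t)side[i] << shiftSide;
        uint32_t r = (uint32_t)right[i] << shiftRight;
        uint32_t l = r + s;    /* left = right + side */

        /* Keep the top 16 bits of each MSB-aligned value, interleaved as L, R.
           The narrowing cast relies on two's complement wrap-around, as the code above does. */
        pOut[i*2+0] = (int16_t)(l >> 16);
        pOut[i*2+1] = (int16_t)(r >> 16);
    }
}
```
/*
With 16-bit input both shifts are 16, so a frame with side = 1200 and right = -200 produces left = 1000 and
right = -200. The side channel can need one bit more than bitsPerSample, but because the addition wraps modulo
2^32 the reconstructed left sample is still exact.
*/
#endif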

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif
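
#if 0
/*
Illustrative sketch (not compiled). The SSE2 loop hands four 32-bit left lanes and four 32-bit right lanes to
drflac__mm_packs_interleaved_epi32() and stores the result as eight int16 values. Judging by its name and by that
store, the helper is assumed here to narrow each lane with signed saturation (as _mm_packs_epi32 does) and to
interleave the lanes as L0 R0 L1 R1 L2 R2 L3 R3. The scalar model below shows that assumed behavior;
example_sat16() and example_packs_interleaved() are hypothetical names, not dr_flac functions.
*/
```c
#include <stdint.h>

/* Hypothetical: narrow one 32-bit lane to int16 with signed saturation. */
static int16_t example_sat16(int32_t x)
{
    if (x > 32767) {
        return 32767;
    }
    if (x < -32768) {
        return -32768;
    }
    return (int16_t)x;
}

/* Hypothetical: narrow four left and four right lanes and interleave them into out[0..7]. */
static void example_packs_interleaved(const int32_t left[4], const int32_t right[4], int16_t out[8])
{
    int k;
    for (k = 0; k < 4; ++k) {
        out[k*2+0] = example_sat16(left[k]);
        out[k*2+1] = example_sat16(right[k]);
    }
}
```
/*
On this path the lanes have already been shifted right arithmetically by 16, so every value is within int16 range
and saturation never changes it; the saturating pack simply serves as a convenient narrowing instruction.
*/
#endif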

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
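
#if 0
/*
Illustrative sketch (not compiled). The dispatcher above is decided partly at compile time and partly at run time.
Because the NEON branch sits behind #elif, it is only considered when DRFLAC_SUPPORT_SSE2 is not defined, and either
SIMD path is taken only when the CPU flag is set and bitsPerSample <= 24; otherwise the scalar version runs. The
hypothetical example_select_path() below models that decision with plain integer flags standing in for the
DRFLAC_SUPPORT_* macros and for drflac__gIsSSE2Supported/drflac__gIsNEONSupported.
*/
```c
#include <stdint.h>

typedef enum
{
    EXAMPLE_PATH_SCALAR,
    EXAMPLE_PATH_SSE2,
    EXAMPLE_PATH_NEON
} example_path;

/*
Hypothetical model of the dispatch (not part of dr_flac). sse2Compiled/neonCompiled stand in for
DRFLAC_SUPPORT_SSE2/DRFLAC_SUPPORT_NEON; hasSSE2/hasNEON stand in for the runtime CPU flags.
*/
static example_path example_select_path(int sse2Compiled, int neonCompiled, int hasSSE2, int hasNEON, uint32_t bitsPerSample)
{
    if (sse2Compiled) {
        if (hasSSE2 && bitsPerSample <= 24) {
            return EXAMPLE_PATH_SSE2;
        }
    } else if (neonCompiled) {  /* Mirrors the #elif: NEON only when SSE2 support is not compiled in. */
        if (hasNEON && bitsPerSample <= 24) {
            return EXAMPLE_PATH_NEON;
        }
    }

    return EXAMPLE_PATH_SCALAR;  /* Scalar fallback. */
}
```
/*
For example, a 32-bit stream always takes the scalar path, and a build with both macros defined never selects NEON
even on a CPU that supports it.
*/
#endif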


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
#endif
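
#if 0
/*
Illustrative sketch (not compiled). FLAC's mid-side mode stores mid = (left + right) >> 1 and side = left - right.
The shift drops the low bit of left + right, but left + right and left - right always have the same parity, so the
decoder restores that bit with (mid << 1) | (side & 1) before halving mid + side and mid - side, as the reference
loop above does. The round trip below uses hypothetical helpers, example_ms_encode() and example_ms_decode(), and
assumes >> on a negative signed value is an arithmetic shift, which C leaves implementation-defined but mainstream
compilers provide.
*/
```c
#include <stdint.h>

/* Hypothetical encoder side: mid keeps the halved sum, side keeps the full difference. */
static void example_ms_encode(int32_t l, int32_t r, int32_t* mid, int32_t* side)
{
    *mid = (l + r) >> 1;    /* floor((l + r) / 2), assuming an arithmetic shift */
    *side = l - r;
}

/* Hypothetical decoder side: restore the dropped bit from side, then halve. */
static void example_ms_decode(int32_t mid, int32_t side, int32_t* l, int32_t* r)
{
    uint32_t m = ((uint32_t)mid << 1) | ((uint32_t)side & 0x01);   /* m == l + r */
    *l = (int32_t)(m + (uint32_t)side) >> 1;                        /* (2l) / 2 */
    *r = (int32_t)(m - (uint32_t)side) >> 1;                        /* (2r) / 2 */
}
```
/*
For instance, left = 3 and right = -2 encode to mid = 0 and side = 5; the decoder rebuilds m = 1 and recovers
(1 + 5) >> 1 = 3 and (1 - 5) >> 1 = -2.
*/
#endif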

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
            temp3L = ((drflac_int32)(mid3 + side3) >> 1);

            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
            temp3R = ((drflac_int32)(mid3 - side3) >> 1);

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
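
#if 0
/*
Illustrative sketch (not compiled). When unusedBitsPerSample is non-zero, the unrolled loops above skip the signed
halving and shift mid + side (or mid - side) left by unusedBitsPerSample - 1 instead. This is exact because, once
the low bit of mid has been restored, mid + side and mid - side are always even, so halving only drops a zero bit.
With unusedBitsPerSample equal to 0 a shift of -1 is not possible, which is why that branch keeps the arithmetic
shift. The hypothetical helpers below compute the top 16 bits of the left sample both ways so they can be compared.
*/
```c
#include <stdint.h>

/*
Hypothetical helpers (not part of dr_flac). m is the restored mid value, (mid << 1) | (side & 1),
so m + side is always even. Both return the top 16 bits of the reconstructed left sample.
*/
static int16_t example_left_halve_then_shift(uint32_t m, uint32_t side, uint32_t unused)
{
    /* Reference form: signed halving, then shift up by the unused bits. */
    return (int16_t)(((uint32_t)((int32_t)(m + side) >> 1) << unused) >> 16);
}

static int16_t example_left_folded(uint32_t m, uint32_t side, uint32_t unused)
{
    /* Folded form, valid only for unused > 0: halving and shifting merged into one left shift. */
    return (int16_t)(((m + side) << (unused - 1)) >> 16);
}
```
/*
Because the value being halved is even, ((x >> 1) << n) and (x << (n - 1)) agree modulo 2^32 for any n >= 1,
including when x is negative.
*/
#endif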

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        tempL0 >>= 16;
        tempL1 >>= 16;
        tempL2 >>= 16;
        tempL3 >>= 16;

        tempR0 >>= 16;
        tempR1 >>= 16;
        tempR2 >>= 16;
        tempR3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
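
#if 0
/*
Illustrative sketch (not compiled). For independent stereo each channel is converted on its own: the decoded value
is shifted left by (32 - bitsPerSample) plus the subframe's wasted bits, which both restores the wasted low-order
zero bits and places the sample's most significant bit on bit 31, and then the top 16 bits are kept. Lower bit depths
are therefore scaled up and higher ones truncated, with no rounding or dither. The hypothetical
example_sample_to_s16() below applies that conversion to a single sample.
*/
```c
#include <stdint.h>

/*
Hypothetical helper (not part of dr_flac). bitsPerSample is the stream's bit depth and wastedBits the
subframe's wasted-bits count, so the decoded value carries (bitsPerSample - wastedBits) significant bits.
*/
static int16_t example_sample_to_s16(int32_t decoded, uint32_t bitsPerSample, uint32_t wastedBits)
{
    uint32_t shift = (32 - bitsPerSample) + wastedBits;    /* move the sample's MSB to bit 31 */
    return (int16_t)(((uint32_t)decoded << shift) >> 16);  /* keep the top 16 bits */
}
```
/*
For example, the 24-bit value 0x123456 becomes 0x1234, and an 8-bit sample of 127 becomes 32512 (127 << 8).
*/
#endif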

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        /* At this point we have the results. We can now pack and interleave them into a single __m128i object and then store it in the output buffer. */
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        left = vshrq_n_s32(left, 16);
        right = vshrq_n_s32(right, 16);

        drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}
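
#if 0
/*
Illustrative sketch (not compiled). drflac_read_pcm_frames_s16() never converts past the end of the current FLAC
frame: each pass either decodes the next frame (when pcmFramesRemaining is 0) or converts
min(framesToRead, pcmFramesRemaining) frames starting at blockSizeInPCMFrames - pcmFramesRemaining, and it stops
early once no further frame can be read. The model below reproduces only that bookkeeping for a mono stream held in
memory; example_stream and example_read() are hypothetical, and the per-channel conversion is reduced to a copy.
*/
```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model (not dr_flac's types): mono blocks stored back to back in memory. */
typedef struct
{
    const int16_t* samples;     /* all samples of all blocks */
    const uint32_t* blockSizes; /* number of PCM frames in each block */
    size_t blockCount;
    size_t nextBlock;           /* index of the next block to "decode" */
    size_t blockStart;          /* offset of the current block within samples */
    uint32_t blockSize;         /* size of the current block */
    uint32_t framesRemaining;   /* frames not yet consumed from the current block */
} example_stream;

/* Returns the number of frames actually read, which is less than framesToRead at the end of the stream. */
static uint64_t example_read(example_stream* s, uint64_t framesToRead, int16_t* pOut)
{
    uint64_t framesRead = 0;
    while (framesToRead > 0) {
        if (s->framesRemaining == 0) {
            if (s->nextBlock == s->blockCount) {
                break;  /* No more blocks: return what was read so far. */
            }
            s->blockStart += s->blockSize;
            s->blockSize = s->blockSizes[s->nextBlock++];
            s->framesRemaining = s->blockSize;
        } else {
            uint64_t iFirst = s->blockSize - s->framesRemaining;   /* first unread frame in this block */
            uint64_t n = framesToRead;
            uint64_t i;
            if (n > s->framesRemaining) {
                n = s->framesRemaining;   /* never cross the block boundary in one pass */
            }
            for (i = 0; i < n; ++i) {
                pOut[i] = s->samples[s->blockStart + iFirst + i];
            }
            pOut += n;
            framesRead += n;
            framesToRead -= n;
            s->framesRemaining -= (uint32_t)n;
        }
    }
    return framesRead;
}
```
/*
A request that spans a block boundary is served over several passes of the loop, and the caller learns about the
end of the stream from a short return value rather than an error code.
*/
#endif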


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif
10499
10500static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    float factor = 1 / 2147483648.0;

    /*
    Left/side stereo: channel 0 holds left and channel 1 holds side, so right = left - side. Each sample is
    shifted left by the unused and wasted bit counts and then scaled by 1/2^31. The main loop is unrolled by
    four; the second loop handles the remaining frameCount % 4 frames.
    */
    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    /*
    Unlike the scalar path, samples are scaled relative to a 24-bit range: each shift is reduced by 8 and the
    factor is 1/2^23. The dispatcher only selects this path when bitsPerSample <= 24.
    */
    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);
        __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4 = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;
        float32x4_t leftf;
        float32x4_t rightf;

        left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);
        leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    /* Right/side stereo: channel 0 holds side and channel 1 holds right, so left = right + side. */
    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);
        __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4 = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;
        float32x4_t leftf;
        float32x4_t rightf;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);
        leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    float factor = 1 / 2147483648.0;

    /*
    Mid/side stereo: the low bit of side is folded back into mid with (mid << 1) | (side & 1), after which
    left = (mid + side) / 2 and right = (mid - side) / 2. Both mid + side and mid - side are then always even,
    so ((x >> 1) << shift) equals (x << (shift - 1)). The first branch uses that single shift. The second
    branch handles shift == 0, where the halving must be done with an arithmetic right shift instead.
    */
    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    }

    /* Remaining frameCount % 4 frames, using the original unusedBitsPerSample rather than the adjusted shift. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    __m128 factor128;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = 1.0f / 8388608.0f;
    factor128 = _mm_set1_ps(factor);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128 leftf;
            __m128 rightf;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128 leftf;
            __m128 rightf;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    float32x4_t factor4;
    int32x4_t shift4;
    int32x4_t wbps0_4; /* Wasted Bits Per Sample */
    int32x4_t wbps1_4; /* Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = 1.0f / 8388608.0f;
    factor4 = vdupq_n_f32(factor);
    wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        shift4 = vdupq_n_s32(shift);
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

#if 0
11144static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11145{
11146 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11147 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11148 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
11149 }
11150}
11151#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
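/*
Every conversion above follows the same pattern: shift the decoded sample left by the unused bits plus the subframe's wasted bits,
which moves its most significant bit into bit 31, then scale by 1/2^31 so the result lands in [-1, 1). A minimal single-sample
sketch of that pattern follows; the helper name and its parameters are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: converts one decoded sample to a float in [-1, 1).
// bitsPerSample is the stream's bit depth and wastedBits the subframe's wasted-bits count.
static float example_sample_to_f32(int32_t sample, uint32_t bitsPerSample, uint32_t wastedBits)
{
    uint32_t shift = (32 - bitsPerSample) + wastedBits;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32 && shift < 32);

    // Shift as unsigned (shifting a negative signed value is undefined), then reinterpret as
    // signed and scale by 1/2^31.
    return (float)((int32_t)((uint32_t)sample << shift) / 2147483648.0);
}
```

For a 16-bit stream with no wasted bits, 16384 maps to 0.5 and -32768 maps to -1.0.
*/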

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    __m128 factor128 = _mm_set1_ps(factor);

    for (i = 0; i < frameCount4; ++i) {
        __m128i lefti;
        __m128i righti;
        __m128 leftf;
        __m128 rightf;

        lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
        rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif
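/*
The SSE2 routine above, like its NEON counterpart, shifts 8 bits less than the scalar routine and scales by 1/2^23 instead of
1/2^31. The two scale factors differ by exactly 2^8, so both forms produce the same float whenever the intermediate integers are
exactly representable, which holds for samples of at most 24 bits. The dispatchers only pick a SIMD routine when
bitsPerSample <= 24; for deeper samples unusedBitsPerSample would be below 8 and the "- 8" would wrap around. The hypothetical
helpers below compare the two forms for a single sample.

```c
#include <assert.h>
#include <stdint.h>

// Scalar-style form: move the sample's top bit to bit 31, then scale by 1/2^31.
static float example_f32_full_shift(int32_t sample, uint32_t shift)
{
    assert(shift < 32);
    return (float)(int32_t)((uint32_t)sample << shift) * (1.0f / 2147483648.0f);
}

// SIMD-style form: shift 8 bits less and scale by 1/2^23 instead. Requires shift >= 8,
// which is what the bitsPerSample <= 24 check in the dispatchers guarantees.
static float example_f32_reduced_shift(int32_t sample, uint32_t shift)
{
    assert(shift >= 8 && shift < 32);
    return (float)(int32_t)((uint32_t)sample << (shift - 8)) * (1.0f / 8388608.0f);
}
```
*/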

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    float32x4_t factor4 = vdupq_n_f32(factor);
    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t lefti;
        int32x4_t righti;
        float32x4_t leftf;
        float32x4_t rightf;

        lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break; /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
        }
    }

    return framesRead;
}
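/*
For channel counts other than two, drflac_read_pcm_frames_f32() interleaves the per-subframe buffers directly, so sample j of PCM
frame i lands at index i*channelCount + j of the output. The sketch below shows just that index mapping on plain float arrays; the
function name and the planar input layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: interleaves per-channel (planar) buffers into one frame-major buffer,
// so that pOut[frame*channels + channel] == ppPlanar[channel][frame].
static void example_interleave_f32(const float* const* ppPlanar, size_t channels, size_t frameCount, float* pOut)
{
    size_t iFrame;
    size_t iChannel;

    assert(ppPlanar != NULL && pOut != NULL && channels > 0);

    for (iFrame = 0; iFrame < frameCount; ++iFrame) {
        for (iChannel = 0; iChannel < channels; ++iChannel) {
            pOut[iFrame*channels + iChannel] = ppPlanar[iChannel][iFrame];
        }
    }
}
```

With three channels, the output order is frame 0 (ch0, ch1, ch2), then frame 1 (ch0, ch1, ch2), and so on.
*/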


DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    if (pFlac == NULL) {
        return DRFLAC_FALSE;
    }

    /* Don't do anything if we're already on the seek point. */
    if (pFlac->currentPCMFrame == pcmFrameIndex) {
        return DRFLAC_TRUE;
    }

    /*
    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
    when the decoder was opened.
    */
    if (pFlac->firstFLACFramePosInBytes == 0) {
        return DRFLAC_FALSE;
    }

    if (pcmFrameIndex == 0) {
        pFlac->currentPCMFrame = 0;
        return drflac__seek_to_first_frame(pFlac);
    } else {
        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;

        /* Clamp the target PCM frame to the end of the stream. */
        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
            pcmFrameIndex = pFlac->totalPCMFrameCount;
        }

        /*
        If the target PCM frame lies within the FLAC frame that is already decoded, just move the read position within that frame,
        forward or backward. The offsets are kept as 64-bit values so a distant target cannot be truncated into a false match.
        */
        if (pcmFrameIndex > pFlac->currentPCMFrame) {
            /* Forward. */
            drflac_uint64 offset = pcmFrameIndex - pFlac->currentPCMFrame;
            if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
                pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)offset;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        } else {
            /* Backward. */
            drflac_uint64 offsetAbs = pFlac->currentPCMFrame - pcmFrameIndex;
            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
                pFlac->currentFLACFrame.pcmFramesRemaining += (drflac_uint32)offsetAbs;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        }

        /*
        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
        we'll instead use Ogg's natural seeking facility.
        */
#ifndef DR_FLAC_NO_OGG
        if (pFlac->container == drflac_container_ogg)
        {
            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
        }
        else
#endif
        {
            /*
            First try seeking via the seek table. If that fails, fall back to a binary search (only compiled in when CRC checking
            is enabled), and finally to a brute force seek, which is much slower.
            */
            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
            }

#if !defined(DR_FLAC_NO_CRC)
            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
            }
#endif

            /* Fall back to brute force if all else fails. */
            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
            }
        }

        if (wasSuccessful) {
            pFlac->currentPCMFrame = pcmFrameIndex;
        } else {
            /* Seek failed. Try putting the decoder back to its original state. */
            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
                /* Failed to seek back to the original PCM frame. Fall back to 0. */
                drflac_seek_to_pcm_frame(pFlac, 0);
            }
        }

        return wasSuccessful;
    }
}
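/*
Before falling back to a real seek, drflac_seek_to_pcm_frame() checks whether the target lies inside the FLAC frame that is already
decoded and, if so, only moves the read cursor. The sketch below isolates that arithmetic on a hypothetical, simplified state
struct (not drflac's own): moving forward consumes frames from the remaining count, moving backward gives them back, and anything
outside the decoded frame reports failure so the caller can try the seek table, binary search or brute force paths. The sketch
keeps offsets in 64 bits so a distant target can never be truncated into a false match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified decoder state.
typedef struct
{
    uint64_t currentPCMFrame;   // absolute index of the next PCM frame to be read
    uint32_t blockSize;         // PCM frames in the currently decoded FLAC frame
    uint32_t framesRemaining;   // PCM frames in that FLAC frame not yet read
} example_seek_state;

// Returns 1 if the target was reached by moving within the decoded FLAC frame, 0 otherwise.
static int example_seek_within_frame(example_seek_state* pState, uint64_t targetPCMFrame)
{
    assert(pState != NULL && pState->framesRemaining <= pState->blockSize);

    if (targetPCMFrame > pState->currentPCMFrame) {
        uint64_t offset = targetPCMFrame - pState->currentPCMFrame;
        if (pState->framesRemaining > offset) {
            pState->framesRemaining -= (uint32_t)offset;
            pState->currentPCMFrame = targetPCMFrame;
            return 1;
        }
    } else {
        uint64_t offset = pState->currentPCMFrame - targetPCMFrame;
        uint32_t consumed = pState->blockSize - pState->framesRemaining;
        if (consumed > offset) {
            pState->framesRemaining += (uint32_t)offset;
            pState->currentPCMFrame = targetPCMFrame;
            return 1;
        }
    }

    return 0;   // outside the decoded frame: a real seek is needed
}
```
*/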



/* High Level APIs */

#if defined(SIZE_MAX)
    #define DRFLAC_SIZE_MAX SIZE_MAX
#else
    #if defined(DRFLAC_64BIT)
        #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
    #else
        #define DRFLAC_SIZE_MAX 0xFFFFFFFF
    #endif
#endif


/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me. */
#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
{ \
    type* pSampleData = NULL; \
    drflac_uint64 totalPCMFrameCount; \
 \
    DRFLAC_ASSERT(pFlac != NULL); \
 \
    totalPCMFrameCount = pFlac->totalPCMFrameCount; \
 \
    if (totalPCMFrameCount == 0) { \
        type buffer[4096]; \
        drflac_uint64 pcmFramesRead; \
        size_t sampleDataBufferSize = sizeof(buffer); \
 \
        pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
 \
        while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
            if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
                type* pNewSampleData; \
                size_t newSampleDataBufferSize; \
 \
                newSampleDataBufferSize = sampleDataBufferSize * 2; \
                pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
                if (pNewSampleData == NULL) { \
                    drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
                    goto on_error; \
                } \
 \
                sampleDataBufferSize = newSampleDataBufferSize; \
                pSampleData = pNewSampleData; \
            } \
 \
            DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
            totalPCMFrameCount += pcmFramesRead; \
        } \
 \
        /* At this point everything should be decoded, but we just want to fill the unused part of the buffer with silence - need to \
           protect those ears from random noise! */ \
        DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
    } else { \
        drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
        if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
            goto on_error; /* The decoded data is too big. */ \
        } \
 \
        pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
 \
        totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
    } \
 \
    if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
    if (channelsOut) *channelsOut = pFlac->channels; \
    if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
 \
    drflac_close(pFlac); \
    return pSampleData; \
 \
on_error: \
    drflac_close(pFlac); \
    return NULL; \
}

DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
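/*
When the total PCM frame count is unknown, the macro above decodes into a fixed 4096-sample stack buffer and appends to a heap
buffer, doubling the heap buffer whenever the next chunk would not fit. Because each read returns at most one stack buffer's worth,
a single doubling per read is always enough. The hypothetical growable buffer below sketches the same strategy with plain
realloc() and an explicit overflow check in the spirit of the DRFLAC_SIZE_MAX test; it is an illustration, not part of dr_flac.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable sample buffer.
typedef struct
{
    float* pData;
    size_t count;       // samples stored
    size_t capacity;    // samples allocated
} example_sample_buffer;

// Appends sampleCount samples, doubling the allocation (starting at 4096 samples) until they fit.
// Returns 0 on success, -1 on overflow or allocation failure, in which case the buffer is unchanged.
static int example_append_samples(example_sample_buffer* pBuf, const float* pSamples, size_t sampleCount)
{
    size_t newCapacity;

    assert(pBuf != NULL && (pSamples != NULL || sampleCount == 0));

    if (sampleCount > SIZE_MAX - pBuf->count) {
        return -1;
    }

    newCapacity = (pBuf->capacity > 0) ? pBuf->capacity : 4096;
    while (newCapacity < pBuf->count + sampleCount) {
        if (newCapacity > (SIZE_MAX / sizeof(float)) / 2) {
            return -1;  // doubling again would overflow the byte count
        }
        newCapacity *= 2;
    }

    if (newCapacity != pBuf->capacity) {
        float* pNew = (float*)realloc(pBuf->pData, newCapacity * sizeof(float));
        if (pNew == NULL) {
            return -1;
        }
        pBuf->pData = pNew;
        pBuf->capacity = newCapacity;
    }

    if (sampleCount > 0) {
        memcpy(pBuf->pData + pBuf->count, pSamples, sampleCount * sizeof(float));
    }
    pBuf->count += sampleCount;
    return 0;
}
```
*/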

DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

#ifndef DR_FLAC_NO_STDIO
DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}
#endif

DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}


DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks != NULL) {
        drflac__free_from_callbacks(p, pAllocationCallbacks);
    } else {
        drflac__free_default(p, NULL);
    }
}




DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = commentCount;
    pIter->pRunningData = (const char*)pComments;
}

DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
{
    drflac_uint32 length;
    const char* pComment;

    /* Safety: report a zero length if we return early. */
    if (pCommentLengthOut) {
        *pCommentLengthOut = 0;
    }

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length = drflac__le2host_32(*(const drflac_uint32*)pIter->pRunningData);
    pIter->pRunningData += 4;

    pComment = pIter->pRunningData;
    pIter->pRunningData += length;
    pIter->countRemaining -= 1;

    if (pCommentLengthOut) {
        *pCommentLengthOut = length;
    }

    return pComment;
}
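/*
Each entry walked by drflac_next_vorbis_comment() is a 32-bit little-endian length followed by that many bytes of UTF-8 text, with
no terminating NUL. The iterator above trusts the lengths it reads. The hypothetical variant below also tracks how many bytes
remain, so a corrupt length cannot run past the end of the block, and it assembles the length byte by byte so it needs neither
alignment nor an endian conversion helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical bounds-checked iterator state.
typedef struct
{
    const uint8_t* pData;
    size_t bytesRemaining;
    uint32_t countRemaining;
} example_comment_iter;

static uint32_t example_read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns the next comment (not NUL-terminated) and stores its length, or returns NULL when the
// list is exhausted or an entry is malformed.
static const char* example_next_comment(example_comment_iter* pIter, uint32_t* pLength)
{
    uint32_t length;
    const char* pComment;

    assert(pIter != NULL && pLength != NULL);
    *pLength = 0;

    if (pIter->countRemaining == 0 || pIter->bytesRemaining < 4) {
        return NULL;
    }

    length = example_read_le32(pIter->pData);
    if (length > pIter->bytesRemaining - 4) {
        return NULL;    // declared length runs past the end of the block
    }

    pComment = (const char*)(pIter->pData + 4);
    pIter->pData += 4 + (size_t)length;
    pIter->bytesRemaining -= 4 + (size_t)length;
    pIter->countRemaining -= 1;

    *pLength = length;
    return pComment;
}
```
*/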




DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = trackCount;
    pIter->pRunningData = (const char*)pTrackData;
}

DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
{
    drflac_cuesheet_track cuesheetTrack;
    const char* pRunningData;
    drflac_uint64 offsetHi;
    drflac_uint64 offsetLo;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return DRFLAC_FALSE;
    }

    pRunningData = pIter->pRunningData;

    offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    cuesheetTrack.offset = offsetLo | (offsetHi << 32);
    cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
    cuesheetTrack.isAudio = (pRunningData[0] & 0x80) == 0; /* Track type bit: 0 = audio, 1 = non-audio. */
    cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
    cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);

    pIter->pRunningData = pRunningData;
    pIter->countRemaining -= 1;

    if (pCuesheetTrack) {
        *pCuesheetTrack = cuesheetTrack;
    }

    return DRFLAC_TRUE;
}
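/*
drflac_next_cuesheet_track() walks a fixed 36-byte header per track: an 8-byte big-endian sample offset, a 1-byte track number, a
12-byte ISRC, 14 bytes holding the track-type and pre-emphasis bits plus reserved space, and a 1-byte index point count, followed
by the index points themselves. The sketch below parses that header from a plain byte array into a hypothetical struct. Per the
FLAC format specification, a clear track-type bit means an audio track.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical decoded form of a CUESHEET track header.
typedef struct
{
    uint64_t offset;        // in samples
    uint8_t  trackNumber;
    char     ISRC[12];      // not NUL-terminated
    int      isAudio;       // 1 when the track-type bit is clear
    int      preEmphasis;
    uint8_t  indexCount;
} example_cuesheet_track;

static uint32_t example_read_be32(const uint8_t* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Parses one track header and returns the bytes consumed (36). The indexCount index points that
// follow are left to the caller.
static size_t example_parse_cuesheet_track(const uint8_t* p, example_cuesheet_track* pTrack)
{
    assert(p != NULL && pTrack != NULL);

    pTrack->offset      = ((uint64_t)example_read_be32(p) << 32) | example_read_be32(p + 4);
    pTrack->trackNumber = p[8];
    memcpy(pTrack->ISRC, p + 9, sizeof(pTrack->ISRC));
    pTrack->isAudio     = (p[21] & 0x80) == 0;
    pTrack->preEmphasis = (p[21] & 0x40) != 0;
    pTrack->indexCount  = p[35];    // bytes 21..34 hold the flag bits and reserved space

    return 36;
}
```
*/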

#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    #pragma GCC diagnostic pop
#endif
#endif /* dr_flac_c */
#endif /* DR_FLAC_IMPLEMENTATION */


/*
REVISION HISTORY
================
v0.12.33 - 2021-12-22
  - Fix a bug with seeking when the seek table does not start at PCM frame 0.

v0.12.32 - 2021-12-11
  - Fix a warning with Clang.

v0.12.31 - 2021-08-16
  - Silence some warnings.

v0.12.30 - 2021-07-31
  - Fix platform detection for ARM64.

v0.12.29 - 2021-04-02
  - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  - Fix a decoding error due to an incorrect validation check.

v0.12.28 - 2021-02-21
  - Fix a warning due to referencing _MSC_VER when it is undefined.

v0.12.27 - 2021-01-31
  - Fix a static analysis warning.

v0.12.26 - 2021-01-17
  - Fix a compilation warning due to _BSD_SOURCE being deprecated.

v0.12.25 - 2020-12-26
  - Update documentation.

v0.12.24 - 2020-11-29
  - Fix ARM64/NEON detection when compiling with MSVC.

v0.12.23 - 2020-11-21
  - Fix compilation with OpenWatcom.

v0.12.22 - 2020-11-01
  - Fix an error with the previous release.

v0.12.21 - 2020-11-01
  - Fix a possible deadlock when seeking.
  - Improve compiler support for older versions of GCC.

v0.12.20 - 2020-09-08
  - Fix a compilation error on older compilers.

v0.12.19 - 2020-08-30
  - Fix a bug due to an undefined 32-bit shift.

v0.12.18 - 2020-08-14
  - Fix a crash when compiling with clang-cl.

v0.12.17 - 2020-08-02
  - Simplify sized types.

v0.12.16 - 2020-07-25
  - Fix a compilation warning.

v0.12.15 - 2020-07-06
  - Check for negative LPC shifts and return an error.

v0.12.14 - 2020-06-23
  - Add include guard for the implementation section.

v0.12.13 - 2020-05-16
  - Add compile-time and run-time version querying.
    - DRFLAC_VERSION_MINOR
    - DRFLAC_VERSION_MAJOR
    - DRFLAC_VERSION_REVISION
    - DRFLAC_VERSION_STRING
    - drflac_version()
    - drflac_version_string()

v0.12.12 - 2020-04-30
  - Fix compilation errors with VC6.

v0.12.11 - 2020-04-19
  - Fix some pedantic warnings.
  - Fix some undefined behaviour warnings.

v0.12.10 - 2020-04-10
  - Fix some bugs when trying to seek with an invalid seek table.

v0.12.9 - 2020-04-05
  - Fix warnings.

v0.12.8 - 2020-04-04
  - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  - Fix some static analysis warnings.
  - Minor documentation updates.

v0.12.7 - 2020-03-14
  - Fix compilation errors with VC6.

v0.12.6 - 2020-03-07
  - Fix compilation error with Visual Studio .NET 2003.

v0.12.5 - 2020-01-30
  - Silence some static analysis warnings.

v0.12.4 - 2020-01-29
  - Silence some static analysis warnings.

v0.12.3 - 2019-12-02
  - Fix some warnings when compiling with GCC and the -Og flag.
  - Fix a crash in out-of-memory situations.
  - Fix potential integer overflow bug.
  - Fix some static analysis warnings.
  - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.

v0.12.2 - 2019-10-07
  - Internal code clean up.

v0.12.1 - 2019-09-29
  - Fix some Clang Static Analyzer warnings.
  - Fix an unused variable warning.

v0.12.0 - 2019-09-23
  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify its own memory allocation
    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_file()
    - drflac_open_file_with_metadata()
    - drflac_open_memory()
    - drflac_open_memory_with_metadata()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
    - drflac_open_file_and_read_pcm_frames_s32()
    - drflac_open_file_and_read_pcm_frames_s16()
    - drflac_open_file_and_read_pcm_frames_f32()
    - drflac_open_memory_and_read_pcm_frames_s32()
    - drflac_open_memory_and_read_pcm_frames_s16()
    - drflac_open_memory_and_read_pcm_frames_f32()
    Set this extra parameter to NULL to use the defaults, which match the previous behaviour. Setting it to NULL will use
    DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  - Remove deprecated APIs:
    - drflac_read_s32()
    - drflac_read_s16()
    - drflac_read_f32()
    - drflac_seek_to_sample()
    - drflac_open_and_decode_s32()
    - drflac_open_and_decode_s16()
    - drflac_open_and_decode_f32()
    - drflac_open_and_decode_file_s32()
    - drflac_open_and_decode_file_s16()
    - drflac_open_and_decode_file_f32()
    - drflac_open_and_decode_memory_s32()
    - drflac_open_and_decode_memory_s16()
    - drflac_open_and_decode_memory_f32()
  - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
    by doing pFlac->totalPCMFrameCount*pFlac->channels.
  - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
  - Fix errors when seeking to the end of a stream.
  - Optimizations to seeking.
  - SSE improvements and optimizations.
  - ARM NEON optimizations.
12018 - Optimizations to drflac_read_pcm_frames_s16().
12019 - Optimizations to drflac_read_pcm_frames_s32().
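
Code that relied on the removed drflac.totalSampleCount can derive it from the PCM frame count. The sketch below uses a hypothetical stand-in struct that carries only the two relevant drflac fields, totalPCMFrameCount and channels. `total_sample_count()` is also a hypothetical helper. The product is computed in 64 bits because a long stream's frame count times its channel count can exceed 32 bits.

```c
#include <stdint.h>

// Stand-in for the relevant drflac fields (the real struct has many more members).
typedef struct
{
    uint64_t totalPCMFrameCount;
    uint8_t  channels;
} flac_info;

// Hypothetical helper: the old totalSampleCount, i.e. one sample per channel per PCM frame.
static uint64_t total_sample_count(const flac_info* pInfo)
{
    return pInfo->totalPCMFrameCount * (uint64_t)pInfo->channels;
}
```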

v0.11.10 - 2019-06-26
  - Fix a compiler error.

v0.11.9 - 2019-06-16
  - Silence some ThreadSanitizer warnings.

v0.11.8 - 2019-05-21
  - Fix warnings.

v0.11.7 - 2019-05-06
  - C89 fixes.

v0.11.6 - 2019-05-05
  - Add support for C89.
  - Fix a compiler warning when CRC is disabled.
  - Change license to choice of public domain or MIT-0.

v0.11.5 - 2019-04-19
  - Fix a compiler error with GCC.

v0.11.4 - 2019-04-17
  - Fix some warnings with GCC when compiling with -std=c99.

v0.11.3 - 2019-04-07
  - Silence warnings with GCC.

v0.11.2 - 2019-03-10
  - Fix a warning.

v0.11.1 - 2019-02-17
  - Fix a potential bug with seeking.

v0.11.0 - 2018-12-16
  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
    and return PCM frame counts instead of sample counts. To upgrade, divide the count you pass in by the channel count.
    The return value is likewise a PCM frame count, i.e. the old sample count divided by the channel count. Multiply it
    by the channel count wherever you still need a sample count.
  - API CHANGE: Deprecated drflac_seek_to_sample() and replaced it with drflac_seek_to_pcm_frame(). The same rules as
    the changes to drflac_read_*() apply.
  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced them with drflac_open_*_and_read_*(). The same rules as
    the changes to drflac_read_*() apply.
  - Optimizations.
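
The upgrade rule for the read functions can be expressed as a thin wrapper. The wrapper keeps a sample-based interface on top of a frame-based reader. In the sketch below, `read_pcm_frames()` is a hypothetical stand-in for drflac_read_pcm_frames_s32() that copies from an in-memory interleaved buffer. `read_samples()` is a hypothetical replacement for the old sample-based call. Only whole PCM frames are read, so a sample count that is not a multiple of the channel count is rounded down.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical frame-based source: interleaved s32 samples held in memory.
typedef struct
{
    const int32_t* pSamples;    // Interleaved: frame 0 ch 0, frame 0 ch 1, ...
    uint64_t       frameCount;
    uint64_t       cursor;      // Current PCM frame.
    uint32_t       channels;
} frame_source;

// Stand-in for a frame-based reader: the count and the return value are in PCM frames.
static uint64_t read_pcm_frames(frame_source* pSrc, uint64_t framesToRead, int32_t* pOut)
{
    uint64_t available = pSrc->frameCount - pSrc->cursor;
    if (framesToRead > available) {
        framesToRead = available;
    }
    memcpy(pOut, pSrc->pSamples + pSrc->cursor * pSrc->channels,
           (size_t)(framesToRead * pSrc->channels) * sizeof(int32_t));
    pSrc->cursor += framesToRead;
    return framesToRead;
}

// Sample-based wrapper: divide the requested sample count by the channel count on the
// way in, and multiply the returned frame count by the channel count on the way out.
static uint64_t read_samples(frame_source* pSrc, uint64_t samplesToRead, int32_t* pOut)
{
    uint64_t framesRead = read_pcm_frames(pSrc, samplesToRead / pSrc->channels, pOut);
    return framesRead * pSrc->channels;
}
```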

v0.10.0 - 2018-09-11
  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need Win32 file IO, you
    need to implement it yourself via the callback API.
  - Fix the clang build.
  - Fix undefined behavior.
  - Fix errors with CUESHEET metadata blocks.
  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
    Vorbis comment API.
  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  - Minor optimizations.

v0.9.11 - 2018-08-29
  - Fix a bug with sample reconstruction.

v0.9.10 - 2018-08-07
  - Improve 64-bit detection.

v0.9.9 - 2018-08-05
  - Fix C++ build on older versions of GCC.

v0.9.8 - 2018-07-24
  - Fix compilation errors.

v0.9.7 - 2018-07-05
  - Fix a warning.

v0.9.6 - 2018-06-29
  - Fix some typos.

v0.9.5 - 2018-06-23
  - Fix some warnings.

v0.9.4 - 2018-06-14
  - Optimizations to seeking.
  - Clean up.

v0.9.3 - 2018-05-22
  - Bug fix.

v0.9.2 - 2018-05-12
  - Fix a compilation error due to a missing break statement.

v0.9.1 - 2018-04-29
  - Fix compilation error with Clang.

v0.9 - 2018-04-24
  - Fix Clang build.
  - Start using major.minor.revision versioning.

v0.8g - 2018-04-19
  - Fix build on non-x86/x64 architectures.

v0.8f - 2018-02-02
  - Stop pretending to support changing rate/channels mid-stream.

v0.8e - 2018-02-01
  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  - Fix a crash when the Rice partition order is invalid.
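
Both crashes come from trusting values read from the stream. The sketch below shows the kind of validation involved, based on the FLAC format rules:

- A frame's block size must not exceed the maximum block size declared in STREAMINFO.
- The block must split evenly into 2^order Rice partitions.
- The first partition must hold at least as many samples as the predictor order, because its residual count is (blockSize >> order) - predictorOrder.

`frame_params_are_valid()` is a hypothetical helper and not dr_flac's actual check.

```c
#include <stdint.h>

// Hypothetical validation helper. Returns non-zero if the values are consistent.
//   blockSize      - samples per channel in this frame (from the frame header)
//   maxBlockSize   - maximum block size declared in STREAMINFO
//   partitionOrder - Rice partition order (a 4-bit field, so 0..15)
//   predictorOrder - order of the FIXED or LPC predictor for this subframe
static int frame_params_are_valid(uint32_t blockSize, uint32_t maxBlockSize,
                                  uint32_t partitionOrder, uint32_t predictorOrder)
{
    if (blockSize == 0 || blockSize > maxBlockSize) {
        return 0;
    }
    if (partitionOrder > 15) {
        return 0;
    }
    // The block must split evenly into 2^partitionOrder partitions.
    if ((blockSize & ((1u << partitionOrder) - 1)) != 0) {
        return 0;
    }
    // The first partition carries (blockSize >> order) - predictorOrder residuals,
    // which must not be negative.
    if ((blockSize >> partitionOrder) < predictorOrder) {
        return 0;
    }
    return 1;
}
```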

v0.8d - 2017-09-22
  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
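
Skipping an ID3v2 tag only requires its 10-byte header. The sketch below is based on the ID3v2 specification:

- The header starts with "ID3".
- The tag size is stored in four "syncsafe" bytes that each carry 7 bits.
- A footer, signalled by flag bit 0x10, adds another 10 bytes.

`id3v2_tag_length()` is a hypothetical helper, and this is not necessarily how dr_flac implements the skip.

```c
#include <stdint.h>

// Hypothetical helper: given the first 10 bytes of a stream, returns the total number
// of bytes occupied by an ID3v2 tag (header + body + optional footer), or 0 if the
// bytes do not start a valid ID3v2 tag header.
static uint32_t id3v2_tag_length(const uint8_t header[10])
{
    uint32_t bodySize;

    if (header[0] != 'I' || header[1] != 'D' || header[2] != '3') {
        return 0;
    }

    // Syncsafe integer: the high bit of every size byte must be clear.
    if ((header[6] | header[7] | header[8] | header[9]) & 0x80) {
        return 0;
    }

    bodySize = ((uint32_t)header[6] << 21) | ((uint32_t)header[7] << 14) |
               ((uint32_t)header[8] <<  7) |  (uint32_t)header[9];

    // 10-byte header, the body, plus a 10-byte footer if flag bit 4 (0x10) is set.
    return 10 + bodySize + ((header[5] & 0x10) ? 10 : 0);
}
```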

v0.8c - 2017-09-07
  - Fix warning on non-x86/x64 architectures.

v0.8b - 2017-08-19
  - Fix build on non-x86/x64 architectures.

v0.8a - 2017-08-13
  - A small optimization for the Clang build.

v0.8 - 2017-08-12
  - API CHANGE: Rename dr_* types to drflac_*.
  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  - Add support for custom implementations of malloc(), realloc(), etc.
  - Add CRC checking to Ogg encapsulated streams.
  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  - Bug fixes.

v0.7 - 2017-07-23
  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().

v0.6 - 2017-07-22
  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
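
The frame checks named above use the CRCs defined by the FLAC format. Both start from zero, process bits most significant first, and apply no final XOR:

- CRC-8 uses polynomial x^8 + x^2 + x + 1 (0x07) over the frame header.
- CRC-16 uses polynomial x^16 + x^15 + x^2 + 1 (0x8005) over the whole frame.

The sketch below computes both one bit at a time and adds a sync-code test. The function names are hypothetical, and a production decoder would typically use lookup tables instead.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-8, polynomial 0x07 (x^8 + x^2 + x + 1), initial value 0, MSB first.
static uint8_t flac_crc8(const uint8_t* pData, size_t size)
{
    uint8_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < size; ++i) {
        crc ^= pData[i];
        for (bit = 0; bit < 8; ++bit) {
            crc = (uint8_t)((crc & 0x80) ? ((crc << 1) ^ 0x07) : (crc << 1));
        }
    }
    return crc;
}

// CRC-16, polynomial 0x8005 (x^16 + x^15 + x^2 + 1), initial value 0, MSB first.
static uint16_t flac_crc16(const uint8_t* pData, size_t size)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < size; ++i) {
        crc ^= (uint16_t)(pData[i] << 8);
        for (bit = 0; bit < 8; ++bit) {
            crc = (uint16_t)((crc & 0x8000) ? ((crc << 1) ^ 0x8005) : (crc << 1));
        }
    }
    return crc;
}

// A frame header starts with the 14-bit sync code 11111111111110 followed by a
// reserved bit that must be 0, i.e. the byte 0xFF followed by 0xF8 or 0xF9.
static int has_frame_sync(const uint8_t* pData)
{
    return pData[0] == 0xFF && (pData[1] & 0xFE) == 0xF8;
}
```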

v0.5 - 2017-07-16
  - Fix typos.
  - Change drflac_bool* types to unsigned.
  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.

v0.4f - 2017-03-10
  - Fix a couple of bugs with the bitstreaming code.

v0.4e - 2017-02-17
  - Fix some warnings.

v0.4d - 2016-12-26
  - Add support for 32-bit floating-point PCM decoding.
  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  - Minor improvements to documentation.

v0.4c - 2016-12-26
  - Add support for signed 16-bit integer PCM decoding.
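
The two entries above add 32-bit floating-point and signed 16-bit output. The sketch below derives both formats from a 32-bit decode. It assumes the decoded sample has been left-justified into a 32-bit integer, so that its most significant bit sits at bit 31. It also assumes an arithmetic right shift for negative values, as on mainstream compilers. The helper names are hypothetical, and this is one common convention rather than a statement of dr_flac's exact rounding.

```c
#include <stdint.h>

// Keep the top 16 bits of a left-justified 32-bit sample (truncation, no dithering).
// Assumes >> on a negative int32_t is an arithmetic shift.
static int16_t s32_to_s16(int32_t s)
{
    return (int16_t)(s >> 16);
}

// Scale a left-justified 32-bit sample into the range -1.0 to 1.0.
static float s32_to_f32(int32_t s)
{
    return (float)(s / 2147483648.0);
}
```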

v0.4b - 2016-10-23
  - A minor change to drflac_bool8 and drflac_bool32 types.

v0.4a - 2016-10-11
  - Rename drBool32 to drflac_bool32 for styling consistency.

v0.4 - 2016-09-29
  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
    keep it consistent with drflac_audio.

v0.3f - 2016-09-21
  - Fix a warning with GCC.

v0.3e - 2016-09-18
  - Fixed a bug where GCC 4.3+ was not getting properly identified.
  - Fixed a few typos.
  - Changed date formats to ISO 8601 (YYYY-MM-DD).

v0.3d - 2016-06-11
  - Minor clean up.

v0.3c - 2016-05-28
  - Fixed compilation error.

v0.3b - 2016-05-16
  - Fixed Linux/GCC build.
  - Updated documentation.

v0.3a - 2016-05-15
  - Minor fixes to documentation.

v0.3 - 2016-05-11
  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  - Lots of clean up.

v0.2b - 2016-05-10
  - Bug fixes.

v0.2a - 2016-05-10
  - Made drflac_open_and_decode() more robust.
  - Removed an unused debugging variable.

v0.2 - 2016-05-09
  - Added support for Ogg encapsulation.
  - API CHANGE: Have the onSeek callback take a third argument which specifies whether the seek
    should be relative to the start or the current position. Also change the seeking rules such that
    seeking offsets will never be negative.
  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
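
Under the v0.2 seek rules, the seek callback receives a non-negative offset plus an origin. The origin says whether the offset is relative to the start of the stream or to the current position. The sketch below implements read and seek callbacks over an in-memory buffer. The `seek_origin` enum, the `memory_stream` struct and the function names are hypothetical stand-ins shaped after that description. They are not the exact dr_flac declarations.

```c
#include <stddef.h>
#include <string.h>

typedef enum
{
    seek_origin_start,    // offset is relative to the start of the stream
    seek_origin_current   // offset is relative to the current position
} seek_origin;

// Hypothetical in-memory stream passed to the callbacks as pUserData.
typedef struct
{
    const unsigned char* pData;
    size_t size;
    size_t cursor;
} memory_stream;

// Read callback: copies up to bytesToRead bytes and returns how many were copied.
static size_t memory_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    memory_stream* pStream = (memory_stream*)pUserData;
    size_t remaining = pStream->size - pStream->cursor;

    if (bytesToRead > remaining) {
        bytesToRead = remaining;
    }
    memcpy(pBufferOut, pStream->pData + pStream->cursor, bytesToRead);
    pStream->cursor += bytesToRead;
    return bytesToRead;
}

// Seek callback: offsets are never negative, so only forward motion from the chosen
// origin is handled. Returns 0 (failure) if the target lies past the end of the data.
static int memory_on_seek(void* pUserData, int offset, seek_origin origin)
{
    memory_stream* pStream = (memory_stream*)pUserData;
    size_t base = (origin == seek_origin_start) ? 0 : pStream->cursor;

    if (offset < 0 || (size_t)offset > pStream->size - base) {
        return 0;
    }
    pStream->cursor = base + (size_t)offset;
    return 1;
}
```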

v0.1b - 2016-05-07
  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  - Removed a stale comment.

v0.1a - 2016-05-05
  - Minor formatting changes.
  - Fixed a warning on the GCC build.

v0.1 - 2016-05-03
  - Initial versioned release.
*/

/*
This software is available as a choice of the following licenses. Choose
whichever you prefer.

===============================================================================
ALTERNATIVE 1 - Public Domain (www.unlicense.org)
===============================================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

===============================================================================
ALTERNATIVE 2 - MIT No Attribution
===============================================================================
Copyright 2020 David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
