dr_flac.h source code [LOVE/libraries/dr_flac/dr_flac.h]

1	/*
2	FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
3	dr_flac - v0.12.33 - 2021-12-22
4
5	David Reid - mackron@gmail.com
6
7	GitHub: https://github.com/mackron/dr_libs
8	*/
9
10	/*
11	RELEASE NOTES - v0.12.0
12	=======================
13	Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
14
15
16	Improved Client-Defined Memory Allocation
17	-----------------------------------------
18	The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
19	existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
20	allocation callbacks are specified.
21
22	To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
23
24	void my_malloc(size_t sz, void* pUserData)*
25	{
26	return malloc(sz);
27	}
28	void my_realloc(void* p, size_t sz, void* pUserData)*
29	{
30	return realloc(p, sz);
31	}
32	void my_free(void p, void* pUserData)*
33	{
34	free(p);
35	}
36
37	...
38
39	drflac_allocation_callbacks allocationCallbacks;
40	allocationCallbacks.pUserData = &myData;
41	allocationCallbacks.onMalloc = my_malloc;
42	allocationCallbacks.onRealloc = my_realloc;
43	allocationCallbacks.onFree = my_free;
44	drflac pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);*
45
46	The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
47
48	Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
49	DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
50
51	Every API that opens a drflac object now takes this extra parameter. These include the following:
52
53	drflac_open()
54	drflac_open_relaxed()
55	drflac_open_with_metadata()
56	drflac_open_with_metadata_relaxed()
57	drflac_open_file()
58	drflac_open_file_with_metadata()
59	drflac_open_memory()
60	drflac_open_memory_with_metadata()
61	drflac_open_and_read_pcm_frames_s32()
62	drflac_open_and_read_pcm_frames_s16()
63	drflac_open_and_read_pcm_frames_f32()
64	drflac_open_file_and_read_pcm_frames_s32()
65	drflac_open_file_and_read_pcm_frames_s16()
66	drflac_open_file_and_read_pcm_frames_f32()
67	drflac_open_memory_and_read_pcm_frames_s32()
68	drflac_open_memory_and_read_pcm_frames_s16()
69	drflac_open_memory_and_read_pcm_frames_f32()
70
71
72
73	Optimizations
74	-------------
75	Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
76	improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
77	advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
78	means it will be disabled when DR_FLAC_NO_CRC is used.
79
80	The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
81	particular. 16-bit streams should also see some improvement.
82
83	drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
84	to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
85
86	A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
87	channel reconstruction which is the last part of the decoding process.
88
89	The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
90	compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
91	compile time and the REV instruction requires ARM architecture version 6.
92
93	An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
94
95
96	Removed APIs
97	------------
98	The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
99
100	drflac_read_s32() -> drflac_read_pcm_frames_s32()
101	drflac_read_s16() -> drflac_read_pcm_frames_s16()
102	drflac_read_f32() -> drflac_read_pcm_frames_f32()
103	drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
104	drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
105	drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
106	drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
107	drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
108	drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
109	drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
110	drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
111	drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
112	drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
113
114	Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
115	to the old per-sample APIs. You now need to use the "pcm_frame" versions.
116	*/
117
118
119	/*
120	Introduction
121	============
122	dr_flac is a single file library. To use it, do something like the following in one .c file.
123
124	```c
125	#define DR_FLAC_IMPLEMENTATION
126	#include "dr_flac.h"
127	```
128
129	You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
130
131	```c
132	drflac pFlac = drflac_open_file("MySong.flac", NULL);*
133	if (pFlac == NULL) {
134	// Failed to open FLAC file
135	}
136
137	drflac_int32 pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));*
138	drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
139	```
140
141	The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
142	should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
143	a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
144
145	You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
146	samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
147
148	```c
149	while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
150	do_something();
151	}
152	```
153
154	You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
155
156	If you just want to quickly decode an entire FLAC file in one go you can do something like this:
157
158	```c
159	unsigned int channels;
160	unsigned int sampleRate;
161	drflac_uint64 totalPCMFrameCount;
162	drflac_int32 pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);*
163	if (pSampleData == NULL) {
164	// Failed to open and decode FLAC file.
165	}
166
167	...
168
169	drflac_free(pSampleData, NULL);
170	```
171
172	You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the _s16() and _f32() family of APIs respectively, but note that these
173	should be considered lossy.
174
175
176	If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
177	The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
178	reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
179
180	The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
181	streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
182
183	`drflac_open_relaxed()`
184	`drflac_open_with_metadata_relaxed()`
185
186	It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
187	APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
188
189
190
191	Build Options
192	=============
193	#define these options before including this file.
194
195	#define DR_FLAC_NO_STDIO
196	Disable `drflac_open_file()` and family.
197
198	#define DR_FLAC_NO_OGG
199	Disables support for Ogg/FLAC streams.
200
201	#define DR_FLAC_BUFFER_SIZE <number>
202	Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
203	Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
204	you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
205
206	#define DR_FLAC_NO_CRC
207	Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
208	be used if available. Otherwise the seek will be performed using brute force.
209
210	#define DR_FLAC_NO_SIMD
211	Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
212
213
214
215	Notes
216	=====
217	- dr_flac does not support changing the sample rate nor channel count mid stream.
218	- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
219	- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
220	to differences in corrupted stream recorvery logic between the two APIs.
221	*/
222
223	#ifndef dr_flac_h
224	#define dr_flac_h
225
226	#ifdef __cplusplus
227	extern "C" {
228	#endif
229
230	#define DRFLAC_STRINGIFY(x) #x
231	#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
232
233	#define DRFLAC_VERSION_MAJOR 0
234	#define DRFLAC_VERSION_MINOR 12
235	#define DRFLAC_VERSION_REVISION 33
236	#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
237
238	#include <stddef.h> /* For size_t. */
239
240	/ Sized types. /
241	typedef signed char drflac_int8;
242	typedef unsigned char drflac_uint8;
243	typedef signed short drflac_int16;
244	typedef unsigned short drflac_uint16;
245	typedef signed int drflac_int32;
246	typedef unsigned int drflac_uint32;
247	#if defined(_MSC_VER) && !defined(__clang__)
248	typedef signed __int64 drflac_int64;
249	typedef unsigned __int64 drflac_uint64;
250	#else
251	#if defined(__clang__) \|\| (defined(__GNUC__) && (__GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
252	#pragma GCC diagnostic push
253	#pragma GCC diagnostic ignored "-Wlong-long"
254	#if defined(__clang__)
255	#pragma GCC diagnostic ignored "-Wc++11-long-long"
256	#endif
257	#endif
258	typedef signed long long drflac_int64;
259	typedef unsigned long long drflac_uint64;
260	#if defined(__clang__) \|\| (defined(__GNUC__) && (__GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
261	#pragma GCC diagnostic pop
262	#endif
263	#endif
264	#if defined(__LP64__) \|\| defined(_WIN64) \|\| (defined(__x86_64__) && !defined(__ILP32__)) \|\| defined(_M_X64) \|\| defined(__ia64) \|\| defined(_M_IA64) \|\| defined(__aarch64__) \|\| defined(_M_ARM64) \|\| defined(__powerpc64__)
265	typedef drflac_uint64 drflac_uintptr;
266	#else
267	typedef drflac_uint32 drflac_uintptr;
268	#endif
269	typedef drflac_uint8 drflac_bool8;
270	typedef drflac_uint32 drflac_bool32;
271	#define DRFLAC_TRUE 1
272	#define DRFLAC_FALSE 0
273
274	#if !defined(DRFLAC_API)
275	#if defined(DRFLAC_DLL)
276	#if defined(_WIN32)
277	#define DRFLAC_DLL_IMPORT __declspec(dllimport)
278	#define DRFLAC_DLL_EXPORT __declspec(dllexport)
279	#define DRFLAC_DLL_PRIVATE static
280	#else
281	#if defined(__GNUC__) && __GNUC__ >= 4
282	#define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
283	#define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
284	#define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
285	#else
286	#define DRFLAC_DLL_IMPORT
287	#define DRFLAC_DLL_EXPORT
288	#define DRFLAC_DLL_PRIVATE static
289	#endif
290	#endif
291
292	#if defined(DR_FLAC_IMPLEMENTATION) \|\| defined(DRFLAC_IMPLEMENTATION)
293	#define DRFLAC_API DRFLAC_DLL_EXPORT
294	#else
295	#define DRFLAC_API DRFLAC_DLL_IMPORT
296	#endif
297	#define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
298	#else
299	#define DRFLAC_API extern
300	#define DRFLAC_PRIVATE static
301	#endif
302	#endif
303
304	#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
305	#define DRFLAC_DEPRECATED __declspec(deprecated)
306	#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
307	#define DRFLAC_DEPRECATED __attribute__((deprecated))
308	#elif defined(__has_feature) /* Clang */
309	#if __has_feature(attribute_deprecated)
310	#define DRFLAC_DEPRECATED __attribute__((deprecated))
311	#else
312	#define DRFLAC_DEPRECATED
313	#endif
314	#else
315	#define DRFLAC_DEPRECATED
316	#endif
317
318	DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
319	DRFLAC_API const char* drflac_version_string(void);
320
321	/*
322	As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
323	but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
324	*/
325	#ifndef DR_FLAC_BUFFER_SIZE
326	#define DR_FLAC_BUFFER_SIZE 4096
327	#endif
328
329	/ Check if we can enable 64-bit optimizations. /
330	#if defined(_WIN64) \|\| defined(_LP64) \|\| defined(__LP64__)
331	#define DRFLAC_64BIT
332	#endif
333
334	#ifdef DRFLAC_64BIT
335	typedef drflac_uint64 drflac_cache_t;
336	#else
337	typedef drflac_uint32 drflac_cache_t;
338	#endif
339
340	/ The various metadata block types. /
341	#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
342	#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
343	#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
344	#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
345	#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
346	#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
347	#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
348	#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
349
350	/ The various picture types specified in the PICTURE block. /
351	#define DRFLAC_PICTURE_TYPE_OTHER 0
352	#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
353	#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
354	#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
355	#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
356	#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
357	#define DRFLAC_PICTURE_TYPE_MEDIA 6
358	#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
359	#define DRFLAC_PICTURE_TYPE_ARTIST 8
360	#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
361	#define DRFLAC_PICTURE_TYPE_BAND 10
362	#define DRFLAC_PICTURE_TYPE_COMPOSER 11
363	#define DRFLAC_PICTURE_TYPE_LYRICIST 12
364	#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
365	#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
366	#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
367	#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
368	#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
369	#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
370	#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
371	#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
372
373	typedef enum
374	{
375	drflac_container_native,
376	drflac_container_ogg,
377	drflac_container_unknown
378	} drflac_container;
379
380	typedef enum
381	{
382	drflac_seek_origin_start,
383	drflac_seek_origin_current
384	} drflac_seek_origin;
385
386	/ Packing is important on this structure because we map this directly to the raw data within the SEEKTABLE metadata block. /
387	#pragma pack(2)
388	typedef struct
389	{
390	drflac_uint64 firstPCMFrame;
391	drflac_uint64 flacFrameOffset; / The offset from the first byte of the header of the first frame. /
392	drflac_uint16 pcmFrameCount;
393	} drflac_seekpoint;
394	#pragma pack()
395
396	typedef struct
397	{
398	drflac_uint16 minBlockSizeInPCMFrames;
399	drflac_uint16 maxBlockSizeInPCMFrames;
400	drflac_uint32 minFrameSizeInPCMFrames;
401	drflac_uint32 maxFrameSizeInPCMFrames;
402	drflac_uint32 sampleRate;
403	drflac_uint8 channels;
404	drflac_uint8 bitsPerSample;
405	drflac_uint64 totalPCMFrameCount;
406	drflac_uint8 md5[`16`];
407	} drflac_streaminfo;
408
409	typedef struct
410	{
411	/*
412	The metadata type. Use this to know how to interpret the data below. Will be set to one of the
413	DRFLAC_METADATA_BLOCK_TYPE_ tokens.*
414	*/
415	drflac_uint32 type;
416
417	/*
418	A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
419	not modify the contents of this buffer. Use the structures below for more meaningful and structured
420	information about the metadata. It's possible for this to be null.
421	*/
422	const void* pRawData;
423
424	/ The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. /
425	drflac_uint32 rawDataSize;
426
427	union
428	{
429	drflac_streaminfo streaminfo;
430
431	struct
432	{
433	int unused;
434	} padding;
435
436	struct
437	{
438	drflac_uint32 id;
439	const void* pData;
440	drflac_uint32 dataSize;
441	} application;
442
443	struct
444	{
445	drflac_uint32 seekpointCount;
446	const drflac_seekpoint* pSeekpoints;
447	} seektable;
448
449	struct
450	{
451	drflac_uint32 vendorLength;
452	const char* vendor;
453	drflac_uint32 commentCount;
454	const void* pComments;
455	} vorbis_comment;
456
457	struct
458	{
459	char catalog[`128`];
460	drflac_uint64 leadInSampleCount;
461	drflac_bool32 isCD;
462	drflac_uint8 trackCount;
463	const void* pTrackData;
464	} cuesheet;
465
466	struct
467	{
468	drflac_uint32 type;
469	drflac_uint32 mimeLength;
470	const char* mime;
471	drflac_uint32 descriptionLength;
472	const char* description;
473	drflac_uint32 width;
474	drflac_uint32 height;
475	drflac_uint32 colorDepth;
476	drflac_uint32 indexColorCount;
477	drflac_uint32 pictureDataSize;
478	const drflac_uint8* pPictureData;
479	} picture;
480	} data;
481	} drflac_metadata;
482
483
484	/*
485	Callback for when data needs to be read from the client.
486
487
488	Parameters
489	----------
490	pUserData (in)
491	The user data that was passed to drflac_open() and family.
492
493	pBufferOut (out)
494	The output buffer.
495
496	bytesToRead (in)
497	The number of bytes to read.
498
499
500	Return Value
501	------------
502	The number of bytes actually read.
503
504
505	Remarks
506	-------
507	A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
508	you have reached the end of the stream.
509	*/
510	typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
511
512	/*
513	Callback for when data needs to be seeked.
514
515
516	Parameters
517	----------
518	pUserData (in)
519	The user data that was passed to drflac_open() and family.
520
521	offset (in)
522	The number of bytes to move, relative to the origin. Will never be negative.
523
524	origin (in)
525	The origin of the seek - the current position or the start of the stream.
526
527
528	Return Value
529	------------
530	Whether or not the seek was successful.
531
532
533	Remarks
534	-------
535	The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
536	either drflac_seek_origin_start or drflac_seek_origin_current.
537
538	When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
539	and handled by returning DRFLAC_FALSE.
540	*/
541	typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
542
543	/*
544	Callback for when a metadata block is read.
545
546
547	Parameters
548	----------
549	pUserData (in)
550	The user data that was passed to drflac_open() and family.
551
552	pMetadata (in)
553	A pointer to a structure containing the data of the metadata block.
554
555
556	Remarks
557	-------
558	Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
559	will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_ tokens.*
560	*/
561	typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
562
563
564	typedef struct
565	{
566	void* pUserData;
567	void* (* onMalloc)(size_t sz, void* pUserData);
568	void* (* onRealloc)(void* p, size_t sz, void* pUserData);
569	void (* onFree)(void* p, void* pUserData);
570	} drflac_allocation_callbacks;
571
572	/ Structure for internal use. Only used for decoders opened with drflac_open_memory. /
573	typedef struct
574	{
575	const drflac_uint8* data;
576	size_t dataSize;
577	size_t currentReadPos;
578	} drflac__memory_stream;
579
580	/ Structure for internal use. Used for bit streaming. /
581	typedef struct
582	{
583	/ The function to call when more data needs to be read. /
584	drflac_read_proc onRead;
585
586	/ The function to call when the current read position needs to be moved. /
587	drflac_seek_proc onSeek;
588
589	/ The user data to pass around to onRead and onSeek. /
590	void* pUserData;
591
592
593	/*
594	The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
595	stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
596	or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
597	*/
598	size_t unalignedByteCount;
599
600	/ The content of the unaligned bytes. /
601	drflac_cache_t unalignedCache;
602
603	/ The index of the next valid cache line in the "L2" cache. /
604	drflac_uint32 nextL2Line;
605
606	/ The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. /
607	drflac_uint32 consumedBits;
608
609	/*
610	The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
611	Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
612	*/
613	drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
614	drflac_cache_t cache;
615
616	/*
617	CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
618	is reset to 0 at the beginning of each frame.
619	*/
620	drflac_uint16 crc16;
621	drflac_cache_t crc16Cache; / A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. /
622	drflac_uint32 crc16CacheIgnoredBytes; / The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. /
623	} drflac_bs;
624
625	typedef struct
626	{
627	/ The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. /
628	drflac_uint8 subframeType;
629
630	/ The number of wasted bits per sample as specified by the sub-frame header. /
631	drflac_uint8 wastedBitsPerSample;
632
633	/ The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. /
634	drflac_uint8 lpcOrder;
635
636	/ A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. /
637	drflac_int32* pSamplesS32;
638	} drflac_subframe;
639
640	typedef struct
641	{
642	/*
643	If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
644	always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
645	*/
646	drflac_uint64 pcmFrameNumber;
647
648	/*
649	If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
650	is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
651	*/
652	drflac_uint32 flacFrameNumber;
653
654	/ The sample rate of this frame. /
655	drflac_uint32 sampleRate;
656
657	/ The number of PCM frames in each sub-frame within this frame. /
658	drflac_uint16 blockSizeInPCMFrames;
659
660	/*
661	The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
662	will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
663	*/
664	drflac_uint8 channelAssignment;
665
666	/ The number of bits per sample within this frame. /
667	drflac_uint8 bitsPerSample;
668
669	/ The frame's CRC. /
670	drflac_uint8 crc8;
671	} drflac_frame_header;
672
673	typedef struct
674	{
675	/ The header. /
676	drflac_frame_header header;
677
678	/*
679	The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
680	this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
681	*/
682	drflac_uint32 pcmFramesRemaining;
683
684	/ The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. /
685	drflac_subframe subframes[`8`];
686	} drflac_frame;
687
688	typedef struct
689	{
690	/ The function to call when a metadata block is read. /
691	drflac_meta_proc onMeta;
692
693	/ The user data posted to the metadata callback function. /
694	void* pUserDataMD;
695
696	/ Memory allocation callbacks. /
697	drflac_allocation_callbacks allocationCallbacks;
698
699
700	/ The sample rate. Will be set to something like 44100. /
701	drflac_uint32 sampleRate;
702
703	/*
704	The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
705	value specified in the STREAMINFO block.
706	*/
707	drflac_uint8 channels;
708
709	/ The bits per sample. Will be set to something like 16, 24, etc. /
710	drflac_uint8 bitsPerSample;
711
712	/ The maximum block size, in samples. This number represents the number of samples in each channel (not combined). /
713	drflac_uint16 maxBlockSizeInPCMFrames;
714
715	/*
716	The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
717	the total PCM frame count is unknown. Likely the case with streams like internet radio.
718	*/
719	drflac_uint64 totalPCMFrameCount;
720
721
722	/ The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. /
723	drflac_container container;
724
725	/ The number of seekpoints in the seektable. /
726	drflac_uint32 seekpointCount;
727
728
729	/ Information about the frame the decoder is currently sitting on. /
730	drflac_frame currentFLACFrame;
731
732
733	/ The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. /
734	drflac_uint64 currentPCMFrame;
735
736	/ The position of the first FLAC frame in the stream. This is only ever used for seeking. /
737	drflac_uint64 firstFLACFramePosInBytes;
738
739
740	/ A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). /
741	drflac__memory_stream memoryStream;
742
743
744	/ A pointer to the decoded sample data. This is an offset of pExtraData. /
745	drflac_int32* pDecodedSamples;
746
747	/ A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. /
748	drflac_seekpoint* pSeekpoints;
749
750	/ Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. /
751	void* _oggbs;
752
753	/ Internal use only. Used for profiling and testing different seeking modes. /
754	drflac_bool32 _noSeekTableSeek : `1`;
755	drflac_bool32 _noBinarySearchSeek : `1`;
756	drflac_bool32 _noBruteForceSeek : `1`;
757
758	/ The bit streamer. The raw FLAC data is fed through this object. /
759	drflac_bs bs;
760
761	/ Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. /
762	drflac_uint8 pExtraData[`1`];
763	} drflac;
764
765
766	/*
767	Opens a FLAC decoder.
768
769
770	Parameters
771	----------
772	onRead (in)
773	The function to call when data needs to be read from the client.
774
775	onSeek (in)
776	The function to call when the read position of the client data needs to move.
777
778	pUserData (in, optional)
779	A pointer to application defined data that will be passed to onRead and onSeek.
780
781	pAllocationCallbacks (in, optional)
782	A pointer to application defined callbacks for managing memory allocations.
783
784
785	Return Value
786	------------
787	Returns a pointer to an object representing the decoder.
788
789
790	Remarks
791	-------
792	Close the decoder with `drflac_close()`.
793
794	`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
795
796	This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
797	without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
798
799	This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
800	from a block of memory respectively.
801
802	The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
803
804	Use `drflac_open_with_metadata()` if you need access to metadata.
805
806
807	Seek Also
808	---------
809	drflac_open_file()
810	drflac_open_memory()
811	drflac_open_with_metadata()
812	drflac_close()
813	*/
814	DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
815
816	/*
817	Opens a FLAC stream with relaxed validation of the header block.
818
819
820	Parameters
821	----------
822	onRead (in)
823	The function to call when data needs to be read from the client.
824
825	onSeek (in)
826	The function to call when the read position of the client data needs to move.
827
828	container (in)
829	Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
830
831	pUserData (in, optional)
832	A pointer to application defined data that will be passed to onRead and onSeek.
833
834	pAllocationCallbacks (in, optional)
835	A pointer to application defined callbacks for managing memory allocations.
836
837
838	Return Value
839	------------
840	A pointer to an object representing the decoder.
841
842
843	Remarks
844	-------
845	The same as drflac_open(), except attempts to open the stream even when a header block is not present.
846
847	Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
848	as that is for internal use only.
849
850	Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
851	force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
852
853	Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
854	*/
855	DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
856
857	/*
858	Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
859
860
861	Parameters
862	----------
863	onRead (in)
864	The function to call when data needs to be read from the client.
865
866	onSeek (in)
867	The function to call when the read position of the client data needs to move.
868
869	onMeta (in)
870	The function to call for every metadata block.
871
872	pUserData (in, optional)
873	A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
874
875	pAllocationCallbacks (in, optional)
876	A pointer to application defined callbacks for managing memory allocations.
877
878
879	Return Value
880	------------
881	A pointer to an object representing the decoder.
882
883
884	Remarks
885	-------
886	Close the decoder with `drflac_close()`.
887
888	`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
889
890	This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
891	metadata block except for STREAMINFO and PADDING blocks.
892
893	The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
894	pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
895	the different metadata types.
896
897	The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
898
899	Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
900	the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
901	metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
902	returned depending on whether or not the stream is being opened with metadata.
903
904
905	Seek Also
906	---------
907	drflac_open_file_with_metadata()
908	drflac_open_memory_with_metadata()
909	drflac_open()
910	drflac_close()
911	*/
912	DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
913
914	/*
915	The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
916
917	See Also
918	--------
919	drflac_open_with_metadata()
920	drflac_open_relaxed()
921	*/
922	DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
923
924	/*
925	Closes the given FLAC decoder.
926
927
928	Parameters
929	----------
930	pFlac (in)
931	The decoder to close.
932
933
934	Remarks
935	-------
936	This will destroy the decoder object.
937
938
939	See Also
940	--------
941	drflac_open()
942	drflac_open_with_metadata()
943	drflac_open_file()
944	drflac_open_file_w()
945	drflac_open_file_with_metadata()
946	drflac_open_file_with_metadata_w()
947	drflac_open_memory()
948	drflac_open_memory_with_metadata()
949	*/
950	DRFLAC_API void drflac_close(drflac* pFlac);
951
952
953	/*
954	Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
955
956
957	Parameters
958	----------
959	pFlac (in)
960	The decoder.
961
962	framesToRead (in)
963	The number of PCM frames to read.
964
965	pBufferOut (out, optional)
966	A pointer to the buffer that will receive the decoded samples.
967
968
969	Return Value
970	------------
971	Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
972
973
974	Remarks
975	-------
976	pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
977	*/
978	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
979
980
981	/*
982	Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
983
984
985	Parameters
986	----------
987	pFlac (in)
988	The decoder.
989
990	framesToRead (in)
991	The number of PCM frames to read.
992
993	pBufferOut (out, optional)
994	A pointer to the buffer that will receive the decoded samples.
995
996
997	Return Value
998	------------
999	Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1000
1001
1002	Remarks
1003	-------
1004	pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1005
1006	Note that this is lossy for streams where the bits per sample is larger than 16.
1007	*/
1008	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1009
1010	/*
1011	Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1012
1013
1014	Parameters
1015	----------
1016	pFlac (in)
1017	The decoder.
1018
1019	framesToRead (in)
1020	The number of PCM frames to read.
1021
1022	pBufferOut (out, optional)
1023	A pointer to the buffer that will receive the decoded samples.
1024
1025
1026	Return Value
1027	------------
1028	Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1029
1030
1031	Remarks
1032	-------
1033	pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1034
1035	Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
1036	*/
1037	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1038
1039	/*
1040	Seeks to the PCM frame at the given index.
1041
1042
1043	Parameters
1044	----------
1045	pFlac (in)
1046	The decoder.
1047
1048	pcmFrameIndex (in)
1049	The index of the PCM frame to seek to. See notes below.
1050
1051
1052	Return Value
1053	-------------
1054	`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1055	*/
1056	DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1057
1058
1059
1060	#ifndef DR_FLAC_NO_STDIO
1061	/*
1062	Opens a FLAC decoder from the file at the given path.
1063
1064
1065	Parameters
1066	----------
1067	pFileName (in)
1068	The path of the file to open, either absolute or relative to the current directory.
1069
1070	pAllocationCallbacks (in, optional)
1071	A pointer to application defined callbacks for managing memory allocations.
1072
1073
1074	Return Value
1075	------------
1076	A pointer to an object representing the decoder.
1077
1078
1079	Remarks
1080	-------
1081	Close the decoder with drflac_close().
1082
1083
1084	Remarks
1085	-------
1086	This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
1087	at any given time, so keep this mind if you have many decoders open at the same time.
1088
1089
1090	See Also
1091	--------
1092	drflac_open_file_with_metadata()
1093	drflac_open()
1094	drflac_close()
1095	*/
1096	DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1097	DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1098
1099	/*
1100	Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1101
1102
1103	Parameters
1104	----------
1105	pFileName (in)
1106	The path of the file to open, either absolute or relative to the current directory.
1107
1108	pAllocationCallbacks (in, optional)
1109	A pointer to application defined callbacks for managing memory allocations.
1110
1111	onMeta (in)
1112	The callback to fire for each metadata block.
1113
1114	pUserData (in)
1115	A pointer to the user data to pass to the metadata callback.
1116
1117	pAllocationCallbacks (in)
1118	A pointer to application defined callbacks for managing memory allocations.
1119
1120
1121	Remarks
1122	-------
1123	Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1124
1125
1126	See Also
1127	--------
1128	drflac_open_with_metadata()
1129	drflac_open()
1130	drflac_close()
1131	*/
1132	DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1133	DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1134	#endif
1135
1136	/*
1137	Opens a FLAC decoder from a pre-allocated block of memory
1138
1139
1140	Parameters
1141	----------
1142	pData (in)
1143	A pointer to the raw encoded FLAC data.
1144
1145	dataSize (in)
1146	The size in bytes of `data`.
1147
1148	pAllocationCallbacks (in)
1149	A pointer to application defined callbacks for managing memory allocations.
1150
1151
1152	Return Value
1153	------------
1154	A pointer to an object representing the decoder.
1155
1156
1157	Remarks
1158	-------
1159	This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1160
1161
1162	See Also
1163	--------
1164	drflac_open()
1165	drflac_close()
1166	*/
1167	DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1168
1169	/*
1170	Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1171
1172
1173	Parameters
1174	----------
1175	pData (in)
1176	A pointer to the raw encoded FLAC data.
1177
1178	dataSize (in)
1179	The size in bytes of `data`.
1180
1181	onMeta (in)
1182	The callback to fire for each metadata block.
1183
1184	pUserData (in)
1185	A pointer to the user data to pass to the metadata callback.
1186
1187	pAllocationCallbacks (in)
1188	A pointer to application defined callbacks for managing memory allocations.
1189
1190
1191	Remarks
1192	-------
1193	Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1194
1195
1196	See Also
1197	-------
1198	drflac_open_with_metadata()
1199	drflac_open()
1200	drflac_close()
1201	*/
1202	DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1203
1204
1205
1206	/ High Level APIs /
1207
1208	/*
1209	Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1210	pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1211
1212	You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1213	case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1214
1215	Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1216	read samples into a dynamically sized buffer on the heap until no samples are left.
1217
1218	Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
1219	*/
1220	DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1221
1222	/ Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. /
1223	DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1224
1225	/ Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. /
1226	DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1227
1228	#ifndef DR_FLAC_NO_STDIO
1229	/ Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. /
1230	DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1231
1232	/ Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. /
1233	DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1234
1235	/ Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. /
1236	DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1237	#endif
1238
1239	/ Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. /
1240	DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1241
1242	/ Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. /
1243	DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1244
1245	/ Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. /
1246	DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1247
1248	/*
1249	Frees memory that was allocated internally by dr_flac.
1250
1251	Set pAllocationCallbacks to the same object that was passed to drflac_open__and_read_pcm_frames_(). If you originally passed in NULL, pass in NULL for this.
1252	*/
1253	DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1254
1255
1256	/ Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. /
1257	typedef struct
1258	{
1259	drflac_uint32 countRemaining;
1260	const char* pRunningData;
1261	} drflac_vorbis_comment_iterator;
1262
1263	/*
1264	Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1265	metadata block.
1266	*/
1267	DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1268
1269	/*
1270	Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1271	returned string is NOT null terminated.
1272	*/
1273	DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1274
1275
1276	/ Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. /
1277	typedef struct
1278	{
1279	drflac_uint32 countRemaining;
1280	const char* pRunningData;
1281	} drflac_cuesheet_track_iterator;
1282
1283	/ Packing is important on this structure because we map this directly to the raw data within the CUESHEET metadata block. /
1284	#pragma pack(4)
1285	typedef struct
1286	{
1287	drflac_uint64 offset;
1288	drflac_uint8 index;
1289	drflac_uint8 reserved[`3`];
1290	} drflac_cuesheet_track_index;
1291	#pragma pack()
1292
1293	typedef struct
1294	{
1295	drflac_uint64 offset;
1296	drflac_uint8 trackNumber;
1297	char ISRC[`12`];
1298	drflac_bool8 isAudio;
1299	drflac_bool8 preEmphasis;
1300	drflac_uint8 indexCount;
1301	const drflac_cuesheet_track_index* pIndexPoints;
1302	} drflac_cuesheet_track;
1303
1304	/*
1305	Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1306	block.
1307	*/
1308	DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1309
1310	/ Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. /
1311	DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1312
1313
1314	#ifdef __cplusplus
1315	}
1316	#endif
1317	#endif /* dr_flac_h */
1318
1319
1320	/************************************************************************************************************************************************************
1321	************************************************************************************************************************************************************
1322
1323	IMPLEMENTATION
1324
1325	************************************************************************************************************************************************************
1326	************************************************************************************************************************************************************/
1327	#if defined(DR_FLAC_IMPLEMENTATION) \|\| defined(DRFLAC_IMPLEMENTATION)
1328	#ifndef dr_flac_c
1329	#define dr_flac_c
1330
1331	/ Disable some annoying warnings. /
1332	#if defined(__clang__) \|\| (defined(__GNUC__) && (__GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1333	#pragma GCC diagnostic push
1334	#if __GNUC__ >= 7
1335	#pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1336	#endif
1337	#endif
1338
1339	#ifdef __linux__
1340	#ifndef _BSD_SOURCE
1341	#define _BSD_SOURCE
1342	#endif
1343	#ifndef _DEFAULT_SOURCE
1344	#define _DEFAULT_SOURCE
1345	#endif
1346	#ifndef __USE_BSD
1347	#define __USE_BSD
1348	#endif
1349	#include <endian.h>
1350	#endif
1351
1352	#include <stdlib.h>
1353	#include <string.h>
1354
1355	#ifdef _MSC_VER
1356	#define DRFLAC_INLINE __forceinline
1357	#elif defined(__GNUC__)
1358	/*
1359	I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1360	the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1361	case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1362	command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1363	I am using "__inline__" only when we're compiling in strict ANSI mode.
1364	*/
1365	#if defined(__STRICT_ANSI__)
1366	#define DRFLAC_INLINE __inline__ __attribute__((always_inline))
1367	#else
1368	#define DRFLAC_INLINE inline __attribute__((always_inline))
1369	#endif
1370	#elif defined(__WATCOMC__)
1371	#define DRFLAC_INLINE __inline
1372	#else
1373	#define DRFLAC_INLINE
1374	#endif
1375
1376	/ CPU architecture. /
1377	#if defined(__x86_64__) \|\| defined(_M_X64)
1378	#define DRFLAC_X64
1379	#elif defined(__i386) \|\| defined(_M_IX86)
1380	#define DRFLAC_X86
1381	#elif defined(__arm__) \|\| defined(_M_ARM) \|\| defined(_M_ARM64)
1382	#define DRFLAC_ARM
1383	#endif
1384
1385	/*
1386	Intrinsics Support
1387
1388	There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1389
1390	"error: shift must be an immediate"
1391
1392	Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1393	*/
1394	#if !defined(DR_FLAC_NO_SIMD)
1395	#if defined(DRFLAC_X64) \|\| defined(DRFLAC_X86)
1396	#if defined(_MSC_VER) && !defined(__clang__)
1397	/ MSVC. /
1398	#if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1399	#define DRFLAC_SUPPORT_SSE2
1400	#endif
1401	#if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1402	#define DRFLAC_SUPPORT_SSE41
1403	#endif
1404	#elif defined(__clang__) \|\| (defined(__GNUC__) && (__GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1405	/ Assume GNUC-style. /
1406	#if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1407	#define DRFLAC_SUPPORT_SSE2
1408	#endif
1409	#if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1410	#define DRFLAC_SUPPORT_SSE41
1411	#endif
1412	#endif
1413
1414	/ If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. /
1415	#if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1416	#if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1417	#define DRFLAC_SUPPORT_SSE2
1418	#endif
1419	#if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1420	#define DRFLAC_SUPPORT_SSE41
1421	#endif
1422	#endif
1423
1424	#if defined(DRFLAC_SUPPORT_SSE41)
1425	#include <smmintrin.h>
1426	#elif defined(DRFLAC_SUPPORT_SSE2)
1427	#include <emmintrin.h>
1428	#endif
1429	#endif
1430
1431	#if defined(DRFLAC_ARM)
1432	#if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) \|\| defined(__aarch64__) \|\| defined(_M_ARM64))
1433	#define DRFLAC_SUPPORT_NEON
1434	#endif
1435
1436	/ Fall back to looking for the #include file. /
1437	#if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1438	#if !defined(DRFLAC_SUPPORT_NEON) && !defined(DRFLAC_NO_NEON) && __has_include(<arm_neon.h>)
1439	#define DRFLAC_SUPPORT_NEON
1440	#endif
1441	#endif
1442
1443	#if defined(DRFLAC_SUPPORT_NEON)
1444	#include <arm_neon.h>
1445	#endif
1446	#endif
1447	#endif
1448
1449	/ Compile-time CPU feature support. /
1450	#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) \|\| defined(DRFLAC_X64))
1451	#if defined(_MSC_VER) && !defined(__clang__)
1452	#if _MSC_VER >= 1400
1453	#include <intrin.h>
1454	static void drflac__cpuid(int info[`4`], int fid)
1455	{
1456	__cpuid(info, fid);
1457	}
1458	#else
1459	#define DRFLAC_NO_CPUID
1460	#endif
1461	#else
1462	#if defined(__GNUC__) \|\| defined(__clang__)
1463	static void drflac__cpuid(int info[`4`], int fid)
1464	{
1465	/*
1466	It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1467	specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1468	supporting different assembly dialects.
1469
1470	What's basically happening is that we're saving and restoring the ebx register manually.
1471	*/
1472	#if defined(DRFLAC_X86) && defined(__PIC__)
1473	__asm__ __volatile__ (
1474	"xchg{l} {%%}ebx, %k1;"
1475	"cpuid;"
1476	"xchg{l} {%%}ebx, %k1;"
1477	: "=a"(info[`0`]), "=&r"(info[`1`]), "=c"(info[`2`]), "=d"(info[`3`]) : "a"(fid), "c"(`0`)
1478	);
1479	#else
1480	__asm__ __volatile__ (
1481	"cpuid" : "=a"(info[`0`]), "=b"(info[`1`]), "=c"(info[`2`]), "=d"(info[`3`]) : "a"(fid), "c"(`0`)
1482	);
1483	#endif
1484	}
1485	#else
1486	#define DRFLAC_NO_CPUID
1487	#endif
1488	#endif
1489	#else
1490	#define DRFLAC_NO_CPUID
1491	#endif
1492
1493	static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1494	{
1495	#if defined(DRFLAC_SUPPORT_SSE2)
1496	#if (defined(DRFLAC_X64) \|\| defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1497	#if defined(DRFLAC_X64)
1498	return DRFLAC_TRUE; / 64-bit targets always support SSE2. /
1499	#elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) \|\| defined(__SSE2__)
1500	return DRFLAC_TRUE; / If the compiler is allowed to freely generate SSE2 code we can assume support. /
1501	#else
1502	#if defined(DRFLAC_NO_CPUID)
1503	return DRFLAC_FALSE;
1504	#else
1505	int info[`4`];
1506	drflac__cpuid(info, `1`);
1507	return (info[`3`] & (`1` << `26`)) != `0`;
1508	#endif
1509	#endif
1510	#else
1511	return DRFLAC_FALSE; / SSE2 is only supported on x86 and x64 architectures. /
1512	#endif
1513	#else
1514	return DRFLAC_FALSE; / No compiler support. /
1515	#endif
1516	}
1517
1518	static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1519	{
1520	#if defined(DRFLAC_SUPPORT_SSE41)
1521	#if (defined(DRFLAC_X64) \|\| defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
1522	#if defined(DRFLAC_X64)
1523	return DRFLAC_TRUE; / 64-bit targets always support SSE4.1. /
1524	#elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) \|\| defined(__SSE4_1__)
1525	return DRFLAC_TRUE; / If the compiler is allowed to freely generate SSE41 code we can assume support. /
1526	#else
1527	#if defined(DRFLAC_NO_CPUID)
1528	return DRFLAC_FALSE;
1529	#else
1530	int info[`4`];
1531	drflac__cpuid(info, `1`);
1532	return (info[`2`] & (`1` << `19`)) != `0`;
1533	#endif
1534	#endif
1535	#else
1536	return DRFLAC_FALSE; / SSE41 is only supported on x86 and x64 architectures. /
1537	#endif
1538	#else
1539	return DRFLAC_FALSE; / No compiler support. /
1540	#endif
1541	}
1542
1543
1544	#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) \|\| defined(DRFLAC_X64)) && !defined(__clang__)
1545	#define DRFLAC_HAS_LZCNT_INTRINSIC
1546	#elif (defined(__GNUC__) && ((__GNUC__ > 4) \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1547	#define DRFLAC_HAS_LZCNT_INTRINSIC
1548	#elif defined(__clang__)
1549	#if defined(__has_builtin)
1550	#if __has_builtin(__builtin_clzll) \|\| __has_builtin(__builtin_clzl)
1551	#define DRFLAC_HAS_LZCNT_INTRINSIC
1552	#endif
1553	#endif
1554	#endif
1555
1556	#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1557	#define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1558	#define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1559	#define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1560	#elif defined(__clang__)
1561	#if defined(__has_builtin)
1562	#if __has_builtin(__builtin_bswap16)
1563	#define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1564	#endif
1565	#if __has_builtin(__builtin_bswap32)
1566	#define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1567	#endif
1568	#if __has_builtin(__builtin_bswap64)
1569	#define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1570	#endif
1571	#endif
1572	#elif defined(__GNUC__)
1573	#if ((__GNUC__ > 4) \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1574	#define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1575	#define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1576	#endif
1577	#if ((__GNUC__ > 4) \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1578	#define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1579	#endif
1580	#elif defined(__WATCOMC__) && defined(__386__)
1581	#define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1582	#define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1583	#define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1584	extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1585	extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1586	extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1587	#pragma aux _watcom_bswap16 = \
1588	"xchg al, ah" \
1589	parm [ax] \
1590	modify [ax];
1591	#pragma aux _watcom_bswap32 = \
1592	"bswap eax" \
1593	parm [eax] \
1594	modify [eax];
1595	#pragma aux _watcom_bswap64 = \
1596	"bswap eax" \
1597	"bswap edx" \
1598	"xchg eax,edx" \
1599	parm [eax edx] \
1600	modify [eax edx];
1601	#endif
1602
1603
1604	/ Standard library stuff. /
1605	#ifndef DRFLAC_ASSERT
1606	#include <assert.h>
1607	#define DRFLAC_ASSERT(expression) assert(expression)
1608	#endif
1609	#ifndef DRFLAC_MALLOC
1610	#define DRFLAC_MALLOC(sz) malloc((sz))
1611	#endif
1612	#ifndef DRFLAC_REALLOC
1613	#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1614	#endif
1615	#ifndef DRFLAC_FREE
1616	#define DRFLAC_FREE(p) free((p))
1617	#endif
1618	#ifndef DRFLAC_COPY_MEMORY
1619	#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1620	#endif
1621	#ifndef DRFLAC_ZERO_MEMORY
1622	#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1623	#endif
1624	#ifndef DRFLAC_ZERO_OBJECT
1625	#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1626	#endif
1627
1628	#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1629
1630	typedef drflac_int32 drflac_result;
1631	#define DRFLAC_SUCCESS 0
1632	#define DRFLAC_ERROR -1 /* A generic error. */
1633	#define DRFLAC_INVALID_ARGS -2
1634	#define DRFLAC_INVALID_OPERATION -3
1635	#define DRFLAC_OUT_OF_MEMORY -4
1636	#define DRFLAC_OUT_OF_RANGE -5
1637	#define DRFLAC_ACCESS_DENIED -6
1638	#define DRFLAC_DOES_NOT_EXIST -7
1639	#define DRFLAC_ALREADY_EXISTS -8
1640	#define DRFLAC_TOO_MANY_OPEN_FILES -9
1641	#define DRFLAC_INVALID_FILE -10
1642	#define DRFLAC_TOO_BIG -11
1643	#define DRFLAC_PATH_TOO_LONG -12
1644	#define DRFLAC_NAME_TOO_LONG -13
1645	#define DRFLAC_NOT_DIRECTORY -14
1646	#define DRFLAC_IS_DIRECTORY -15
1647	#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1648	#define DRFLAC_END_OF_FILE -17
1649	#define DRFLAC_NO_SPACE -18
1650	#define DRFLAC_BUSY -19
1651	#define DRFLAC_IO_ERROR -20
1652	#define DRFLAC_INTERRUPT -21
1653	#define DRFLAC_UNAVAILABLE -22
1654	#define DRFLAC_ALREADY_IN_USE -23
1655	#define DRFLAC_BAD_ADDRESS -24
1656	#define DRFLAC_BAD_SEEK -25
1657	#define DRFLAC_BAD_PIPE -26
1658	#define DRFLAC_DEADLOCK -27
1659	#define DRFLAC_TOO_MANY_LINKS -28
1660	#define DRFLAC_NOT_IMPLEMENTED -29
1661	#define DRFLAC_NO_MESSAGE -30
1662	#define DRFLAC_BAD_MESSAGE -31
1663	#define DRFLAC_NO_DATA_AVAILABLE -32
1664	#define DRFLAC_INVALID_DATA -33
1665	#define DRFLAC_TIMEOUT -34
1666	#define DRFLAC_NO_NETWORK -35
1667	#define DRFLAC_NOT_UNIQUE -36
1668	#define DRFLAC_NOT_SOCKET -37
1669	#define DRFLAC_NO_ADDRESS -38
1670	#define DRFLAC_BAD_PROTOCOL -39
1671	#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1672	#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1673	#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1674	#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1675	#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1676	#define DRFLAC_CONNECTION_RESET -45
1677	#define DRFLAC_ALREADY_CONNECTED -46
1678	#define DRFLAC_NOT_CONNECTED -47
1679	#define DRFLAC_CONNECTION_REFUSED -48
1680	#define DRFLAC_NO_HOST -49
1681	#define DRFLAC_IN_PROGRESS -50
1682	#define DRFLAC_CANCELLED -51
1683	#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1684	#define DRFLAC_AT_END -53
1685	#define DRFLAC_CRC_MISMATCH -128
1686
1687	#define DRFLAC_SUBFRAME_CONSTANT 0
1688	#define DRFLAC_SUBFRAME_VERBATIM 1
1689	#define DRFLAC_SUBFRAME_FIXED 8
1690	#define DRFLAC_SUBFRAME_LPC 32
1691	#define DRFLAC_SUBFRAME_RESERVED 255
1692
1693	#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1694	#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1695
1696	#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1697	#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1698	#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1699	#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1700
1701	#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
1702
1703
1704	DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1705	{
1706	if (pMajor) {
1707	*pMajor = DRFLAC_VERSION_MAJOR;
1708	}
1709
1710	if (pMinor) {
1711	*pMinor = DRFLAC_VERSION_MINOR;
1712	}
1713
1714	if (pRevision) {
1715	*pRevision = DRFLAC_VERSION_REVISION;
1716	}
1717	}
1718
1719	DRFLAC_API const char* drflac_version_string(void)
1720	{
1721	return DRFLAC_VERSION_STRING;
1722	}
1723
1724
1725	/ CPU caps. /
1726	#if defined(__has_feature)
1727	#if __has_feature(thread_sanitizer)
1728	#define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1729	#else
1730	#define DRFLAC_NO_THREAD_SANITIZE
1731	#endif
1732	#else
1733	#define DRFLAC_NO_THREAD_SANITIZE
1734	#endif
1735
1736	#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1737	static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1738	#endif
1739
1740	#ifndef DRFLAC_NO_CPUID
1741	static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1742	static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1743
1744	/*
1745	I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1746	actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1747	complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1748	just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1749	*/
1750	DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1751	{
1752	static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1753
1754	if (!isCPUCapsInitialized) {
1755	/ LZCNT /
1756	#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1757	int info[`4`] = {`0`};
1758	drflac__cpuid(info, `0x80000001`);
1759	drflac__gIsLZCNTSupported = (info[`2`] & (`1` << `5`)) != `0`;
1760	#endif
1761
1762	/ SSE2 /
1763	drflac__gIsSSE2Supported = drflac_has_sse2();
1764
1765	/ SSE4.1 /
1766	drflac__gIsSSE41Supported = drflac_has_sse41();
1767
1768	/ Initialized. /
1769	isCPUCapsInitialized = DRFLAC_TRUE;
1770	}
1771	}
1772	#else
1773	static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1774
1775	static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1776	{
1777	#if defined(DRFLAC_SUPPORT_NEON)
1778	#if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1779	#if (defined(__ARM_NEON) \|\| defined(__aarch64__) \|\| defined(_M_ARM64))
1780	return DRFLAC_TRUE; / If the compiler is allowed to freely generate NEON code we can assume support. /
1781	#else
1782	/ TODO: Runtime check. /
1783	return DRFLAC_FALSE;
1784	#endif
1785	#else
1786	return DRFLAC_FALSE; / NEON is only supported on ARM architectures. /
1787	#endif
1788	#else
1789	return DRFLAC_FALSE; / No compiler support. /
1790	#endif
1791	}
1792
1793	DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1794	{
1795	drflac__gIsNEONSupported = drflac__has_neon();
1796
1797	#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1798	drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1799	#endif
1800	}
1801	#endif
1802
1803
1804	/ Endian Management /
1805	static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1806	{
1807	#if defined(DRFLAC_X86) \|\| defined(DRFLAC_X64)
1808	return DRFLAC_TRUE;
1809	#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1810	return DRFLAC_TRUE;
1811	#else
1812	int n = `1`;
1813	return ((char**)&n) == `1`;
1814	#endif
1815	}
1816
1817	static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1818	{
1819	#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1820	#if defined(_MSC_VER) && !defined(__clang__)
1821	return _byteswap_ushort(n);
1822	#elif defined(__GNUC__) \|\| defined(__clang__)
1823	return __builtin_bswap16(n);
1824	#elif defined(__WATCOMC__) && defined(__386__)
1825	return _watcom_bswap16(n);
1826	#else
1827	#error "This compiler does not support the byte swap intrinsic."
1828	#endif
1829	#else
1830	return ((n & `0xFF00`) >> `8`) \|
1831	((n & `0x00FF`) << `8`);
1832	#endif
1833	}
1834
1835	static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1836	{
1837	#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1838	#if defined(_MSC_VER) && !defined(__clang__)
1839	return _byteswap_ulong(n);
1840	#elif defined(__GNUC__) \|\| defined(__clang__)
1841	#if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1842	/ Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). /
1843	drflac_uint32 r;
1844	__asm__ __volatile__ (
1845	#if defined(DRFLAC_64BIT)
1846	"rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) / <-- This is untested. If someone in the community could test this, that would be appreciated! /
1847	#else
1848	"rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1849	#endif
1850	);
1851	return r;
1852	#else
1853	return __builtin_bswap32(n);
1854	#endif
1855	#elif defined(__WATCOMC__) && defined(__386__)
1856	return _watcom_bswap32(n);
1857	#else
1858	#error "This compiler does not support the byte swap intrinsic."
1859	#endif
1860	#else
1861	return ((n & `0xFF000000`) >> `24`) \|
1862	((n & `0x00FF0000`) >> `8`) \|
1863	((n & `0x0000FF00`) << `8`) \|
1864	((n & `0x000000FF`) << `24`);
1865	#endif
1866	}
1867
1868	static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1869	{
1870	#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1871	#if defined(_MSC_VER) && !defined(__clang__)
1872	return _byteswap_uint64(n);
1873	#elif defined(__GNUC__) \|\| defined(__clang__)
1874	return __builtin_bswap64(n);
1875	#elif defined(__WATCOMC__) && defined(__386__)
1876	return _watcom_bswap64(n);
1877	#else
1878	#error "This compiler does not support the byte swap intrinsic."
1879	#endif
1880	#else
1881	/ Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. /
1882	return ((n & ((drflac_uint64)`0xFF000000` << `32`)) >> `56`) \|
1883	((n & ((drflac_uint64)`0x00FF0000` << `32`)) >> `40`) \|
1884	((n & ((drflac_uint64)`0x0000FF00` << `32`)) >> `24`) \|
1885	((n & ((drflac_uint64)`0x000000FF` << `32`)) >> `8`) \|
1886	((n & ((drflac_uint64)`0xFF000000` )) << `8`) \|
1887	((n & ((drflac_uint64)`0x00FF0000` )) << `24`) \|
1888	((n & ((drflac_uint64)`0x0000FF00` )) << `40`) \|
1889	((n & ((drflac_uint64)`0x000000FF` )) << `56`);
1890	#endif
1891	}
1892
1893
1894	static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1895	{
1896	if (drflac__is_little_endian()) {
1897	return drflac__swap_endian_uint16(n);
1898	}
1899
1900	return n;
1901	}
1902
1903	static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1904	{
1905	if (drflac__is_little_endian()) {
1906	return drflac__swap_endian_uint32(n);
1907	}
1908
1909	return n;
1910	}
1911
1912	static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1913	{
1914	if (drflac__is_little_endian()) {
1915	return drflac__swap_endian_uint64(n);
1916	}
1917
1918	return n;
1919	}
1920
1921
1922	static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1923	{
1924	if (!drflac__is_little_endian()) {
1925	return drflac__swap_endian_uint32(n);
1926	}
1927
1928	return n;
1929	}
1930
1931
1932	static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1933	{
1934	drflac_uint32 result = `0`;
1935	result \|= (n & `0x7F000000`) >> `3`;
1936	result \|= (n & `0x007F0000`) >> `2`;
1937	result \|= (n & `0x00007F00`) >> `1`;
1938	result \|= (n & `0x0000007F`) >> `0`;
1939
1940	return result;
1941	}
1942
1943
1944
1945	/ The CRC code below is based on this document: http://zlib.net/crc_v3.txt /
1946	static drflac_uint8 drflac__crc8_table[] = {
1947	`0x00`, `0x07`, `0x0E`, `0x09`, `0x1C`, `0x1B`, `0x12`, `0x15`, `0x38`, `0x3F`, `0x36`, `0x31`, `0x24`, `0x23`, `0x2A`, `0x2D`,
1948	`0x70`, `0x77`, `0x7E`, `0x79`, `0x6C`, `0x6B`, `0x62`, `0x65`, `0x48`, `0x4F`, `0x46`, `0x41`, `0x54`, `0x53`, `0x5A`, `0x5D`,
1949	`0xE0`, `0xE7`, `0xEE`, `0xE9`, `0xFC`, `0xFB`, `0xF2`, `0xF5`, `0xD8`, `0xDF`, `0xD6`, `0xD1`, `0xC4`, `0xC3`, `0xCA`, `0xCD`,
1950	`0x90`, `0x97`, `0x9E`, `0x99`, `0x8C`, `0x8B`, `0x82`, `0x85`, `0xA8`, `0xAF`, `0xA6`, `0xA1`, `0xB4`, `0xB3`, `0xBA`, `0xBD`,
1951	`0xC7`, `0xC0`, `0xC9`, `0xCE`, `0xDB`, `0xDC`, `0xD5`, `0xD2`, `0xFF`, `0xF8`, `0xF1`, `0xF6`, `0xE3`, `0xE4`, `0xED`, `0xEA`,
1952	`0xB7`, `0xB0`, `0xB9`, `0xBE`, `0xAB`, `0xAC`, `0xA5`, `0xA2`, `0x8F`, `0x88`, `0x81`, `0x86`, `0x93`, `0x94`, `0x9D`, `0x9A`,
1953	`0x27`, `0x20`, `0x29`, `0x2E`, `0x3B`, `0x3C`, `0x35`, `0x32`, `0x1F`, `0x18`, `0x11`, `0x16`, `0x03`, `0x04`, `0x0D`, `0x0A`,
1954	`0x57`, `0x50`, `0x59`, `0x5E`, `0x4B`, `0x4C`, `0x45`, `0x42`, `0x6F`, `0x68`, `0x61`, `0x66`, `0x73`, `0x74`, `0x7D`, `0x7A`,
1955	`0x89`, `0x8E`, `0x87`, `0x80`, `0x95`, `0x92`, `0x9B`, `0x9C`, `0xB1`, `0xB6`, `0xBF`, `0xB8`, `0xAD`, `0xAA`, `0xA3`, `0xA4`,
1956	`0xF9`, `0xFE`, `0xF7`, `0xF0`, `0xE5`, `0xE2`, `0xEB`, `0xEC`, `0xC1`, `0xC6`, `0xCF`, `0xC8`, `0xDD`, `0xDA`, `0xD3`, `0xD4`,
1957	`0x69`, `0x6E`, `0x67`, `0x60`, `0x75`, `0x72`, `0x7B`, `0x7C`, `0x51`, `0x56`, `0x5F`, `0x58`, `0x4D`, `0x4A`, `0x43`, `0x44`,
1958	`0x19`, `0x1E`, `0x17`, `0x10`, `0x05`, `0x02`, `0x0B`, `0x0C`, `0x21`, `0x26`, `0x2F`, `0x28`, `0x3D`, `0x3A`, `0x33`, `0x34`,
1959	`0x4E`, `0x49`, `0x40`, `0x47`, `0x52`, `0x55`, `0x5C`, `0x5B`, `0x76`, `0x71`, `0x78`, `0x7F`, `0x6A`, `0x6D`, `0x64`, `0x63`,
1960	`0x3E`, `0x39`, `0x30`, `0x37`, `0x22`, `0x25`, `0x2C`, `0x2B`, `0x06`, `0x01`, `0x08`, `0x0F`, `0x1A`, `0x1D`, `0x14`, `0x13`,
1961	`0xAE`, `0xA9`, `0xA0`, `0xA7`, `0xB2`, `0xB5`, `0xBC`, `0xBB`, `0x96`, `0x91`, `0x98`, `0x9F`, `0x8A`, `0x8D`, `0x84`, `0x83`,
1962	`0xDE`, `0xD9`, `0xD0`, `0xD7`, `0xC2`, `0xC5`, `0xCC`, `0xCB`, `0xE6`, `0xE1`, `0xE8`, `0xEF`, `0xFA`, `0xFD`, `0xF4`, `0xF3`
1963	};
1964
1965	static drflac_uint16 drflac__crc16_table[] = {
1966	`0x0000`, `0x8005`, `0x800F`, `0x000A`, `0x801B`, `0x001E`, `0x0014`, `0x8011`,
1967	`0x8033`, `0x0036`, `0x003C`, `0x8039`, `0x0028`, `0x802D`, `0x8027`, `0x0022`,
1968	`0x8063`, `0x0066`, `0x006C`, `0x8069`, `0x0078`, `0x807D`, `0x8077`, `0x0072`,
1969	`0x0050`, `0x8055`, `0x805F`, `0x005A`, `0x804B`, `0x004E`, `0x0044`, `0x8041`,
1970	`0x80C3`, `0x00C6`, `0x00CC`, `0x80C9`, `0x00D8`, `0x80DD`, `0x80D7`, `0x00D2`,
1971	`0x00F0`, `0x80F5`, `0x80FF`, `0x00FA`, `0x80EB`, `0x00EE`, `0x00E4`, `0x80E1`,
1972	`0x00A0`, `0x80A5`, `0x80AF`, `0x00AA`, `0x80BB`, `0x00BE`, `0x00B4`, `0x80B1`,
1973	`0x8093`, `0x0096`, `0x009C`, `0x8099`, `0x0088`, `0x808D`, `0x8087`, `0x0082`,
1974	`0x8183`, `0x0186`, `0x018C`, `0x8189`, `0x0198`, `0x819D`, `0x8197`, `0x0192`,
1975	`0x01B0`, `0x81B5`, `0x81BF`, `0x01BA`, `0x81AB`, `0x01AE`, `0x01A4`, `0x81A1`,
1976	`0x01E0`, `0x81E5`, `0x81EF`, `0x01EA`, `0x81FB`, `0x01FE`, `0x01F4`, `0x81F1`,
1977	`0x81D3`, `0x01D6`, `0x01DC`, `0x81D9`, `0x01C8`, `0x81CD`, `0x81C7`, `0x01C2`,
1978	`0x0140`, `0x8145`, `0x814F`, `0x014A`, `0x815B`, `0x015E`, `0x0154`, `0x8151`,
1979	`0x8173`, `0x0176`, `0x017C`, `0x8179`, `0x0168`, `0x816D`, `0x8167`, `0x0162`,
1980	`0x8123`, `0x0126`, `0x012C`, `0x8129`, `0x0138`, `0x813D`, `0x8137`, `0x0132`,
1981	`0x0110`, `0x8115`, `0x811F`, `0x011A`, `0x810B`, `0x010E`, `0x0104`, `0x8101`,
1982	`0x8303`, `0x0306`, `0x030C`, `0x8309`, `0x0318`, `0x831D`, `0x8317`, `0x0312`,
1983	`0x0330`, `0x8335`, `0x833F`, `0x033A`, `0x832B`, `0x032E`, `0x0324`, `0x8321`,
1984	`0x0360`, `0x8365`, `0x836F`, `0x036A`, `0x837B`, `0x037E`, `0x0374`, `0x8371`,
1985	`0x8353`, `0x0356`, `0x035C`, `0x8359`, `0x0348`, `0x834D`, `0x8347`, `0x0342`,
1986	`0x03C0`, `0x83C5`, `0x83CF`, `0x03CA`, `0x83DB`, `0x03DE`, `0x03D4`, `0x83D1`,
1987	`0x83F3`, `0x03F6`, `0x03FC`, `0x83F9`, `0x03E8`, `0x83ED`, `0x83E7`, `0x03E2`,
1988	`0x83A3`, `0x03A6`, `0x03AC`, `0x83A9`, `0x03B8`, `0x83BD`, `0x83B7`, `0x03B2`,
1989	`0x0390`, `0x8395`, `0x839F`, `0x039A`, `0x838B`, `0x038E`, `0x0384`, `0x8381`,
1990	`0x0280`, `0x8285`, `0x828F`, `0x028A`, `0x829B`, `0x029E`, `0x0294`, `0x8291`,
1991	`0x82B3`, `0x02B6`, `0x02BC`, `0x82B9`, `0x02A8`, `0x82AD`, `0x82A7`, `0x02A2`,
1992	`0x82E3`, `0x02E6`, `0x02EC`, `0x82E9`, `0x02F8`, `0x82FD`, `0x82F7`, `0x02F2`,
1993	`0x02D0`, `0x82D5`, `0x82DF`, `0x02DA`, `0x82CB`, `0x02CE`, `0x02C4`, `0x82C1`,
1994	`0x8243`, `0x0246`, `0x024C`, `0x8249`, `0x0258`, `0x825D`, `0x8257`, `0x0252`,
1995	`0x0270`, `0x8275`, `0x827F`, `0x027A`, `0x826B`, `0x026E`, `0x0264`, `0x8261`,
1996	`0x0220`, `0x8225`, `0x822F`, `0x022A`, `0x823B`, `0x023E`, `0x0234`, `0x8231`,
1997	`0x8213`, `0x0216`, `0x021C`, `0x8219`, `0x0208`, `0x820D`, `0x8207`, `0x0202`
1998	};
1999
2000	static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
2001	{
2002	return drflac__crc8_table[crc ^ data];
2003	}
2004
2005	static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
2006	{
2007	#ifdef DR_FLAC_NO_CRC
2008	(void)crc;
2009	(void)data;
2010	(void)count;
2011	return `0`;
2012	#else
2013	#if 0
2014	/ REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") /
2015	drflac_uint8 p = `0x07`;
2016	for (int i = count-`1`; i >= `0`; --i) {
2017	drflac_uint8 bit = (data & (`1` << i)) >> i;
2018	if (crc & `0x80`) {
2019	crc = ((crc << `1`) \| bit) ^ p;
2020	} else {
2021	crc = ((crc << `1`) \| bit);
2022	}
2023	}
2024	return crc;
2025	#else
2026	drflac_uint32 wholeBytes;
2027	drflac_uint32 leftoverBits;
2028	drflac_uint64 leftoverDataMask;
2029
2030	static drflac_uint64 leftoverDataMaskTable[`8`] = {
2031	`0x00`, `0x01`, `0x03`, `0x07`, `0x0F`, `0x1F`, `0x3F`, `0x7F`
2032	};
2033
2034	DRFLAC_ASSERT(count <= `32`);
2035
2036	wholeBytes = count >> `3`;
2037	leftoverBits = count - (wholeBytes*`8`);
2038	leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2039
2040	switch (wholeBytes) {
2041	case `4`: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (`0xFF000000UL` << leftoverBits)) >> (`24` + leftoverBits)));
2042	case `3`: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (`0x00FF0000UL` << leftoverBits)) >> (`16` + leftoverBits)));
2043	case `2`: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (`0x0000FF00UL` << leftoverBits)) >> ( `8` + leftoverBits)));
2044	case `1`: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (`0x000000FFUL` << leftoverBits)) >> ( `0` + leftoverBits)));
2045	case `0`: if (leftoverBits > `0`) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (`8` - leftoverBits)) ^ (data & leftoverDataMask)]);
2046	}
2047	return crc;
2048	#endif
2049	#endif
2050	}
2051
2052	static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
2053	{
2054	return (crc << `8`) ^ drflac__crc16_table[(drflac_uint8)(crc >> `8`) ^ data];
2055	}
2056
2057	static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
2058	{
2059	#ifdef DRFLAC_64BIT
2060	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `56`) & `0xFF`));
2061	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `48`) & `0xFF`));
2062	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `40`) & `0xFF`));
2063	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `32`) & `0xFF`));
2064	#endif
2065	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `24`) & `0xFF`));
2066	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `16`) & `0xFF`));
2067	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `8`) & `0xFF`));
2068	crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `0`) & `0xFF`));
2069
2070	return crc;
2071	}
2072
2073	static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2074	{
2075	switch (byteCount)
2076	{
2077	#ifdef DRFLAC_64BIT
2078	case `8`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `56`) & `0xFF`));
2079	case `7`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `48`) & `0xFF`));
2080	case `6`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `40`) & `0xFF`));
2081	case `5`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `32`) & `0xFF`));
2082	#endif
2083	case `4`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `24`) & `0xFF`));
2084	case `3`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `16`) & `0xFF`));
2085	case `2`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `8`) & `0xFF`));
2086	case `1`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> `0`) & `0xFF`));
2087	}
2088
2089	return crc;
2090	}
2091
2092	#if 0
2093	static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2094	{
2095	#ifdef DR_FLAC_NO_CRC
2096	(void)crc;
2097	(void)data;
2098	(void)count;
2099	return `0`;
2100	#else
2101	#if 0
2102	/ REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") /
2103	drflac_uint16 p = `0x8005`;
2104	for (int i = count-`1`; i >= `0`; --i) {
2105	drflac_uint16 bit = (data & (`1ULL` << i)) >> i;
2106	if (r & `0x8000`) {
2107	r = ((r << `1`) \| bit) ^ p;
2108	} else {
2109	r = ((r << `1`) \| bit);
2110	}
2111	}
2112
2113	return crc;
2114	#else
2115	drflac_uint32 wholeBytes;
2116	drflac_uint32 leftoverBits;
2117	drflac_uint64 leftoverDataMask;
2118
2119	static drflac_uint64 leftoverDataMaskTable[`8`] = {
2120	`0x00`, `0x01`, `0x03`, `0x07`, `0x0F`, `0x1F`, `0x3F`, `0x7F`
2121	};
2122
2123	DRFLAC_ASSERT(count <= `64`);
2124
2125	wholeBytes = count >> `3`;
2126	leftoverBits = count & `7`;
2127	leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2128
2129	switch (wholeBytes) {
2130	default:
2131	case `4`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (`0xFF000000UL` << leftoverBits)) >> (`24` + leftoverBits)));
2132	case `3`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (`0x00FF0000UL` << leftoverBits)) >> (`16` + leftoverBits)));
2133	case `2`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (`0x0000FF00UL` << leftoverBits)) >> ( `8` + leftoverBits)));
2134	case `1`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (`0x000000FFUL` << leftoverBits)) >> ( `0` + leftoverBits)));
2135	case `0`: if (leftoverBits > `0`) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (`16` - leftoverBits)) ^ (data & leftoverDataMask)];
2136	}
2137	return crc;
2138	#endif
2139	#endif
2140	}
2141
2142	static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2143	{
2144	#ifdef DR_FLAC_NO_CRC
2145	(void)crc;
2146	(void)data;
2147	(void)count;
2148	return `0`;
2149	#else
2150	drflac_uint32 wholeBytes;
2151	drflac_uint32 leftoverBits;
2152	drflac_uint64 leftoverDataMask;
2153
2154	static drflac_uint64 leftoverDataMaskTable[`8`] = {
2155	`0x00`, `0x01`, `0x03`, `0x07`, `0x0F`, `0x1F`, `0x3F`, `0x7F`
2156	};
2157
2158	DRFLAC_ASSERT(count <= `64`);
2159
2160	wholeBytes = count >> `3`;
2161	leftoverBits = count & `7`;
2162	leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2163
2164	switch (wholeBytes) {
2165	default:
2166	case `8`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0xFF000000` << `32`) << leftoverBits)) >> (`56` + leftoverBits))); / Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. /
2167	case `7`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x00FF0000` << `32`) << leftoverBits)) >> (`48` + leftoverBits)));
2168	case `6`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x0000FF00` << `32`) << leftoverBits)) >> (`40` + leftoverBits)));
2169	case `5`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x000000FF` << `32`) << leftoverBits)) >> (`32` + leftoverBits)));
2170	case `4`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0xFF000000` ) << leftoverBits)) >> (`24` + leftoverBits)));
2171	case `3`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x00FF0000` ) << leftoverBits)) >> (`16` + leftoverBits)));
2172	case `2`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x0000FF00` ) << leftoverBits)) >> ( `8` + leftoverBits)));
2173	case `1`: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)`0x000000FF` ) << leftoverBits)) >> ( `0` + leftoverBits)));
2174	case `0`: if (leftoverBits > `0`) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (`16` - leftoverBits)) ^ (data & leftoverDataMask)];
2175	}
2176	return crc;
2177	#endif
2178	}
2179
2180
2181	static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2182	{
2183	#ifdef DRFLAC_64BIT
2184	return drflac_crc16__64bit(crc, data, count);
2185	#else
2186	return drflac_crc16__32bit(crc, data, count);
2187	#endif
2188	}
2189	#endif
2190
2191
2192	#ifdef DRFLAC_64BIT
2193	#define drflac__be2host__cache_line drflac__be2host_64
2194	#else
2195	#define drflac__be2host__cache_line drflac__be2host_32
2196	#endif
2197
2198	/*
2199	BIT READING ATTEMPT #2
2200
2201	This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2202	on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2203	is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2204	array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2205	from onRead() is read into.
2206	*/
2207	#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2208	#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2209	#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2210	#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2211	#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2212	#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2213	#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2214	#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2215	#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2216	#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2217	#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
2218
2219
2220	#ifndef DR_FLAC_NO_CRC
2221	static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2222	{
2223	bs->crc16 = `0`;
2224	bs->crc16CacheIgnoredBytes = bs->consumedBits >> `3`;
2225	}
2226
2227	static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2228	{
2229	if (bs->crc16CacheIgnoredBytes == `0`) {
2230	bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2231	} else {
2232	bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2233	bs->crc16CacheIgnoredBytes = `0`;
2234	}
2235	}
2236
2237	static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2238	{
2239	/ We should never be flushing in a situation where we are not aligned on a byte boundary. /
2240	DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & `7`) == `0`);
2241
2242	/*
2243	The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2244	by the number of bits that have been consumed.
2245	*/
2246	if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == `0`) {
2247	drflac__update_crc16(bs);
2248	} else {
2249	/ We only accumulate the consumed bits. /
2250	bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> `3`) - bs->crc16CacheIgnoredBytes);
2251
2252	/*
2253	The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2254	so we can handle that later.
2255	*/
2256	bs->crc16CacheIgnoredBytes = bs->consumedBits >> `3`;
2257	}
2258
2259	return bs->crc16;
2260	}
2261	#endif
2262
2263	static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2264	{
2265	size_t bytesRead;
2266	size_t alignedL1LineCount;
2267
2268	/ Fast path. Try loading straight from L2. /
2269	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2270	bs->cache = bs->cacheL2[bs->nextL2Line++];
2271	return DRFLAC_TRUE;
2272	}
2273
2274	/*
2275	If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2276	any left.
2277	*/
2278	if (bs->unalignedByteCount > `0`) {
2279	return DRFLAC_FALSE; / If we have any unaligned bytes it means there's no more aligned bytes left in the client. /
2280	}
2281
2282	bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2283
2284	bs->nextL2Line = `0`;
2285	if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2286	bs->cache = bs->cacheL2[bs->nextL2Line++];
2287	return DRFLAC_TRUE;
2288	}
2289
2290
2291	/*
2292	If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2293	means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2294	and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2295	the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2296	*/
2297	alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2298
2299	/ We need to keep track of any unaligned bytes for later use. /
2300	bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2301	if (bs->unalignedByteCount > `0`) {
2302	bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2303	}
2304
2305	if (alignedL1LineCount > `0`) {
2306	size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2307	size_t i;
2308	for (i = alignedL1LineCount; i > `0`; --i) {
2309	bs->cacheL2[i-`1` + offset] = bs->cacheL2[i-`1`];
2310	}
2311
2312	bs->nextL2Line = (drflac_uint32)offset;
2313	bs->cache = bs->cacheL2[bs->nextL2Line++];
2314	return DRFLAC_TRUE;
2315	} else {
2316	/ If we get into this branch it means we weren't able to load any L1-aligned data. /
2317	bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2318	return DRFLAC_FALSE;
2319	}
2320	}
2321
2322	static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2323	{
2324	size_t bytesRead;
2325
2326	#ifndef DR_FLAC_NO_CRC
2327	drflac__update_crc16(bs);
2328	#endif
2329
2330	/ Fast path. Try just moving the next value in the L2 cache to the L1 cache. /
2331	if (drflac__reload_l1_cache_from_l2(bs)) {
2332	bs->cache = drflac__be2host__cache_line(bs->cache);
2333	bs->consumedBits = `0`;
2334	#ifndef DR_FLAC_NO_CRC
2335	bs->crc16Cache = bs->cache;
2336	#endif
2337	return DRFLAC_TRUE;
2338	}
2339
2340	/ Slow path. /
2341
2342	/*
2343	If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2344	few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2345	data from the unaligned cache.
2346	*/
2347	bytesRead = bs->unalignedByteCount;
2348	if (bytesRead == `0`) {
2349	bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); / <-- The stream has been exhausted, so marked the bits as consumed. /
2350	return DRFLAC_FALSE;
2351	}
2352
2353	DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2354	bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * `8`;
2355
2356	bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2357	bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); / <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. /
2358	bs->unalignedByteCount = `0`; / <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. /
2359
2360	#ifndef DR_FLAC_NO_CRC
2361	bs->crc16Cache = bs->cache >> bs->consumedBits;
2362	bs->crc16CacheIgnoredBytes = bs->consumedBits >> `3`;
2363	#endif
2364	return DRFLAC_TRUE;
2365	}
2366
2367	static void drflac__reset_cache(drflac_bs* bs)
2368	{
2369	bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); / <-- This clears the L2 cache. /
2370	bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); / <-- This clears the L1 cache. /
2371	bs->cache = `0`;
2372	bs->unalignedByteCount = `0`; / <-- This clears the trailing unaligned bytes. /
2373	bs->unalignedCache = `0`;
2374
2375	#ifndef DR_FLAC_NO_CRC
2376	bs->crc16Cache = `0`;
2377	bs->crc16CacheIgnoredBytes = `0`;
2378	#endif
2379	}
2380
2381
2382	static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2383	{
2384	DRFLAC_ASSERT(bs != NULL);
2385	DRFLAC_ASSERT(pResultOut != NULL);
2386	DRFLAC_ASSERT(bitCount > `0`);
2387	DRFLAC_ASSERT(bitCount <= `32`);
2388
2389	if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2390	if (!drflac__reload_cache(bs)) {
2391	return DRFLAC_FALSE;
2392	}
2393	}
2394
2395	if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2396	/*
2397	If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
2398	a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
2399	more optimal solution for this.
2400	*/
2401	#ifdef DRFLAC_64BIT
2402	*pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2403	bs->consumedBits += bitCount;
2404	bs->cache <<= bitCount;
2405	#else
2406	if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2407	*pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2408	bs->consumedBits += bitCount;
2409	bs->cache <<= bitCount;
2410	} else {
2411	/ Cannot shift by 32-bits, so need to do it differently. /
2412	*pResultOut = (drflac_uint32)bs->cache;
2413	bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2414	bs->cache = `0`;
2415	}
2416	#endif
2417
2418	return DRFLAC_TRUE;
2419	} else {
2420	/ It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. /
2421	drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2422	drflac_uint32 bitCountLo = bitCount - bitCountHi;
2423	drflac_uint32 resultHi;
2424
2425	DRFLAC_ASSERT(bitCountHi > `0`);
2426	DRFLAC_ASSERT(bitCountHi < `32`);
2427	resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2428
2429	if (!drflac__reload_cache(bs)) {
2430	return DRFLAC_FALSE;
2431	}
2432
2433	*pResultOut = (resultHi << bitCountLo) \| (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2434	bs->consumedBits += bitCountLo;
2435	bs->cache <<= bitCountLo;
2436	return DRFLAC_TRUE;
2437	}
2438	}
2439
2440	static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2441	{
2442	drflac_uint32 result;
2443
2444	DRFLAC_ASSERT(bs != NULL);
2445	DRFLAC_ASSERT(pResult != NULL);
2446	DRFLAC_ASSERT(bitCount > `0`);
2447	DRFLAC_ASSERT(bitCount <= `32`);
2448
2449	if (!drflac__read_uint32(bs, bitCount, &result)) {
2450	return DRFLAC_FALSE;
2451	}
2452
2453	/ Do not attempt to shift by 32 as it's undefined. /
2454	if (bitCount < `32`) {
2455	drflac_uint32 signbit;
2456	signbit = ((result >> (bitCount-`1`)) & `0x01`);
2457	result \|= (~signbit + `1`) << bitCount;
2458	}
2459
2460	*pResult = (drflac_int32)result;
2461	return DRFLAC_TRUE;
2462	}
2463
2464	#ifdef DRFLAC_64BIT
2465	static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2466	{
2467	drflac_uint32 resultHi;
2468	drflac_uint32 resultLo;
2469
2470	DRFLAC_ASSERT(bitCount <= `64`);
2471	DRFLAC_ASSERT(bitCount > `32`);
2472
2473	if (!drflac__read_uint32(bs, bitCount - `32`, &resultHi)) {
2474	return DRFLAC_FALSE;
2475	}
2476
2477	if (!drflac__read_uint32(bs, `32`, &resultLo)) {
2478	return DRFLAC_FALSE;
2479	}
2480
2481	*pResultOut = (((drflac_uint64)resultHi) << `32`) \| ((drflac_uint64)resultLo);
2482	return DRFLAC_TRUE;
2483	}
2484	#endif
2485
2486	/ Function below is unused, but leaving it here in case I need to quickly add it again. /
2487	#if 0
2488	static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2489	{
2490	drflac_uint64 result;
2491	drflac_uint64 signbit;
2492
2493	DRFLAC_ASSERT(bitCount <= `64`);
2494
2495	if (!drflac__read_uint64(bs, bitCount, &result)) {
2496	return DRFLAC_FALSE;
2497	}
2498
2499	signbit = ((result >> (bitCount-`1`)) & `0x01`);
2500	result \|= (~signbit + `1`) << bitCount;
2501
2502	*pResultOut = (drflac_int64)result;
2503	return DRFLAC_TRUE;
2504	}
2505	#endif
2506
2507	static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2508	{
2509	drflac_uint32 result;
2510
2511	DRFLAC_ASSERT(bs != NULL);
2512	DRFLAC_ASSERT(pResult != NULL);
2513	DRFLAC_ASSERT(bitCount > `0`);
2514	DRFLAC_ASSERT(bitCount <= `16`);
2515
2516	if (!drflac__read_uint32(bs, bitCount, &result)) {
2517	return DRFLAC_FALSE;
2518	}
2519
2520	*pResult = (drflac_uint16)result;
2521	return DRFLAC_TRUE;
2522	}
2523
2524	#if 0
2525	static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2526	{
2527	drflac_int32 result;
2528
2529	DRFLAC_ASSERT(bs != NULL);
2530	DRFLAC_ASSERT(pResult != NULL);
2531	DRFLAC_ASSERT(bitCount > `0`);
2532	DRFLAC_ASSERT(bitCount <= `16`);
2533
2534	if (!drflac__read_int32(bs, bitCount, &result)) {
2535	return DRFLAC_FALSE;
2536	}
2537
2538	*pResult = (drflac_int16)result;
2539	return DRFLAC_TRUE;
2540	}
2541	#endif
2542
2543	static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2544	{
2545	drflac_uint32 result;
2546
2547	DRFLAC_ASSERT(bs != NULL);
2548	DRFLAC_ASSERT(pResult != NULL);
2549	DRFLAC_ASSERT(bitCount > `0`);
2550	DRFLAC_ASSERT(bitCount <= `8`);
2551
2552	if (!drflac__read_uint32(bs, bitCount, &result)) {
2553	return DRFLAC_FALSE;
2554	}
2555
2556	*pResult = (drflac_uint8)result;
2557	return DRFLAC_TRUE;
2558	}
2559
2560	static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2561	{
2562	drflac_int32 result;
2563
2564	DRFLAC_ASSERT(bs != NULL);
2565	DRFLAC_ASSERT(pResult != NULL);
2566	DRFLAC_ASSERT(bitCount > `0`);
2567	DRFLAC_ASSERT(bitCount <= `8`);
2568
2569	if (!drflac__read_int32(bs, bitCount, &result)) {
2570	return DRFLAC_FALSE;
2571	}
2572
2573	*pResult = (drflac_int8)result;
2574	return DRFLAC_TRUE;
2575	}
2576
2577
2578	static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2579	{
2580	if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2581	bs->consumedBits += (drflac_uint32)bitsToSeek;
2582	bs->cache <<= bitsToSeek;
2583	return DRFLAC_TRUE;
2584	} else {
2585	/ It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. /
2586	bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2587	bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2588	bs->cache = `0`;
2589
2590	/ Simple case. Seek in groups of the same number as bits that fit within a cache line. /
2591	#ifdef DRFLAC_64BIT
2592	while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2593	drflac_uint64 bin;
2594	if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2595	return DRFLAC_FALSE;
2596	}
2597	bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2598	}
2599	#else
2600	while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2601	drflac_uint32 bin;
2602	if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2603	return DRFLAC_FALSE;
2604	}
2605	bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2606	}
2607	#endif
2608
2609	/ Whole leftover bytes. /
2610	while (bitsToSeek >= `8`) {
2611	drflac_uint8 bin;
2612	if (!drflac__read_uint8(bs, `8`, &bin)) {
2613	return DRFLAC_FALSE;
2614	}
2615	bitsToSeek -= `8`;
2616	}
2617
2618	/ Leftover bits. /
2619	if (bitsToSeek > `0`) {
2620	drflac_uint8 bin;
2621	if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2622	return DRFLAC_FALSE;
2623	}
2624	bitsToSeek = `0`; / <-- Necessary for the assert below. /
2625	}
2626
2627	DRFLAC_ASSERT(bitsToSeek == `0`);
2628	return DRFLAC_TRUE;
2629	}
2630	}
2631
2632
2633	/ This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. /
2634	static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2635	{
2636	DRFLAC_ASSERT(bs != NULL);
2637
2638	/*
2639	The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2640	thing to do is align to the next byte.
2641	*/
2642	if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & `7`)) {
2643	return DRFLAC_FALSE;
2644	}
2645
2646	for (;;) {
2647	drflac_uint8 hi;
2648
2649	#ifndef DR_FLAC_NO_CRC
2650	drflac__reset_crc16(bs);
2651	#endif
2652
2653	if (!drflac__read_uint8(bs, `8`, &hi)) {
2654	return DRFLAC_FALSE;
2655	}
2656
2657	if (hi == `0xFF`) {
2658	drflac_uint8 lo;
2659	if (!drflac__read_uint8(bs, `6`, &lo)) {
2660	return DRFLAC_FALSE;
2661	}
2662
2663	if (lo == `0x3E`) {
2664	return DRFLAC_TRUE;
2665	} else {
2666	if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & `7`)) {
2667	return DRFLAC_FALSE;
2668	}
2669	}
2670	}
2671	}
2672
2673	/ Should never get here. /
2674	/return DRFLAC_FALSE;/
2675	}
2676
2677
2678	#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2679	#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2680	#endif
2681	#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) \|\| defined(DRFLAC_X86)) && !defined(__clang__)
2682	#define DRFLAC_IMPLEMENT_CLZ_MSVC
2683	#endif
2684	#if defined(__WATCOMC__) && defined(__386__)
2685	#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2686	#endif
2687
2688	static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2689	{
2690	drflac_uint32 n;
2691	static drflac_uint32 clz_table_4[] = {
2692	`0`,
2693	`4`,
2694	`3`, `3`,
2695	`2`, `2`, `2`, `2`,
2696	`1`, `1`, `1`, `1`, `1`, `1`, `1`, `1`
2697	};
2698
2699	if (x == `0`) {
2700	return sizeof(x)*`8`;
2701	}
2702
2703	n = clz_table_4[x >> (sizeof(x)*`8` - `4`)];
2704	if (n == `0`) {
2705	#ifdef DRFLAC_64BIT
2706	if ((x & ((drflac_uint64)`0xFFFFFFFF` << `32`)) == `0`) { n = `32`; x <<= `32`; }
2707	if ((x & ((drflac_uint64)`0xFFFF0000` << `32`)) == `0`) { n += `16`; x <<= `16`; }
2708	if ((x & ((drflac_uint64)`0xFF000000` << `32`)) == `0`) { n += `8`; x <<= `8`; }
2709	if ((x & ((drflac_uint64)`0xF0000000` << `32`)) == `0`) { n += `4`; x <<= `4`; }
2710	#else
2711	if ((x & `0xFFFF0000`) == `0`) { n = `16`; x <<= `16`; }
2712	if ((x & `0xFF000000`) == `0`) { n += `8`; x <<= `8`; }
2713	if ((x & `0xF0000000`) == `0`) { n += `4`; x <<= `4`; }
2714	#endif
2715	n += clz_table_4[x >> (sizeof(x)*`8` - `4`)];
2716	}
2717
2718	return n - `1`;
2719	}
2720
2721	#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2722	static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2723	{
2724	/ Fast compile time check for ARM. /
2725	#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2726	return DRFLAC_TRUE;
2727	#else
2728	/ If the compiler itself does not support the intrinsic then we'll need to return false. /
2729	#ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2730	return drflac__gIsLZCNTSupported;
2731	#else
2732	return DRFLAC_FALSE;
2733	#endif
2734	#endif
2735	}
2736
2737	static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2738	{
2739	/*
2740	It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2741	to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2742	it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2743	64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2744	around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2745	the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2746	in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2747	getting clobbered?
2748
2749	I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2750	assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2751
2752	Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2753	compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2754	to know how to fix the inlined assembly for correctness sake, however.
2755	*/
2756
2757	#if defined(_MSC_VER) /&& !defined(__clang__)/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2758	#ifdef DRFLAC_64BIT
2759	return (drflac_uint32)__lzcnt64(x);
2760	#else
2761	return (drflac_uint32)__lzcnt(x);
2762	#endif
2763	#else
2764	#if defined(__GNUC__) \|\| defined(__clang__)
2765	#if defined(DRFLAC_X64)
2766	{
2767	drflac_uint64 r;
2768	__asm__ __volatile__ (
2769	"lzcnt{ %1, %0\| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2770	);
2771
2772	return (drflac_uint32)r;
2773	}
2774	#elif defined(DRFLAC_X86)
2775	{
2776	drflac_uint32 r;
2777	__asm__ __volatile__ (
2778	"lzcnt{l %1, %0\| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2779	);
2780
2781	return r;
2782	}
2783	#elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2784	{
2785	unsigned int r;
2786	__asm__ __volatile__ (
2787	#if defined(DRFLAC_64BIT)
2788	"clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) / <-- This is untested. If someone in the community could test this, that would be appreciated! /
2789	#else
2790	"clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2791	#endif
2792	);
2793
2794	return r;
2795	}
2796	#else
2797	if (x == `0`) {
2798	return sizeof(x)*`8`;
2799	}
2800	#ifdef DRFLAC_64BIT
2801	return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2802	#else
2803	return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2804	#endif
2805	#endif
2806	#else
2807	/ Unsupported compiler. /
2808	#error "This compiler does not support the lzcnt intrinsic."
2809	#endif
2810	#endif
2811	}
2812	#endif
2813
2814	#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2815	#include <intrin.h> /* For BitScanReverse(). */
2816
2817	static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2818	{
2819	drflac_uint32 n;
2820
2821	if (x == `0`) {
2822	return sizeof(x)*`8`;
2823	}
2824
2825	#ifdef DRFLAC_64BIT
2826	_BitScanReverse64((unsigned long*)&n, x);
2827	#else
2828	_BitScanReverse((unsigned long*)&n, x);
2829	#endif
2830	return sizeof(x)*`8` - n - `1`;
2831	}
2832	#endif
2833
2834	#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2835	static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2836	#pragma aux drflac__clz_watcom = \
2837	"bsr eax, eax" \
2838	"xor eax, 31" \
2839	parm [eax] nomemory \
2840	value [eax] \
2841	modify exact [eax] nomemory;
2842	#endif
2843
2844	static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2845	{
2846	#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2847	if (drflac__is_lzcnt_supported()) {
2848	return drflac__clz_lzcnt(x);
2849	} else
2850	#endif
2851	{
2852	#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2853	return drflac__clz_msvc(x);
2854	#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2855	return (x == `0`) ? sizeof(x)*`8` : drflac__clz_watcom(x);
2856	#else
2857	return drflac__clz_software(x);
2858	#endif
2859	}
2860	}
2861
2862
2863	static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2864	{
2865	drflac_uint32 zeroCounter = `0`;
2866	drflac_uint32 setBitOffsetPlus1;
2867
2868	while (bs->cache == `0`) {
2869	zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2870	if (!drflac__reload_cache(bs)) {
2871	return DRFLAC_FALSE;
2872	}
2873	}
2874
2875	setBitOffsetPlus1 = drflac__clz(bs->cache);
2876	setBitOffsetPlus1 += `1`;
2877
2878	bs->consumedBits += setBitOffsetPlus1;
2879	bs->cache <<= setBitOffsetPlus1;
2880
2881	*pOffsetOut = zeroCounter + setBitOffsetPlus1 - `1`;
2882	return DRFLAC_TRUE;
2883	}
2884
2885
2886
2887	static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2888	{
2889	DRFLAC_ASSERT(bs != NULL);
2890	DRFLAC_ASSERT(offsetFromStart > `0`);
2891
2892	/*
2893	Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2894	is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
2895	To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
2896	*/
2897	if (offsetFromStart > `0x7FFFFFFF`) {
2898	drflac_uint64 bytesRemaining = offsetFromStart;
2899	if (!bs->onSeek(bs->pUserData, `0x7FFFFFFF`, drflac_seek_origin_start)) {
2900	return DRFLAC_FALSE;
2901	}
2902	bytesRemaining -= `0x7FFFFFFF`;
2903
2904	while (bytesRemaining > `0x7FFFFFFF`) {
2905	if (!bs->onSeek(bs->pUserData, `0x7FFFFFFF`, drflac_seek_origin_current)) {
2906	return DRFLAC_FALSE;
2907	}
2908	bytesRemaining -= `0x7FFFFFFF`;
2909	}
2910
2911	if (bytesRemaining > `0`) {
2912	if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2913	return DRFLAC_FALSE;
2914	}
2915	}
2916	} else {
2917	if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2918	return DRFLAC_FALSE;
2919	}
2920	}
2921
2922	/ The cache should be reset to force a reload of fresh data from the client. /
2923	drflac__reset_cache(bs);
2924	return DRFLAC_TRUE;
2925	}
2926
2927
2928	static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2929	{
2930	drflac_uint8 crc;
2931	drflac_uint64 result;
2932	drflac_uint8 utf8[`7`] = {`0`};
2933	int byteCount;
2934	int i;
2935
2936	DRFLAC_ASSERT(bs != NULL);
2937	DRFLAC_ASSERT(pNumberOut != NULL);
2938	DRFLAC_ASSERT(pCRCOut != NULL);
2939
2940	crc = *pCRCOut;
2941
2942	if (!drflac__read_uint8(bs, `8`, utf8)) {
2943	*pNumberOut = `0`;
2944	return DRFLAC_AT_END;
2945	}
2946	crc = drflac_crc8(crc, utf8[`0`], `8`);
2947
2948	if ((utf8[`0`] & `0x80`) == `0`) {
2949	*pNumberOut = utf8[`0`];
2950	*pCRCOut = crc;
2951	return DRFLAC_SUCCESS;
2952	}
2953
2954	/byteCount = 1;/
2955	if ((utf8[`0`] & `0xE0`) == `0xC0`) {
2956	byteCount = `2`;
2957	} else if ((utf8[`0`] & `0xF0`) == `0xE0`) {
2958	byteCount = `3`;
2959	} else if ((utf8[`0`] & `0xF8`) == `0xF0`) {
2960	byteCount = `4`;
2961	} else if ((utf8[`0`] & `0xFC`) == `0xF8`) {
2962	byteCount = `5`;
2963	} else if ((utf8[`0`] & `0xFE`) == `0xFC`) {
2964	byteCount = `6`;
2965	} else if ((utf8[`0`] & `0xFF`) == `0xFE`) {
2966	byteCount = `7`;
2967	} else {
2968	*pNumberOut = `0`;
2969	return DRFLAC_CRC_MISMATCH; / Bad UTF-8 encoding. /
2970	}
2971
2972	/ Read extra bytes. /
2973	DRFLAC_ASSERT(byteCount > `1`);
2974
2975	result = (drflac_uint64)(utf8[`0`] & (`0xFF` >> (byteCount + `1`)));
2976	for (i = `1`; i < byteCount; ++i) {
2977	if (!drflac__read_uint8(bs, `8`, utf8 + i)) {
2978	*pNumberOut = `0`;
2979	return DRFLAC_AT_END;
2980	}
2981	crc = drflac_crc8(crc, utf8[i], `8`);
2982
2983	result = (result << `6`) \| (utf8[i] & `0x3F`);
2984	}
2985
2986	*pNumberOut = result;
2987	*pCRCOut = crc;
2988	return DRFLAC_SUCCESS;
2989	}
2990
2991
2992
2993	/*
2994	The next two functions are responsible for calculating the prediction.
2995
2996	When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
2997	safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
2998	*/
2999	static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3000	{
3001	drflac_int32 prediction = `0`;
3002
3003	DRFLAC_ASSERT(order <= `32`);
3004
3005	/ 32-bit version. /
3006
3007	/ VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. /
3008	switch (order)
3009	{
3010	case `32`: prediction += coefficients[`31`] * pDecodedSamples[-`32`];
3011	case `31`: prediction += coefficients[`30`] * pDecodedSamples[-`31`];
3012	case `30`: prediction += coefficients[`29`] * pDecodedSamples[-`30`];
3013	case `29`: prediction += coefficients[`28`] * pDecodedSamples[-`29`];
3014	case `28`: prediction += coefficients[`27`] * pDecodedSamples[-`28`];
3015	case `27`: prediction += coefficients[`26`] * pDecodedSamples[-`27`];
3016	case `26`: prediction += coefficients[`25`] * pDecodedSamples[-`26`];
3017	case `25`: prediction += coefficients[`24`] * pDecodedSamples[-`25`];
3018	case `24`: prediction += coefficients[`23`] * pDecodedSamples[-`24`];
3019	case `23`: prediction += coefficients[`22`] * pDecodedSamples[-`23`];
3020	case `22`: prediction += coefficients[`21`] * pDecodedSamples[-`22`];
3021	case `21`: prediction += coefficients[`20`] * pDecodedSamples[-`21`];
3022	case `20`: prediction += coefficients[`19`] * pDecodedSamples[-`20`];
3023	case `19`: prediction += coefficients[`18`] * pDecodedSamples[-`19`];
3024	case `18`: prediction += coefficients[`17`] * pDecodedSamples[-`18`];
3025	case `17`: prediction += coefficients[`16`] * pDecodedSamples[-`17`];
3026	case `16`: prediction += coefficients[`15`] * pDecodedSamples[-`16`];
3027	case `15`: prediction += coefficients[`14`] * pDecodedSamples[-`15`];
3028	case `14`: prediction += coefficients[`13`] * pDecodedSamples[-`14`];
3029	case `13`: prediction += coefficients[`12`] * pDecodedSamples[-`13`];
3030	case `12`: prediction += coefficients[`11`] * pDecodedSamples[-`12`];
3031	case `11`: prediction += coefficients[`10`] * pDecodedSamples[-`11`];
3032	case `10`: prediction += coefficients[ `9`] * pDecodedSamples[-`10`];
3033	case `9`: prediction += coefficients[ `8`] * pDecodedSamples[- `9`];
3034	case `8`: prediction += coefficients[ `7`] * pDecodedSamples[- `8`];
3035	case `7`: prediction += coefficients[ `6`] * pDecodedSamples[- `7`];
3036	case `6`: prediction += coefficients[ `5`] * pDecodedSamples[- `6`];
3037	case `5`: prediction += coefficients[ `4`] * pDecodedSamples[- `5`];
3038	case `4`: prediction += coefficients[ `3`] * pDecodedSamples[- `4`];
3039	case `3`: prediction += coefficients[ `2`] * pDecodedSamples[- `3`];
3040	case `2`: prediction += coefficients[ `1`] * pDecodedSamples[- `2`];
3041	case `1`: prediction += coefficients[ `0`] * pDecodedSamples[- `1`];
3042	}
3043
3044	return (drflac_int32)(prediction >> shift);
3045	}
3046
3047	static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3048	{
3049	drflac_int64 prediction;
3050
3051	DRFLAC_ASSERT(order <= `32`);
3052
3053	/ 64-bit version. /
3054
3055	/ This method is faster on the 32-bit build when compiling with VC++. See note below. /
3056	#ifndef DRFLAC_64BIT
3057	if (order == `8`)
3058	{
3059	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3060	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3061	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3062	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3063	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3064	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3065	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3066	prediction += coefficients[`7`] * (drflac_int64)pDecodedSamples[-`8`];
3067	}
3068	else if (order == `7`)
3069	{
3070	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3071	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3072	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3073	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3074	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3075	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3076	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3077	}
3078	else if (order == `3`)
3079	{
3080	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3081	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3082	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3083	}
3084	else if (order == `6`)
3085	{
3086	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3087	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3088	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3089	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3090	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3091	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3092	}
3093	else if (order == `5`)
3094	{
3095	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3096	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3097	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3098	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3099	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3100	}
3101	else if (order == `4`)
3102	{
3103	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3104	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3105	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3106	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3107	}
3108	else if (order == `12`)
3109	{
3110	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3111	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3112	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3113	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3114	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3115	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3116	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3117	prediction += coefficients[`7`] * (drflac_int64)pDecodedSamples[-`8`];
3118	prediction += coefficients[`8`] * (drflac_int64)pDecodedSamples[-`9`];
3119	prediction += coefficients[`9`] * (drflac_int64)pDecodedSamples[-`10`];
3120	prediction += coefficients[`10`] * (drflac_int64)pDecodedSamples[-`11`];
3121	prediction += coefficients[`11`] * (drflac_int64)pDecodedSamples[-`12`];
3122	}
3123	else if (order == `2`)
3124	{
3125	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3126	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3127	}
3128	else if (order == `1`)
3129	{
3130	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3131	}
3132	else if (order == `10`)
3133	{
3134	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3135	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3136	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3137	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3138	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3139	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3140	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3141	prediction += coefficients[`7`] * (drflac_int64)pDecodedSamples[-`8`];
3142	prediction += coefficients[`8`] * (drflac_int64)pDecodedSamples[-`9`];
3143	prediction += coefficients[`9`] * (drflac_int64)pDecodedSamples[-`10`];
3144	}
3145	else if (order == `9`)
3146	{
3147	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3148	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3149	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3150	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3151	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3152	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3153	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3154	prediction += coefficients[`7`] * (drflac_int64)pDecodedSamples[-`8`];
3155	prediction += coefficients[`8`] * (drflac_int64)pDecodedSamples[-`9`];
3156	}
3157	else if (order == `11`)
3158	{
3159	prediction = coefficients[`0`] * (drflac_int64)pDecodedSamples[-`1`];
3160	prediction += coefficients[`1`] * (drflac_int64)pDecodedSamples[-`2`];
3161	prediction += coefficients[`2`] * (drflac_int64)pDecodedSamples[-`3`];
3162	prediction += coefficients[`3`] * (drflac_int64)pDecodedSamples[-`4`];
3163	prediction += coefficients[`4`] * (drflac_int64)pDecodedSamples[-`5`];
3164	prediction += coefficients[`5`] * (drflac_int64)pDecodedSamples[-`6`];
3165	prediction += coefficients[`6`] * (drflac_int64)pDecodedSamples[-`7`];
3166	prediction += coefficients[`7`] * (drflac_int64)pDecodedSamples[-`8`];
3167	prediction += coefficients[`8`] * (drflac_int64)pDecodedSamples[-`9`];
3168	prediction += coefficients[`9`] * (drflac_int64)pDecodedSamples[-`10`];
3169	prediction += coefficients[`10`] * (drflac_int64)pDecodedSamples[-`11`];
3170	}
3171	else
3172	{
3173	int j;
3174
3175	prediction = `0`;
3176	for (j = `0`; j < (int)order; ++j) {
3177	prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-`1`];
3178	}
3179	}
3180	#endif
3181
3182	/*
3183	VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
3184	reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3185	*/
3186	#ifdef DRFLAC_64BIT
3187	prediction = `0`;
3188	switch (order)
3189	{
3190	case `32`: prediction += coefficients[`31`] * (drflac_int64)pDecodedSamples[-`32`];
3191	case `31`: prediction += coefficients[`30`] * (drflac_int64)pDecodedSamples[-`31`];
3192	case `30`: prediction += coefficients[`29`] * (drflac_int64)pDecodedSamples[-`30`];
3193	case `29`: prediction += coefficients[`28`] * (drflac_int64)pDecodedSamples[-`29`];
3194	case `28`: prediction += coefficients[`27`] * (drflac_int64)pDecodedSamples[-`28`];
3195	case `27`: prediction += coefficients[`26`] * (drflac_int64)pDecodedSamples[-`27`];
3196	case `26`: prediction += coefficients[`25`] * (drflac_int64)pDecodedSamples[-`26`];
3197	case `25`: prediction += coefficients[`24`] * (drflac_int64)pDecodedSamples[-`25`];
3198	case `24`: prediction += coefficients[`23`] * (drflac_int64)pDecodedSamples[-`24`];
3199	case `23`: prediction += coefficients[`22`] * (drflac_int64)pDecodedSamples[-`23`];
3200	case `22`: prediction += coefficients[`21`] * (drflac_int64)pDecodedSamples[-`22`];
3201	case `21`: prediction += coefficients[`20`] * (drflac_int64)pDecodedSamples[-`21`];
3202	case `20`: prediction += coefficients[`19`] * (drflac_int64)pDecodedSamples[-`20`];
3203	case `19`: prediction += coefficients[`18`] * (drflac_int64)pDecodedSamples[-`19`];
3204	case `18`: prediction += coefficients[`17`] * (drflac_int64)pDecodedSamples[-`18`];
3205	case `17`: prediction += coefficients[`16`] * (drflac_int64)pDecodedSamples[-`17`];
3206	case `16`: prediction += coefficients[`15`] * (drflac_int64)pDecodedSamples[-`16`];
3207	case `15`: prediction += coefficients[`14`] * (drflac_int64)pDecodedSamples[-`15`];
3208	case `14`: prediction += coefficients[`13`] * (drflac_int64)pDecodedSamples[-`14`];
3209	case `13`: prediction += coefficients[`12`] * (drflac_int64)pDecodedSamples[-`13`];
3210	case `12`: prediction += coefficients[`11`] * (drflac_int64)pDecodedSamples[-`12`];
3211	case `11`: prediction += coefficients[`10`] * (drflac_int64)pDecodedSamples[-`11`];
3212	case `10`: prediction += coefficients[ `9`] * (drflac_int64)pDecodedSamples[-`10`];
3213	case `9`: prediction += coefficients[ `8`] * (drflac_int64)pDecodedSamples[- `9`];
3214	case `8`: prediction += coefficients[ `7`] * (drflac_int64)pDecodedSamples[- `8`];
3215	case `7`: prediction += coefficients[ `6`] * (drflac_int64)pDecodedSamples[- `7`];
3216	case `6`: prediction += coefficients[ `5`] * (drflac_int64)pDecodedSamples[- `6`];
3217	case `5`: prediction += coefficients[ `4`] * (drflac_int64)pDecodedSamples[- `5`];
3218	case `4`: prediction += coefficients[ `3`] * (drflac_int64)pDecodedSamples[- `4`];
3219	case `3`: prediction += coefficients[ `2`] * (drflac_int64)pDecodedSamples[- `3`];
3220	case `2`: prediction += coefficients[ `1`] * (drflac_int64)pDecodedSamples[- `2`];
3221	case `1`: prediction += coefficients[ `0`] * (drflac_int64)pDecodedSamples[- `1`];
3222	}
3223	#endif
3224
3225	return (drflac_int32)(prediction >> shift);
3226	}
3227
3228
3229	#if 0
3230	/*
3231	Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3232	sake of readability and should only be used as a reference.
3233	*/
3234	static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3235	{
3236	drflac_uint32 i;
3237
3238	DRFLAC_ASSERT(bs != NULL);
3239	DRFLAC_ASSERT(pSamplesOut != NULL);
3240
3241	for (i = `0`; i < count; ++i) {
3242	drflac_uint32 zeroCounter = `0`;
3243	for (;;) {
3244	drflac_uint8 bit;
3245	if (!drflac__read_uint8(bs, `1`, &bit)) {
3246	return DRFLAC_FALSE;
3247	}
3248
3249	if (bit == `0`) {
3250	zeroCounter += `1`;
3251	} else {
3252	break;
3253	}
3254	}
3255
3256	drflac_uint32 decodedRice;
3257	if (riceParam > `0`) {
3258	if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3259	return DRFLAC_FALSE;
3260	}
3261	} else {
3262	decodedRice = `0`;
3263	}
3264
3265	decodedRice \|= (zeroCounter << riceParam);
3266	if ((decodedRice & `0x01`)) {
3267	decodedRice = ~(decodedRice >> `1`);
3268	} else {
3269	decodedRice = (decodedRice >> `1`);
3270	}
3271
3272
3273	if (bitsPerSample+shift >= `32`) {
3274	pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
3275	} else {
3276	pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
3277	}
3278	}
3279
3280	return DRFLAC_TRUE;
3281	}
3282	#endif
3283
3284	#if 0
3285	static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3286	{
3287	drflac_uint32 zeroCounter = `0`;
3288	drflac_uint32 decodedRice;
3289
3290	for (;;) {
3291	drflac_uint8 bit;
3292	if (!drflac__read_uint8(bs, `1`, &bit)) {
3293	return DRFLAC_FALSE;
3294	}
3295
3296	if (bit == `0`) {
3297	zeroCounter += `1`;
3298	} else {
3299	break;
3300	}
3301	}
3302
3303	if (riceParam > `0`) {
3304	if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3305	return DRFLAC_FALSE;
3306	}
3307	} else {
3308	decodedRice = `0`;
3309	}
3310
3311	*pZeroCounterOut = zeroCounter;
3312	*pRiceParamPartOut = decodedRice;
3313	return DRFLAC_TRUE;
3314	}
3315	#endif
3316
3317	#if 0
3318	static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3319	{
3320	drflac_cache_t riceParamMask;
3321	drflac_uint32 zeroCounter;
3322	drflac_uint32 setBitOffsetPlus1;
3323	drflac_uint32 riceParamPart;
3324	drflac_uint32 riceLength;
3325
3326	DRFLAC_ASSERT(riceParam > `0`); / <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. /
3327
3328	riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3329
3330	zeroCounter = `0`;
3331	while (bs->cache == `0`) {
3332	zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3333	if (!drflac__reload_cache(bs)) {
3334	return DRFLAC_FALSE;
3335	}
3336	}
3337
3338	setBitOffsetPlus1 = drflac__clz(bs->cache);
3339	zeroCounter += setBitOffsetPlus1;
3340	setBitOffsetPlus1 += `1`;
3341
3342	riceLength = setBitOffsetPlus1 + riceParam;
3343	if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3344	riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3345
3346	bs->consumedBits += riceLength;
3347	bs->cache <<= riceLength;
3348	} else {
3349	drflac_uint32 bitCountLo;
3350	drflac_cache_t resultHi;
3351
3352	bs->consumedBits += riceLength;
3353	bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-`1`); / <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" /
3354
3355	/ It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. /
3356	bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3357	resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); / <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. /
3358
3359	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3360	#ifndef DR_FLAC_NO_CRC
3361	drflac__update_crc16(bs);
3362	#endif
3363	bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3364	bs->consumedBits = `0`;
3365	#ifndef DR_FLAC_NO_CRC
3366	bs->crc16Cache = bs->cache;
3367	#endif
3368	} else {
3369	/ Slow path. We need to fetch more data from the client. /
3370	if (!drflac__reload_cache(bs)) {
3371	return DRFLAC_FALSE;
3372	}
3373	}
3374
3375	riceParamPart = (drflac_uint32)(resultHi \| DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3376
3377	bs->consumedBits += bitCountLo;
3378	bs->cache <<= bitCountLo;
3379	}
3380
3381	pZeroCounterOut[`0`] = zeroCounter;
3382	pRiceParamPartOut[`0`] = riceParamPart;
3383
3384	return DRFLAC_TRUE;
3385	}
3386	#endif
3387
3388	static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3389	{
3390	drflac_uint32 riceParamPlus1 = riceParam + `1`;
3391	/drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);/
3392	drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3393	drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3394
3395	/*
3396	The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3397	no idea how this will work in practice...
3398	*/
3399	drflac_cache_t bs_cache = bs->cache;
3400	drflac_uint32 bs_consumedBits = bs->consumedBits;
3401
3402	/ The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. /
3403	drflac_uint32 lzcount = drflac__clz(bs_cache);
3404	if (lzcount < sizeof(bs_cache)*`8`) {
3405	pZeroCounterOut[`0`] = lzcount;
3406
3407	/*
3408	It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3409	this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3410	outside of this function at a higher level.
3411	*/
3412	extract_rice_param_part:
3413	bs_cache <<= lzcount;
3414	bs_consumedBits += lzcount;
3415
3416	if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3417	/ Getting here means the rice parameter part is wholly contained within the current cache line. /
3418	pRiceParamPartOut[`0`] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3419	bs_cache <<= riceParamPlus1;
3420	bs_consumedBits += riceParamPlus1;
3421	} else {
3422	drflac_uint32 riceParamPartHi;
3423	drflac_uint32 riceParamPartLo;
3424	drflac_uint32 riceParamPartLoBitCount;
3425
3426	/*
3427	Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3428	line, reload the cache, and then combine it with the head of the next cache line.
3429	*/
3430
3431	/ Grab the high part of the rice parameter part. /
3432	riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3433
3434	/ Before reloading the cache we need to grab the size in bits of the low part. /
3435	riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3436	DRFLAC_ASSERT(riceParamPartLoBitCount > `0` && riceParamPartLoBitCount < `32`);
3437
3438	/ Now reload the cache. /
3439	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3440	#ifndef DR_FLAC_NO_CRC
3441	drflac__update_crc16(bs);
3442	#endif
3443	bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3444	bs_consumedBits = riceParamPartLoBitCount;
3445	#ifndef DR_FLAC_NO_CRC
3446	bs->crc16Cache = bs_cache;
3447	#endif
3448	} else {
3449	/ Slow path. We need to fetch more data from the client. /
3450	if (!drflac__reload_cache(bs)) {
3451	return DRFLAC_FALSE;
3452	}
3453
3454	bs_cache = bs->cache;
3455	bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3456	}
3457
3458	/ We should now have enough information to construct the rice parameter part. /
3459	riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3460	pRiceParamPartOut[`0`] = riceParamPartHi \| riceParamPartLo;
3461
3462	bs_cache <<= riceParamPartLoBitCount;
3463	}
3464	} else {
3465	/*
3466	Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3467	to drflac__clz() and we need to reload the cache.
3468	*/
3469	drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3470	for (;;) {
3471	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3472	#ifndef DR_FLAC_NO_CRC
3473	drflac__update_crc16(bs);
3474	#endif
3475	bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3476	bs_consumedBits = `0`;
3477	#ifndef DR_FLAC_NO_CRC
3478	bs->crc16Cache = bs_cache;
3479	#endif
3480	} else {
3481	/ Slow path. We need to fetch more data from the client. /
3482	if (!drflac__reload_cache(bs)) {
3483	return DRFLAC_FALSE;
3484	}
3485
3486	bs_cache = bs->cache;
3487	bs_consumedBits = bs->consumedBits;
3488	}
3489
3490	lzcount = drflac__clz(bs_cache);
3491	zeroCounter += lzcount;
3492
3493	if (lzcount < sizeof(bs_cache)*`8`) {
3494	break;
3495	}
3496	}
3497
3498	pZeroCounterOut[`0`] = zeroCounter;
3499	goto extract_rice_param_part;
3500	}
3501
3502	/ Make sure the cache is restored at the end of it all. /
3503	bs->cache = bs_cache;
3504	bs->consumedBits = bs_consumedBits;
3505
3506	return DRFLAC_TRUE;
3507	}
3508
3509	static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3510	{
3511	drflac_uint32 riceParamPlus1 = riceParam + `1`;
3512	drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3513
3514	/*
3515	The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3516	no idea how this will work in practice...
3517	*/
3518	drflac_cache_t bs_cache = bs->cache;
3519	drflac_uint32 bs_consumedBits = bs->consumedBits;
3520
3521	/ The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. /
3522	drflac_uint32 lzcount = drflac__clz(bs_cache);
3523	if (lzcount < sizeof(bs_cache)*`8`) {
3524	/*
3525	It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3526	this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3527	outside of this function at a higher level.
3528	*/
3529	extract_rice_param_part:
3530	bs_cache <<= lzcount;
3531	bs_consumedBits += lzcount;
3532
3533	if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3534	/ Getting here means the rice parameter part is wholly contained within the current cache line. /
3535	bs_cache <<= riceParamPlus1;
3536	bs_consumedBits += riceParamPlus1;
3537	} else {
3538	/*
3539	Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3540	line, reload the cache, and then combine it with the head of the next cache line.
3541	*/
3542
3543	/ Before reloading the cache we need to grab the size in bits of the low part. /
3544	drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3545	DRFLAC_ASSERT(riceParamPartLoBitCount > `0` && riceParamPartLoBitCount < `32`);
3546
3547	/ Now reload the cache. /
3548	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3549	#ifndef DR_FLAC_NO_CRC
3550	drflac__update_crc16(bs);
3551	#endif
3552	bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3553	bs_consumedBits = riceParamPartLoBitCount;
3554	#ifndef DR_FLAC_NO_CRC
3555	bs->crc16Cache = bs_cache;
3556	#endif
3557	} else {
3558	/ Slow path. We need to fetch more data from the client. /
3559	if (!drflac__reload_cache(bs)) {
3560	return DRFLAC_FALSE;
3561	}
3562
3563	bs_cache = bs->cache;
3564	bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3565	}
3566
3567	bs_cache <<= riceParamPartLoBitCount;
3568	}
3569	} else {
3570	/*
3571	Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3572	to drflac__clz() and we need to reload the cache.
3573	*/
3574	for (;;) {
3575	if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3576	#ifndef DR_FLAC_NO_CRC
3577	drflac__update_crc16(bs);
3578	#endif
3579	bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3580	bs_consumedBits = `0`;
3581	#ifndef DR_FLAC_NO_CRC
3582	bs->crc16Cache = bs_cache;
3583	#endif
3584	} else {
3585	/ Slow path. We need to fetch more data from the client. /
3586	if (!drflac__reload_cache(bs)) {
3587	return DRFLAC_FALSE;
3588	}
3589
3590	bs_cache = bs->cache;
3591	bs_consumedBits = bs->consumedBits;
3592	}
3593
3594	lzcount = drflac__clz(bs_cache);
3595	if (lzcount < sizeof(bs_cache)*`8`) {
3596	break;
3597	}
3598	}
3599
3600	goto extract_rice_param_part;
3601	}
3602
3603	/ Make sure the cache is restored at the end of it all. /
3604	bs->cache = bs_cache;
3605	bs->consumedBits = bs_consumedBits;
3606
3607	return DRFLAC_TRUE;
3608	}
3609
3610
3611	static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3612	{
3613	drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
3614	drflac_uint32 zeroCountPart0;
3615	drflac_uint32 riceParamPart0;
3616	drflac_uint32 riceParamMask;
3617	drflac_uint32 i;
3618
3619	DRFLAC_ASSERT(bs != NULL);
3620	DRFLAC_ASSERT(pSamplesOut != NULL);
3621
3622	(void)bitsPerSample;
3623	(void)order;
3624	(void)shift;
3625	(void)coefficients;
3626
3627	riceParamMask = (drflac_uint32)~((~`0UL`) << riceParam);
3628
3629	i = `0`;
3630	while (i < count) {
3631	/ Rice extraction. /
3632	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3633	return DRFLAC_FALSE;
3634	}
3635
3636	/ Rice reconstruction. /
3637	riceParamPart0 &= riceParamMask;
3638	riceParamPart0 \|= (zeroCountPart0 << riceParam);
3639	riceParamPart0 = (riceParamPart0 >> `1`) ^ t[riceParamPart0 & `0x01`];
3640
3641	pSamplesOut[i] = riceParamPart0;
3642
3643	i += `1`;
3644	}
3645
3646	return DRFLAC_TRUE;
3647	}
3648
3649	static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3650	{
3651	drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
3652	drflac_uint32 zeroCountPart0 = `0`;
3653	drflac_uint32 zeroCountPart1 = `0`;
3654	drflac_uint32 zeroCountPart2 = `0`;
3655	drflac_uint32 zeroCountPart3 = `0`;
3656	drflac_uint32 riceParamPart0 = `0`;
3657	drflac_uint32 riceParamPart1 = `0`;
3658	drflac_uint32 riceParamPart2 = `0`;
3659	drflac_uint32 riceParamPart3 = `0`;
3660	drflac_uint32 riceParamMask;
3661	const drflac_int32* pSamplesOutEnd;
3662	drflac_uint32 i;
3663
3664	DRFLAC_ASSERT(bs != NULL);
3665	DRFLAC_ASSERT(pSamplesOut != NULL);
3666
3667	if (order == `0`) {
3668	return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
3669	}
3670
3671	riceParamMask = (drflac_uint32)~((~`0UL`) << riceParam);
3672	pSamplesOutEnd = pSamplesOut + (count & ~`3`);
3673
3674	if (bitsPerSample+shift > `32`) {
3675	while (pSamplesOut < pSamplesOutEnd) {
3676	/*
3677	Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3678	against an array. Not sure why, but perhaps it's making more efficient use of registers?
3679	*/
3680	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) \|\|
3681	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) \|\|
3682	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) \|\|
3683	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3684	return DRFLAC_FALSE;
3685	}
3686
3687	riceParamPart0 &= riceParamMask;
3688	riceParamPart1 &= riceParamMask;
3689	riceParamPart2 &= riceParamMask;
3690	riceParamPart3 &= riceParamMask;
3691
3692	riceParamPart0 \|= (zeroCountPart0 << riceParam);
3693	riceParamPart1 \|= (zeroCountPart1 << riceParam);
3694	riceParamPart2 \|= (zeroCountPart2 << riceParam);
3695	riceParamPart3 \|= (zeroCountPart3 << riceParam);
3696
3697	riceParamPart0 = (riceParamPart0 >> `1`) ^ t[riceParamPart0 & `0x01`];
3698	riceParamPart1 = (riceParamPart1 >> `1`) ^ t[riceParamPart1 & `0x01`];
3699	riceParamPart2 = (riceParamPart2 >> `1`) ^ t[riceParamPart2 & `0x01`];
3700	riceParamPart3 = (riceParamPart3 >> `1`) ^ t[riceParamPart3 & `0x01`];
3701
3702	pSamplesOut[`0`] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + `0`);
3703	pSamplesOut[`1`] = riceParamPart1 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + `1`);
3704	pSamplesOut[`2`] = riceParamPart2 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + `2`);
3705	pSamplesOut[`3`] = riceParamPart3 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + `3`);
3706
3707	pSamplesOut += `4`;
3708	}
3709	} else {
3710	while (pSamplesOut < pSamplesOutEnd) {
3711	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) \|\|
3712	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) \|\|
3713	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) \|\|
3714	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3715	return DRFLAC_FALSE;
3716	}
3717
3718	riceParamPart0 &= riceParamMask;
3719	riceParamPart1 &= riceParamMask;
3720	riceParamPart2 &= riceParamMask;
3721	riceParamPart3 &= riceParamMask;
3722
3723	riceParamPart0 \|= (zeroCountPart0 << riceParam);
3724	riceParamPart1 \|= (zeroCountPart1 << riceParam);
3725	riceParamPart2 \|= (zeroCountPart2 << riceParam);
3726	riceParamPart3 \|= (zeroCountPart3 << riceParam);
3727
3728	riceParamPart0 = (riceParamPart0 >> `1`) ^ t[riceParamPart0 & `0x01`];
3729	riceParamPart1 = (riceParamPart1 >> `1`) ^ t[riceParamPart1 & `0x01`];
3730	riceParamPart2 = (riceParamPart2 >> `1`) ^ t[riceParamPart2 & `0x01`];
3731	riceParamPart3 = (riceParamPart3 >> `1`) ^ t[riceParamPart3 & `0x01`];
3732
3733	pSamplesOut[`0`] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + `0`);
3734	pSamplesOut[`1`] = riceParamPart1 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + `1`);
3735	pSamplesOut[`2`] = riceParamPart2 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + `2`);
3736	pSamplesOut[`3`] = riceParamPart3 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + `3`);
3737
3738	pSamplesOut += `4`;
3739	}
3740	}
3741
3742	i = (count & ~`3`);
3743	while (i < count) {
3744	/ Rice extraction. /
3745	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3746	return DRFLAC_FALSE;
3747	}
3748
3749	/ Rice reconstruction. /
3750	riceParamPart0 &= riceParamMask;
3751	riceParamPart0 \|= (zeroCountPart0 << riceParam);
3752	riceParamPart0 = (riceParamPart0 >> `1`) ^ t[riceParamPart0 & `0x01`];
3753	/riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);/
3754
3755	/ Sample reconstruction. /
3756	if (bitsPerSample+shift > `32`) {
3757	pSamplesOut[`0`] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + `0`);
3758	} else {
3759	pSamplesOut[`0`] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + `0`);
3760	}
3761
3762	i += `1`;
3763	pSamplesOut += `1`;
3764	}
3765
3766	return DRFLAC_TRUE;
3767	}
3768
3769	#if defined(DRFLAC_SUPPORT_SSE2)
3770	static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3771	{
3772	__m128i r;
3773
3774	/ Pack. /
3775	r = _mm_packs_epi32(a, b);
3776
3777	/ a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 /
3778	r = _mm_shuffle_epi32(r, _MM_SHUFFLE(`3`, `1`, `2`, `0`));
3779
3780	/ a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 /
3781	r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(`3`, `1`, `2`, `0`));
3782	r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(`3`, `1`, `2`, `0`));
3783
3784	return r;
3785	}
3786	#endif
3787
3788	#if defined(DRFLAC_SUPPORT_SSE41)
3789	static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3790	{
3791	return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3792	}
3793
3794	static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3795	{
3796	__m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(`1`, `0`, `3`, `2`)));
3797	__m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(`1`, `0`, `3`, `2`));
3798	return _mm_add_epi32(x64, x32);
3799	}
3800
3801	static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3802	{
3803	return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(`1`, `0`, `3`, `2`)));
3804	}
3805
3806	static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3807	{
3808	/*
3809	To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3810	is shifted with zero bits, whereas the right side is shifted with sign bits.
3811	*/
3812	__m128i lo = _mm_srli_epi64(x, count);
3813	__m128i hi = _mm_srai_epi32(x, count);
3814
3815	hi = _mm_and_si128(hi, _mm_set_epi32(`0xFFFFFFFF`, `0`, `0xFFFFFFFF`, `0`)); / The high part needs to have the low part cleared. /
3816
3817	return _mm_or_si128(lo, hi);
3818	}
3819
3820	static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3821	{
3822	int i;
3823	drflac_uint32 riceParamMask;
3824	drflac_int32* pDecodedSamples = pSamplesOut;
3825	drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~`3`);
3826	drflac_uint32 zeroCountParts0 = `0`;
3827	drflac_uint32 zeroCountParts1 = `0`;
3828	drflac_uint32 zeroCountParts2 = `0`;
3829	drflac_uint32 zeroCountParts3 = `0`;
3830	drflac_uint32 riceParamParts0 = `0`;
3831	drflac_uint32 riceParamParts1 = `0`;
3832	drflac_uint32 riceParamParts2 = `0`;
3833	drflac_uint32 riceParamParts3 = `0`;
3834	__m128i coefficients128_0;
3835	__m128i coefficients128_4;
3836	__m128i coefficients128_8;
3837	__m128i samples128_0;
3838	__m128i samples128_4;
3839	__m128i samples128_8;
3840	__m128i riceParamMask128;
3841
3842	const drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
3843
3844	riceParamMask = (drflac_uint32)~((~`0UL`) << riceParam);
3845	riceParamMask128 = _mm_set1_epi32(riceParamMask);
3846
3847	/ Pre-load. /
3848	coefficients128_0 = _mm_setzero_si128();
3849	coefficients128_4 = _mm_setzero_si128();
3850	coefficients128_8 = _mm_setzero_si128();
3851
3852	samples128_0 = _mm_setzero_si128();
3853	samples128_4 = _mm_setzero_si128();
3854	samples128_8 = _mm_setzero_si128();
3855
3856	/*
3857	Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3858	what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3859	in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3860	so I think there's opportunity for this to be simplified.
3861	*/
3862	#if 1
3863	{
3864	int runningOrder = order;
3865
3866	/ 0 - 3. /
3867	if (runningOrder >= `4`) {
3868	coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + `0`));
3869	samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `4`));
3870	runningOrder -= `4`;
3871	} else {
3872	switch (runningOrder) {
3873	case `3`: coefficients128_0 = _mm_set_epi32(`0`, coefficients[`2`], coefficients[`1`], coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], pSamplesOut[-`2`], pSamplesOut[-`3`], `0`); break;
3874	case `2`: coefficients128_0 = _mm_set_epi32(`0`, `0`, coefficients[`1`], coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], pSamplesOut[-`2`], `0`, `0`); break;
3875	case `1`: coefficients128_0 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], `0`, `0`, `0`); break;
3876	}
3877	runningOrder = `0`;
3878	}
3879
3880	/ 4 - 7 /
3881	if (runningOrder >= `4`) {
3882	coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + `4`));
3883	samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `8`));
3884	runningOrder -= `4`;
3885	} else {
3886	switch (runningOrder) {
3887	case `3`: coefficients128_4 = _mm_set_epi32(`0`, coefficients[`6`], coefficients[`5`], coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], pSamplesOut[-`6`], pSamplesOut[-`7`], `0`); break;
3888	case `2`: coefficients128_4 = _mm_set_epi32(`0`, `0`, coefficients[`5`], coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], pSamplesOut[-`6`], `0`, `0`); break;
3889	case `1`: coefficients128_4 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], `0`, `0`, `0`); break;
3890	}
3891	runningOrder = `0`;
3892	}
3893
3894	/ 8 - 11 /
3895	if (runningOrder == `4`) {
3896	coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + `8`));
3897	samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `12`));
3898	runningOrder -= `4`;
3899	} else {
3900	switch (runningOrder) {
3901	case `3`: coefficients128_8 = _mm_set_epi32(`0`, coefficients[`10`], coefficients[`9`], coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], pSamplesOut[-`10`], pSamplesOut[-`11`], `0`); break;
3902	case `2`: coefficients128_8 = _mm_set_epi32(`0`, `0`, coefficients[`9`], coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], pSamplesOut[-`10`], `0`, `0`); break;
3903	case `1`: coefficients128_8 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], `0`, `0`, `0`); break;
3904	}
3905	runningOrder = `0`;
3906	}
3907
3908	/ Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. /
3909	coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
3910	coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
3911	coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
3912	}
3913	#else
3914	/ This causes strict-aliasing warnings with GCC. /
3915	switch (order)
3916	{
3917	case `12`: ((drflac_int32)&coefficients128_8)[`0`] = coefficients[`11`]; ((drflac_int32)&samples128_8)[`0`] = pDecodedSamples[-`12`];
3918	case `11`: ((drflac_int32)&coefficients128_8)[`1`] = coefficients[`10`]; ((drflac_int32)&samples128_8)[`1`] = pDecodedSamples[-`11`];
3919	case `10`: ((drflac_int32)&coefficients128_8)[`2`] = coefficients[ `9`]; ((drflac_int32)&samples128_8)[`2`] = pDecodedSamples[-`10`];
3920	case `9`: ((drflac_int32)&coefficients128_8)[`3`] = coefficients[ `8`]; ((drflac_int32)&samples128_8)[`3`] = pDecodedSamples[- `9`];
3921	case `8`: ((drflac_int32)&coefficients128_4)[`0`] = coefficients[ `7`]; ((drflac_int32)&samples128_4)[`0`] = pDecodedSamples[- `8`];
3922	case `7`: ((drflac_int32)&coefficients128_4)[`1`] = coefficients[ `6`]; ((drflac_int32)&samples128_4)[`1`] = pDecodedSamples[- `7`];
3923	case `6`: ((drflac_int32)&coefficients128_4)[`2`] = coefficients[ `5`]; ((drflac_int32)&samples128_4)[`2`] = pDecodedSamples[- `6`];
3924	case `5`: ((drflac_int32)&coefficients128_4)[`3`] = coefficients[ `4`]; ((drflac_int32)&samples128_4)[`3`] = pDecodedSamples[- `5`];
3925	case `4`: ((drflac_int32)&coefficients128_0)[`0`] = coefficients[ `3`]; ((drflac_int32)&samples128_0)[`0`] = pDecodedSamples[- `4`];
3926	case `3`: ((drflac_int32)&coefficients128_0)[`1`] = coefficients[ `2`]; ((drflac_int32)&samples128_0)[`1`] = pDecodedSamples[- `3`];
3927	case `2`: ((drflac_int32)&coefficients128_0)[`2`] = coefficients[ `1`]; ((drflac_int32)&samples128_0)[`2`] = pDecodedSamples[- `2`];
3928	case `1`: ((drflac_int32)&coefficients128_0)[`3`] = coefficients[ `0`]; ((drflac_int32)&samples128_0)[`3`] = pDecodedSamples[- `1`];
3929	}
3930	#endif
3931
3932	/ For this version we are doing one sample at a time. /
3933	while (pDecodedSamples < pDecodedSamplesEnd) {
3934	__m128i prediction128;
3935	__m128i zeroCountPart128;
3936	__m128i riceParamPart128;
3937
3938	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) \|\|
3939	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) \|\|
3940	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) \|\|
3941	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
3942	return DRFLAC_FALSE;
3943	}
3944
3945	zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
3946	riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
3947
3948	riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
3949	riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
3950	riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, `1`), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(`0x01`))), _mm_set1_epi32(`0x01`))); / <-- SSE2 compatible /
3951	/riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));/ / <-- Only supported from SSE4.1 and is slower in my testing... /
3952
3953	if (order <= `4`) {
3954	for (i = `0`; i < `4`; i += `1`) {
3955	prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
3956
3957	/ Horizontal add and shift. /
3958	prediction128 = drflac__mm_hadd_epi32(prediction128);
3959	prediction128 = _mm_srai_epi32(prediction128, shift);
3960	prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3961
3962	samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, `4`);
3963	riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, `4`);
3964	}
3965	} else if (order <= `8`) {
3966	for (i = `0`; i < `4`; i += `1`) {
3967	prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
3968	prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3969
3970	/ Horizontal add and shift. /
3971	prediction128 = drflac__mm_hadd_epi32(prediction128);
3972	prediction128 = _mm_srai_epi32(prediction128, shift);
3973	prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3974
3975	samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, `4`);
3976	samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, `4`);
3977	riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, `4`);
3978	}
3979	} else {
3980	for (i = `0`; i < `4`; i += `1`) {
3981	prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
3982	prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
3983	prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3984
3985	/ Horizontal add and shift. /
3986	prediction128 = drflac__mm_hadd_epi32(prediction128);
3987	prediction128 = _mm_srai_epi32(prediction128, shift);
3988	prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3989
3990	samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, `4`);
3991	samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, `4`);
3992	samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, `4`);
3993	riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, `4`);
3994	}
3995	}
3996
3997	/ We store samples in groups of 4. /
3998	_mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
3999	pDecodedSamples += `4`;
4000	}
4001
4002	/ Make sure we process the last few samples. /
4003	i = (count & ~`3`);
4004	while (i < (int)count) {
4005	/ Rice extraction. /
4006	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4007	return DRFLAC_FALSE;
4008	}
4009
4010	/ Rice reconstruction. /
4011	riceParamParts0 &= riceParamMask;
4012	riceParamParts0 \|= (zeroCountParts0 << riceParam);
4013	riceParamParts0 = (riceParamParts0 >> `1`) ^ t[riceParamParts0 & `0x01`];
4014
4015	/ Sample reconstruction. /
4016	pDecodedSamples[`0`] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4017
4018	i += `1`;
4019	pDecodedSamples += `1`;
4020	}
4021
4022	return DRFLAC_TRUE;
4023	}
4024
4025	static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4026	{
4027	int i;
4028	drflac_uint32 riceParamMask;
4029	drflac_int32* pDecodedSamples = pSamplesOut;
4030	drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~`3`);
4031	drflac_uint32 zeroCountParts0 = `0`;
4032	drflac_uint32 zeroCountParts1 = `0`;
4033	drflac_uint32 zeroCountParts2 = `0`;
4034	drflac_uint32 zeroCountParts3 = `0`;
4035	drflac_uint32 riceParamParts0 = `0`;
4036	drflac_uint32 riceParamParts1 = `0`;
4037	drflac_uint32 riceParamParts2 = `0`;
4038	drflac_uint32 riceParamParts3 = `0`;
4039	__m128i coefficients128_0;
4040	__m128i coefficients128_4;
4041	__m128i coefficients128_8;
4042	__m128i samples128_0;
4043	__m128i samples128_4;
4044	__m128i samples128_8;
4045	__m128i prediction128;
4046	__m128i riceParamMask128;
4047
4048	const drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
4049
4050	DRFLAC_ASSERT(order <= `12`);
4051
4052	riceParamMask = (drflac_uint32)~((~`0UL`) << riceParam);
4053	riceParamMask128 = _mm_set1_epi32(riceParamMask);
4054
4055	prediction128 = _mm_setzero_si128();
4056
4057	/ Pre-load. /
4058	coefficients128_0 = _mm_setzero_si128();
4059	coefficients128_4 = _mm_setzero_si128();
4060	coefficients128_8 = _mm_setzero_si128();
4061
4062	samples128_0 = _mm_setzero_si128();
4063	samples128_4 = _mm_setzero_si128();
4064	samples128_8 = _mm_setzero_si128();
4065
4066	#if 1
4067	{
4068	int runningOrder = order;
4069
4070	/ 0 - 3. /
4071	if (runningOrder >= `4`) {
4072	coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + `0`));
4073	samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `4`));
4074	runningOrder -= `4`;
4075	} else {
4076	switch (runningOrder) {
4077	case `3`: coefficients128_0 = _mm_set_epi32(`0`, coefficients[`2`], coefficients[`1`], coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], pSamplesOut[-`2`], pSamplesOut[-`3`], `0`); break;
4078	case `2`: coefficients128_0 = _mm_set_epi32(`0`, `0`, coefficients[`1`], coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], pSamplesOut[-`2`], `0`, `0`); break;
4079	case `1`: coefficients128_0 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`0`]); samples128_0 = _mm_set_epi32(pSamplesOut[-`1`], `0`, `0`, `0`); break;
4080	}
4081	runningOrder = `0`;
4082	}
4083
4084	/ 4 - 7 /
4085	if (runningOrder >= `4`) {
4086	coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + `4`));
4087	samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `8`));
4088	runningOrder -= `4`;
4089	} else {
4090	switch (runningOrder) {
4091	case `3`: coefficients128_4 = _mm_set_epi32(`0`, coefficients[`6`], coefficients[`5`], coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], pSamplesOut[-`6`], pSamplesOut[-`7`], `0`); break;
4092	case `2`: coefficients128_4 = _mm_set_epi32(`0`, `0`, coefficients[`5`], coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], pSamplesOut[-`6`], `0`, `0`); break;
4093	case `1`: coefficients128_4 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`4`]); samples128_4 = _mm_set_epi32(pSamplesOut[-`5`], `0`, `0`, `0`); break;
4094	}
4095	runningOrder = `0`;
4096	}
4097
4098	/ 8 - 11 /
4099	if (runningOrder == `4`) {
4100	coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + `8`));
4101	samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - `12`));
4102	runningOrder -= `4`;
4103	} else {
4104	switch (runningOrder) {
4105	case `3`: coefficients128_8 = _mm_set_epi32(`0`, coefficients[`10`], coefficients[`9`], coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], pSamplesOut[-`10`], pSamplesOut[-`11`], `0`); break;
4106	case `2`: coefficients128_8 = _mm_set_epi32(`0`, `0`, coefficients[`9`], coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], pSamplesOut[-`10`], `0`, `0`); break;
4107	case `1`: coefficients128_8 = _mm_set_epi32(`0`, `0`, `0`, coefficients[`8`]); samples128_8 = _mm_set_epi32(pSamplesOut[-`9`], `0`, `0`, `0`); break;
4108	}
4109	runningOrder = `0`;
4110	}
4111
4112	/ Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. /
4113	coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
4114	coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
4115	coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(`0`, `1`, `2`, `3`));
4116	}
4117	#else
4118	switch (order)
4119	{
4120	case `12`: ((drflac_int32)&coefficients128_8)[`0`] = coefficients[`11`]; ((drflac_int32)&samples128_8)[`0`] = pDecodedSamples[-`12`];
4121	case `11`: ((drflac_int32)&coefficients128_8)[`1`] = coefficients[`10`]; ((drflac_int32)&samples128_8)[`1`] = pDecodedSamples[-`11`];
4122	case `10`: ((drflac_int32)&coefficients128_8)[`2`] = coefficients[ `9`]; ((drflac_int32)&samples128_8)[`2`] = pDecodedSamples[-`10`];
4123	case `9`: ((drflac_int32)&coefficients128_8)[`3`] = coefficients[ `8`]; ((drflac_int32)&samples128_8)[`3`] = pDecodedSamples[- `9`];
4124	case `8`: ((drflac_int32)&coefficients128_4)[`0`] = coefficients[ `7`]; ((drflac_int32)&samples128_4)[`0`] = pDecodedSamples[- `8`];
4125	case `7`: ((drflac_int32)&coefficients128_4)[`1`] = coefficients[ `6`]; ((drflac_int32)&samples128_4)[`1`] = pDecodedSamples[- `7`];
4126	case `6`: ((drflac_int32)&coefficients128_4)[`2`] = coefficients[ `5`]; ((drflac_int32)&samples128_4)[`2`] = pDecodedSamples[- `6`];
4127	case `5`: ((drflac_int32)&coefficients128_4)[`3`] = coefficients[ `4`]; ((drflac_int32)&samples128_4)[`3`] = pDecodedSamples[- `5`];
4128	case `4`: ((drflac_int32)&coefficients128_0)[`0`] = coefficients[ `3`]; ((drflac_int32)&samples128_0)[`0`] = pDecodedSamples[- `4`];
4129	case `3`: ((drflac_int32)&coefficients128_0)[`1`] = coefficients[ `2`]; ((drflac_int32)&samples128_0)[`1`] = pDecodedSamples[- `3`];
4130	case `2`: ((drflac_int32)&coefficients128_0)[`2`] = coefficients[ `1`]; ((drflac_int32)&samples128_0)[`2`] = pDecodedSamples[- `2`];
4131	case `1`: ((drflac_int32)&coefficients128_0)[`3`] = coefficients[ `0`]; ((drflac_int32)&samples128_0)[`3`] = pDecodedSamples[- `1`];
4132	}
4133	#endif
4134
4135	/ For this version we are doing one sample at a time. /
4136	while (pDecodedSamples < pDecodedSamplesEnd) {
4137	__m128i zeroCountPart128;
4138	__m128i riceParamPart128;
4139
4140	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) \|\|
4141	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) \|\|
4142	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) \|\|
4143	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4144	return DRFLAC_FALSE;
4145	}
4146
4147	zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4148	riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4149
4150	riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4151	riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4152	riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, `1`), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(`1`))), _mm_set1_epi32(`1`)));
4153
4154	for (i = `0`; i < `4`; i += `1`) {
4155	prediction128 = _mm_xor_si128(prediction128, prediction128); / Reset to 0. /
4156
4157	switch (order)
4158	{
4159	case `12`:
4160	case `11`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(`1`, `1`, `0`, `0`)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(`1`, `1`, `0`, `0`))));
4161	case `10`:
4162	case `9`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(`3`, `3`, `2`, `2`)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(`3`, `3`, `2`, `2`))));
4163	case `8`:
4164	case `7`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(`1`, `1`, `0`, `0`)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(`1`, `1`, `0`, `0`))));
4165	case `6`:
4166	case `5`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(`3`, `3`, `2`, `2`)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(`3`, `3`, `2`, `2`))));
4167	case `4`:
4168	case `3`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(`1`, `1`, `0`, `0`)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(`1`, `1`, `0`, `0`))));
4169	case `2`:
4170	case `1`: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(`3`, `3`, `2`, `2`)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(`3`, `3`, `2`, `2`))));
4171	}
4172
4173	/ Horizontal add and shift. /
4174	prediction128 = drflac__mm_hadd_epi64(prediction128);
4175	prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4176	prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4177
4178	/ Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. /
4179	samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, `4`);
4180	samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, `4`);
4181	samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, `4`);
4182
4183	/ Slide our rice parameter down so that the value in position 0 contains the next one to process. /
4184	riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, `4`);
4185	}
4186
4187	/ We store samples in groups of 4. /
4188	_mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4189	pDecodedSamples += `4`;
4190	}
4191
4192	/ Make sure we process the last few samples. /
4193	i = (count & ~`3`);
4194	while (i < (int)count) {
4195	/ Rice extraction. /
4196	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4197	return DRFLAC_FALSE;
4198	}
4199
4200	/ Rice reconstruction. /
4201	riceParamParts0 &= riceParamMask;
4202	riceParamParts0 \|= (zeroCountParts0 << riceParam);
4203	riceParamParts0 = (riceParamParts0 >> `1`) ^ t[riceParamParts0 & `0x01`];
4204
4205	/ Sample reconstruction. /
4206	pDecodedSamples[`0`] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4207
4208	i += `1`;
4209	pDecodedSamples += `1`;
4210	}
4211
4212	return DRFLAC_TRUE;
4213	}
4214
4215	static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4216	{
4217	DRFLAC_ASSERT(bs != NULL);
4218	DRFLAC_ASSERT(pSamplesOut != NULL);
4219
4220	/ In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. /
4221	if (order > `0` && order <= `12`) {
4222	if (bitsPerSample+shift > `32`) {
4223	return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4224	} else {
4225	return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4226	}
4227	} else {
4228	return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4229	}
4230	}
4231	#endif
4232
4233	#if defined(DRFLAC_SUPPORT_NEON)
4234	static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4235	{
4236	vst1q_s32(p+`0`, x.val[`0`]);
4237	vst1q_s32(p+`4`, x.val[`1`]);
4238	}
4239
4240	static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4241	{
4242	vst1q_u32(p+`0`, x.val[`0`]);
4243	vst1q_u32(p+`4`, x.val[`1`]);
4244	}
4245
4246	static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4247	{
4248	vst1q_f32(p+`0`, x.val[`0`]);
4249	vst1q_f32(p+`4`, x.val[`1`]);
4250	}
4251
4252	static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4253	{
4254	vst1q_s16(p, vcombine_s16(x.val[`0`], x.val[`1`]));
4255	}
4256
4257	static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4258	{
4259	vst1q_u16(p, vcombine_u16(x.val[`0`], x.val[`1`]));
4260	}
4261
4262	static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4263	{
4264	drflac_int32 x[`4`];
4265	x[`3`] = x3;
4266	x[`2`] = x2;
4267	x[`1`] = x1;
4268	x[`0`] = x0;
4269	return vld1q_s32(x);
4270	}
4271
4272	static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4273	{
4274	/ Equivalent to SSE's _mm_alignr_epi8(a, b, 4) /
4275
4276	/ Reference /
4277	/return drflac__vdupq_n_s32x4(*
4278	vgetq_lane_s32(a, 0),
4279	vgetq_lane_s32(b, 3),
4280	vgetq_lane_s32(b, 2),
4281	vgetq_lane_s32(b, 1)
4282	);/*
4283
4284	return vextq_s32(b, a, `1`);
4285	}
4286
4287	static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4288	{
4289	/ Equivalent to SSE's _mm_alignr_epi8(a, b, 4) /
4290
4291	/ Reference /
4292	/return drflac__vdupq_n_s32x4(*
4293	vgetq_lane_s32(a, 0),
4294	vgetq_lane_s32(b, 3),
4295	vgetq_lane_s32(b, 2),
4296	vgetq_lane_s32(b, 1)
4297	);/*
4298
4299	return vextq_u32(b, a, `1`);
4300	}
4301
4302	static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4303	{
4304	/ The sum must end up in position 0. /
4305
4306	/ Reference /
4307	/return vdupq_n_s32(*
4308	vgetq_lane_s32(x, 3) +
4309	vgetq_lane_s32(x, 2) +
4310	vgetq_lane_s32(x, 1) +
4311	vgetq_lane_s32(x, 0)
4312	);/*
4313
4314	int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4315	return vpadd_s32(r, r);
4316	}
4317
4318	static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4319	{
4320	return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4321	}
4322
4323	static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4324	{
4325	/ Reference /
4326	/return drflac__vdupq_n_s32x4(*
4327	vgetq_lane_s32(x, 0),
4328	vgetq_lane_s32(x, 1),
4329	vgetq_lane_s32(x, 2),
4330	vgetq_lane_s32(x, 3)
4331	);/*
4332
4333	return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4334	}
4335
4336	static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4337	{
4338	return veorq_s32(x, vdupq_n_s32(`0xFFFFFFFF`));
4339	}
4340
4341	static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4342	{
4343	return veorq_u32(x, vdupq_n_u32(`0xFFFFFFFF`));
4344	}
4345
4346	static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4347	{
4348	int i;
4349	drflac_uint32 riceParamMask;
4350	drflac_int32* pDecodedSamples = pSamplesOut;
4351	drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~`3`);
4352	drflac_uint32 zeroCountParts[`4`];
4353	drflac_uint32 riceParamParts[`4`];
4354	int32x4_t coefficients128_0;
4355	int32x4_t coefficients128_4;
4356	int32x4_t coefficients128_8;
4357	int32x4_t samples128_0;
4358	int32x4_t samples128_4;
4359	int32x4_t samples128_8;
4360	uint32x4_t riceParamMask128;
4361	int32x4_t riceParam128;
4362	int32x2_t shift64;
4363	uint32x4_t one128;
4364
4365	const drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
4366
4367	riceParamMask = ~((~`0UL`) << riceParam);
4368	riceParamMask128 = vdupq_n_u32(riceParamMask);
4369
4370	riceParam128 = vdupq_n_s32(riceParam);
4371	shift64 = vdup_n_s32(-shift); / Negate the shift because we'll be doing a variable shift using vshlq_s32(). /
4372	one128 = vdupq_n_u32(`1`);
4373
4374	/*
4375	Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4376	what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4377	in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4378	so I think there's opportunity for this to be simplified.
4379	*/
4380	{
4381	int runningOrder = order;
4382	drflac_int32 tempC[`4`] = {`0`, `0`, `0`, `0`};
4383	drflac_int32 tempS[`4`] = {`0`, `0`, `0`, `0`};
4384
4385	/ 0 - 3. /
4386	if (runningOrder >= `4`) {
4387	coefficients128_0 = vld1q_s32(coefficients + `0`);
4388	samples128_0 = vld1q_s32(pSamplesOut - `4`);
4389	runningOrder -= `4`;
4390	} else {
4391	switch (runningOrder) {
4392	case `3`: tempC[`2`] = coefficients[`2`]; tempS[`1`] = pSamplesOut[-`3`]; / fallthrough /
4393	case `2`: tempC[`1`] = coefficients[`1`]; tempS[`2`] = pSamplesOut[-`2`]; / fallthrough /
4394	case `1`: tempC[`0`] = coefficients[`0`]; tempS[`3`] = pSamplesOut[-`1`]; / fallthrough /
4395	}
4396
4397	coefficients128_0 = vld1q_s32(tempC);
4398	samples128_0 = vld1q_s32(tempS);
4399	runningOrder = `0`;
4400	}
4401
4402	/ 4 - 7 /
4403	if (runningOrder >= `4`) {
4404	coefficients128_4 = vld1q_s32(coefficients + `4`);
4405	samples128_4 = vld1q_s32(pSamplesOut - `8`);
4406	runningOrder -= `4`;
4407	} else {
4408	switch (runningOrder) {
4409	case `3`: tempC[`2`] = coefficients[`6`]; tempS[`1`] = pSamplesOut[-`7`]; / fallthrough /
4410	case `2`: tempC[`1`] = coefficients[`5`]; tempS[`2`] = pSamplesOut[-`6`]; / fallthrough /
4411	case `1`: tempC[`0`] = coefficients[`4`]; tempS[`3`] = pSamplesOut[-`5`]; / fallthrough /
4412	}
4413
4414	coefficients128_4 = vld1q_s32(tempC);
4415	samples128_4 = vld1q_s32(tempS);
4416	runningOrder = `0`;
4417	}
4418
4419	/ 8 - 11 /
4420	if (runningOrder == `4`) {
4421	coefficients128_8 = vld1q_s32(coefficients + `8`);
4422	samples128_8 = vld1q_s32(pSamplesOut - `12`);
4423	runningOrder -= `4`;
4424	} else {
4425	switch (runningOrder) {
4426	case `3`: tempC[`2`] = coefficients[`10`]; tempS[`1`] = pSamplesOut[-`11`]; / fallthrough /
4427	case `2`: tempC[`1`] = coefficients[ `9`]; tempS[`2`] = pSamplesOut[-`10`]; / fallthrough /
4428	case `1`: tempC[`0`] = coefficients[ `8`]; tempS[`3`] = pSamplesOut[- `9`]; / fallthrough /
4429	}
4430
4431	coefficients128_8 = vld1q_s32(tempC);
4432	samples128_8 = vld1q_s32(tempS);
4433	runningOrder = `0`;
4434	}
4435
4436	/ Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. /
4437	coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4438	coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4439	coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4440	}
4441
4442	/ For this version we are doing one sample at a time. /
4443	while (pDecodedSamples < pDecodedSamplesEnd) {
4444	int32x4_t prediction128;
4445	int32x2_t prediction64;
4446	uint32x4_t zeroCountPart128;
4447	uint32x4_t riceParamPart128;
4448
4449	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`0`], &riceParamParts[`0`]) \|\|
4450	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`1`], &riceParamParts[`1`]) \|\|
4451	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`2`], &riceParamParts[`2`]) \|\|
4452	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`3`], &riceParamParts[`3`])) {
4453	return DRFLAC_FALSE;
4454	}
4455
4456	zeroCountPart128 = vld1q_u32(zeroCountParts);
4457	riceParamPart128 = vld1q_u32(riceParamParts);
4458
4459	riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4460	riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4461	riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, `1`), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4462
4463	if (order <= `4`) {
4464	for (i = `0`; i < `4`; i += `1`) {
4465	prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4466
4467	/ Horizontal add and shift. /
4468	prediction64 = drflac__vhaddq_s32(prediction128);
4469	prediction64 = vshl_s32(prediction64, shift64);
4470	prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4471
4472	samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(`0`)), samples128_0);
4473	riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(`0`), riceParamPart128);
4474	}
4475	} else if (order <= `8`) {
4476	for (i = `0`; i < `4`; i += `1`) {
4477	prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4478	prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4479
4480	/ Horizontal add and shift. /
4481	prediction64 = drflac__vhaddq_s32(prediction128);
4482	prediction64 = vshl_s32(prediction64, shift64);
4483	prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4484
4485	samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4486	samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(`0`)), samples128_0);
4487	riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(`0`), riceParamPart128);
4488	}
4489	} else {
4490	for (i = `0`; i < `4`; i += `1`) {
4491	prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4492	prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4493	prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4494
4495	/ Horizontal add and shift. /
4496	prediction64 = drflac__vhaddq_s32(prediction128);
4497	prediction64 = vshl_s32(prediction64, shift64);
4498	prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4499
4500	samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4501	samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4502	samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(`0`)), samples128_0);
4503	riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(`0`), riceParamPart128);
4504	}
4505	}
4506
4507	/ We store samples in groups of 4. /
4508	vst1q_s32(pDecodedSamples, samples128_0);
4509	pDecodedSamples += `4`;
4510	}
4511
4512	/ Make sure we process the last few samples. /
4513	i = (count & ~`3`);
4514	while (i < (int)count) {
4515	/ Rice extraction. /
4516	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`0`], &riceParamParts[`0`])) {
4517	return DRFLAC_FALSE;
4518	}
4519
4520	/ Rice reconstruction. /
4521	riceParamParts[`0`] &= riceParamMask;
4522	riceParamParts[`0`] \|= (zeroCountParts[`0`] << riceParam);
4523	riceParamParts[`0`] = (riceParamParts[`0`] >> `1`) ^ t[riceParamParts[`0`] & `0x01`];
4524
4525	/ Sample reconstruction. /
4526	pDecodedSamples[`0`] = riceParamParts[`0`] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4527
4528	i += `1`;
4529	pDecodedSamples += `1`;
4530	}
4531
4532	return DRFLAC_TRUE;
4533	}
4534
4535	static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4536	{
4537	int i;
4538	drflac_uint32 riceParamMask;
4539	drflac_int32* pDecodedSamples = pSamplesOut;
4540	drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~`3`);
4541	drflac_uint32 zeroCountParts[`4`];
4542	drflac_uint32 riceParamParts[`4`];
4543	int32x4_t coefficients128_0;
4544	int32x4_t coefficients128_4;
4545	int32x4_t coefficients128_8;
4546	int32x4_t samples128_0;
4547	int32x4_t samples128_4;
4548	int32x4_t samples128_8;
4549	uint32x4_t riceParamMask128;
4550	int32x4_t riceParam128;
4551	int64x1_t shift64;
4552	uint32x4_t one128;
4553
4554	const drflac_uint32 t[`2`] = {`0x00000000`, `0xFFFFFFFF`};
4555
4556	riceParamMask = ~((~`0UL`) << riceParam);
4557	riceParamMask128 = vdupq_n_u32(riceParamMask);
4558
4559	riceParam128 = vdupq_n_s32(riceParam);
4560	shift64 = vdup_n_s64(-shift); / Negate the shift because we'll be doing a variable shift using vshlq_s32(). /
4561	one128 = vdupq_n_u32(`1`);
4562
4563	/*
4564	Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4565	what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4566	in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4567	so I think there's opportunity for this to be simplified.
4568	*/
4569	{
4570	int runningOrder = order;
4571	drflac_int32 tempC[`4`] = {`0`, `0`, `0`, `0`};
4572	drflac_int32 tempS[`4`] = {`0`, `0`, `0`, `0`};
4573
4574	/ 0 - 3. /
4575	if (runningOrder >= `4`) {
4576	coefficients128_0 = vld1q_s32(coefficients + `0`);
4577	samples128_0 = vld1q_s32(pSamplesOut - `4`);
4578	runningOrder -= `4`;
4579	} else {
4580	switch (runningOrder) {
4581	case `3`: tempC[`2`] = coefficients[`2`]; tempS[`1`] = pSamplesOut[-`3`]; / fallthrough /
4582	case `2`: tempC[`1`] = coefficients[`1`]; tempS[`2`] = pSamplesOut[-`2`]; / fallthrough /
4583	case `1`: tempC[`0`] = coefficients[`0`]; tempS[`3`] = pSamplesOut[-`1`]; / fallthrough /
4584	}
4585
4586	coefficients128_0 = vld1q_s32(tempC);
4587	samples128_0 = vld1q_s32(tempS);
4588	runningOrder = `0`;
4589	}
4590
4591	/ 4 - 7 /
4592	if (runningOrder >= `4`) {
4593	coefficients128_4 = vld1q_s32(coefficients + `4`);
4594	samples128_4 = vld1q_s32(pSamplesOut - `8`);
4595	runningOrder -= `4`;
4596	} else {
4597	switch (runningOrder) {
4598	case `3`: tempC[`2`] = coefficients[`6`]; tempS[`1`] = pSamplesOut[-`7`]; / fallthrough /
4599	case `2`: tempC[`1`] = coefficients[`5`]; tempS[`2`] = pSamplesOut[-`6`]; / fallthrough /
4600	case `1`: tempC[`0`] = coefficients[`4`]; tempS[`3`] = pSamplesOut[-`5`]; / fallthrough /
4601	}
4602
4603	coefficients128_4 = vld1q_s32(tempC);
4604	samples128_4 = vld1q_s32(tempS);
4605	runningOrder = `0`;
4606	}
4607
4608	/ 8 - 11 /
4609	if (runningOrder == `4`) {
4610	coefficients128_8 = vld1q_s32(coefficients + `8`);
4611	samples128_8 = vld1q_s32(pSamplesOut - `12`);
4612	runningOrder -= `4`;
4613	} else {
4614	switch (runningOrder) {
4615	case `3`: tempC[`2`] = coefficients[`10`]; tempS[`1`] = pSamplesOut[-`11`]; / fallthrough /
4616	case `2`: tempC[`1`] = coefficients[ `9`]; tempS[`2`] = pSamplesOut[-`10`]; / fallthrough /
4617	case `1`: tempC[`0`] = coefficients[ `8`]; tempS[`3`] = pSamplesOut[- `9`]; / fallthrough /
4618	}
4619
4620	coefficients128_8 = vld1q_s32(tempC);
4621	samples128_8 = vld1q_s32(tempS);
4622	runningOrder = `0`;
4623	}
4624
4625	/ Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. /
4626	coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4627	coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4628	coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4629	}
4630
4631	/ For this version we are doing one sample at a time. /
4632	while (pDecodedSamples < pDecodedSamplesEnd) {
4633	int64x2_t prediction128;
4634	uint32x4_t zeroCountPart128;
4635	uint32x4_t riceParamPart128;
4636
4637	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`0`], &riceParamParts[`0`]) \|\|
4638	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`1`], &riceParamParts[`1`]) \|\|
4639	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`2`], &riceParamParts[`2`]) \|\|
4640	!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`3`], &riceParamParts[`3`])) {
4641	return DRFLAC_FALSE;
4642	}
4643
4644	zeroCountPart128 = vld1q_u32(zeroCountParts);
4645	riceParamPart128 = vld1q_u32(riceParamParts);
4646
4647	riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4648	riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4649	riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, `1`), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4650
4651	for (i = `0`; i < `4`; i += `1`) {
4652	int64x1_t prediction64;
4653
4654	prediction128 = veorq_s64(prediction128, prediction128); / Reset to 0. /
4655	switch (order)
4656	{
4657	case `12`:
4658	case `11`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4659	case `10`:
4660	case `9`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4661	case `8`:
4662	case `7`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4663	case `6`:
4664	case `5`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4665	case `4`:
4666	case `3`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4667	case `2`:
4668	case `1`: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4669	}
4670
4671	/ Horizontal add and shift. /
4672	prediction64 = drflac__vhaddq_s64(prediction128);
4673	prediction64 = vshl_s64(prediction64, shift64);
4674	prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, `0`)));
4675
4676	/ Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. /
4677	samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4678	samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4679	samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(`0`)), samples128_0);
4680
4681	/ Slide our rice parameter down so that the value in position 0 contains the next one to process. /
4682	riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(`0`), riceParamPart128);
4683	}
4684
4685	/ We store samples in groups of 4. /
4686	vst1q_s32(pDecodedSamples, samples128_0);
4687	pDecodedSamples += `4`;
4688	}
4689
4690	/ Make sure we process the last few samples. /
4691	i = (count & ~`3`);
4692	while (i < (int)count) {
4693	/ Rice extraction. /
4694	if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[`0`], &riceParamParts[`0`])) {
4695	return DRFLAC_FALSE;
4696	}
4697
4698	/ Rice reconstruction. /
4699	riceParamParts[`0`] &= riceParamMask;
4700	riceParamParts[`0`] \|= (zeroCountParts[`0`] << riceParam);
4701	riceParamParts[`0`] = (riceParamParts[`0`] >> `1`) ^ t[riceParamParts[`0`] & `0x01`];
4702
4703	/ Sample reconstruction. /
4704	pDecodedSamples[`0`] = riceParamParts[`0`] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4705
4706	i += `1`;
4707	pDecodedSamples += `1`;
4708	}
4709
4710	return DRFLAC_TRUE;
4711	}
4712
4713	static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4714	{
4715	DRFLAC_ASSERT(bs != NULL);
4716	DRFLAC_ASSERT(pSamplesOut != NULL);
4717
4718	/ In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. /
4719	if (order > `0` && order <= `12`) {
4720	if (bitsPerSample+shift > `32`) {
4721	return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4722	} else {
4723	return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4724	}
4725	} else {
4726	return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4727	}
4728	}
4729	#endif
4730
4731	static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4732	{
4733	#if defined(DRFLAC_SUPPORT_SSE41)
4734	if (drflac__gIsSSE41Supported) {
4735	return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4736	} else
4737	#elif defined(DRFLAC_SUPPORT_NEON)
4738	if (drflac__gIsNEONSupported) {
4739	return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4740	} else
4741	#endif
4742	{
4743	/ Scalar fallback. /
4744	#if 0
4745	return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4746	#else
4747	return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4748	#endif
4749	}
4750	}
4751
4752	/ Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. /
4753	static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4754	{
4755	drflac_uint32 i;
4756
4757	DRFLAC_ASSERT(bs != NULL);
4758
4759	for (i = `0`; i < count; ++i) {
4760	if (!drflac__seek_rice_parts(bs, riceParam)) {
4761	return DRFLAC_FALSE;
4762	}
4763	}
4764
4765	return DRFLAC_TRUE;
4766	}
4767
4768	static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4769	{
4770	drflac_uint32 i;
4771
4772	DRFLAC_ASSERT(bs != NULL);
4773	DRFLAC_ASSERT(unencodedBitsPerSample <= `31`); / <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. /
4774	DRFLAC_ASSERT(pSamplesOut != NULL);
4775
4776	for (i = `0`; i < count; ++i) {
4777	if (unencodedBitsPerSample > `0`) {
4778	if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4779	return DRFLAC_FALSE;
4780	}
4781	} else {
4782	pSamplesOut[i] = `0`;
4783	}
4784
4785	if (bitsPerSample >= `24`) {
4786	pSamplesOut[i] += drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
4787	} else {
4788	pSamplesOut[i] += drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
4789	}
4790	}
4791
4792	return DRFLAC_TRUE;
4793	}
4794
4795
4796	/*
4797	Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4798	when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4799	<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4800	*/
4801	static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4802	{
4803	drflac_uint8 residualMethod;
4804	drflac_uint8 partitionOrder;
4805	drflac_uint32 samplesInPartition;
4806	drflac_uint32 partitionsRemaining;
4807
4808	DRFLAC_ASSERT(bs != NULL);
4809	DRFLAC_ASSERT(blockSize != `0`);
4810	DRFLAC_ASSERT(pDecodedSamples != NULL); / <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? /
4811
4812	if (!drflac__read_uint8(bs, `2`, &residualMethod)) {
4813	return DRFLAC_FALSE;
4814	}
4815
4816	if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4817	return DRFLAC_FALSE; / Unknown or unsupported residual coding method. /
4818	}
4819
4820	/ Ignore the first <order> values. /
4821	pDecodedSamples += order;
4822
4823	if (!drflac__read_uint8(bs, `4`, &partitionOrder)) {
4824	return DRFLAC_FALSE;
4825	}
4826
4827	/*
4828	From the FLAC spec:
4829	The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4830	*/
4831	if (partitionOrder > `8`) {
4832	return DRFLAC_FALSE;
4833	}
4834
4835	/ Validation check. /
4836	if ((blockSize / (`1` << partitionOrder)) < order) {
4837	return DRFLAC_FALSE;
4838	}
4839
4840	samplesInPartition = (blockSize / (`1` << partitionOrder)) - order;
4841	partitionsRemaining = (`1` << partitionOrder);
4842	for (;;) {
4843	drflac_uint8 riceParam = `0`;
4844	if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4845	if (!drflac__read_uint8(bs, `4`, &riceParam)) {
4846	return DRFLAC_FALSE;
4847	}
4848	if (riceParam == `15`) {
4849	riceParam = `0xFF`;
4850	}
4851	} else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4852	if (!drflac__read_uint8(bs, `5`, &riceParam)) {
4853	return DRFLAC_FALSE;
4854	}
4855	if (riceParam == `31`) {
4856	riceParam = `0xFF`;
4857	}
4858	}
4859
4860	if (riceParam != `0xFF`) {
4861	if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, order, shift, coefficients, pDecodedSamples)) {
4862	return DRFLAC_FALSE;
4863	}
4864	} else {
4865	drflac_uint8 unencodedBitsPerSample = `0`;
4866	if (!drflac__read_uint8(bs, `5`, &unencodedBitsPerSample)) {
4867	return DRFLAC_FALSE;
4868	}
4869
4870	if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, order, shift, coefficients, pDecodedSamples)) {
4871	return DRFLAC_FALSE;
4872	}
4873	}
4874
4875	pDecodedSamples += samplesInPartition;
4876
4877	if (partitionsRemaining == `1`) {
4878	break;
4879	}
4880
4881	partitionsRemaining -= `1`;
4882
4883	if (partitionOrder != `0`) {
4884	samplesInPartition = blockSize / (`1` << partitionOrder);
4885	}
4886	}
4887
4888	return DRFLAC_TRUE;
4889	}
4890
4891	/*
4892	Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
4893	when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
4894	<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4895	*/
4896	static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4897	{
4898	drflac_uint8 residualMethod;
4899	drflac_uint8 partitionOrder;
4900	drflac_uint32 samplesInPartition;
4901	drflac_uint32 partitionsRemaining;
4902
4903	DRFLAC_ASSERT(bs != NULL);
4904	DRFLAC_ASSERT(blockSize != `0`);
4905
4906	if (!drflac__read_uint8(bs, `2`, &residualMethod)) {
4907	return DRFLAC_FALSE;
4908	}
4909
4910	if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4911	return DRFLAC_FALSE; / Unknown or unsupported residual coding method. /
4912	}
4913
4914	if (!drflac__read_uint8(bs, `4`, &partitionOrder)) {
4915	return DRFLAC_FALSE;
4916	}
4917
4918	/*
4919	From the FLAC spec:
4920	The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4921	*/
4922	if (partitionOrder > `8`) {
4923	return DRFLAC_FALSE;
4924	}
4925
4926	/ Validation check. /
4927	if ((blockSize / (`1` << partitionOrder)) <= order) {
4928	return DRFLAC_FALSE;
4929	}
4930
4931	samplesInPartition = (blockSize / (`1` << partitionOrder)) - order;
4932	partitionsRemaining = (`1` << partitionOrder);
4933	for (;;)
4934	{
4935	drflac_uint8 riceParam = `0`;
4936	if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4937	if (!drflac__read_uint8(bs, `4`, &riceParam)) {
4938	return DRFLAC_FALSE;
4939	}
4940	if (riceParam == `15`) {
4941	riceParam = `0xFF`;
4942	}
4943	} else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4944	if (!drflac__read_uint8(bs, `5`, &riceParam)) {
4945	return DRFLAC_FALSE;
4946	}
4947	if (riceParam == `31`) {
4948	riceParam = `0xFF`;
4949	}
4950	}
4951
4952	if (riceParam != `0xFF`) {
4953	if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
4954	return DRFLAC_FALSE;
4955	}
4956	} else {
4957	drflac_uint8 unencodedBitsPerSample = `0`;
4958	if (!drflac__read_uint8(bs, `5`, &unencodedBitsPerSample)) {
4959	return DRFLAC_FALSE;
4960	}
4961
4962	if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
4963	return DRFLAC_FALSE;
4964	}
4965	}
4966
4967
4968	if (partitionsRemaining == `1`) {
4969	break;
4970	}
4971
4972	partitionsRemaining -= `1`;
4973	samplesInPartition = blockSize / (`1` << partitionOrder);
4974	}
4975
4976	return DRFLAC_TRUE;
4977	}
4978
4979
4980	static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4981	{
4982	drflac_uint32 i;
4983
4984	/ Only a single sample needs to be decoded here. /
4985	drflac_int32 sample;
4986	if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
4987	return DRFLAC_FALSE;
4988	}
4989
4990	/*
4991	We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
4992	we'll want to look at a more efficient way.
4993	*/
4994	for (i = `0`; i < blockSize; ++i) {
4995	pDecodedSamples[i] = sample;
4996	}
4997
4998	return DRFLAC_TRUE;
4999	}
5000
5001	static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5002	{
5003	drflac_uint32 i;
5004
5005	for (i = `0`; i < blockSize; ++i) {
5006	drflac_int32 sample;
5007	if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5008	return DRFLAC_FALSE;
5009	}
5010
5011	pDecodedSamples[i] = sample;
5012	}
5013
5014	return DRFLAC_TRUE;
5015	}
5016
5017	static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5018	{
5019	drflac_uint32 i;
5020
5021	static drflac_int32 lpcCoefficientsTable[`5`][`4`] = {
5022	{`0`, `0`, `0`, `0`},
5023	{`1`, `0`, `0`, `0`},
5024	{`2`, -`1`, `0`, `0`},
5025	{`3`, -`3`, `1`, `0`},
5026	{`4`, -`6`, `4`, -`1`}
5027	};
5028
5029	/ Warm up samples and coefficients. /
5030	for (i = `0`; i < lpcOrder; ++i) {
5031	drflac_int32 sample;
5032	if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5033	return DRFLAC_FALSE;
5034	}
5035
5036	pDecodedSamples[i] = sample;
5037	}
5038
5039	if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, `0`, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5040	return DRFLAC_FALSE;
5041	}
5042
5043	return DRFLAC_TRUE;
5044	}
5045
5046	static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5047	{
5048	drflac_uint8 i;
5049	drflac_uint8 lpcPrecision;
5050	drflac_int8 lpcShift;
5051	drflac_int32 coefficients[`32`];
5052
5053	/ Warm up samples. /
5054	for (i = `0`; i < lpcOrder; ++i) {
5055	drflac_int32 sample;
5056	if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5057	return DRFLAC_FALSE;
5058	}
5059
5060	pDecodedSamples[i] = sample;
5061	}
5062
5063	if (!drflac__read_uint8(bs, `4`, &lpcPrecision)) {
5064	return DRFLAC_FALSE;
5065	}
5066	if (lpcPrecision == `15`) {
5067	return DRFLAC_FALSE; / Invalid. /
5068	}
5069	lpcPrecision += `1`;
5070
5071	if (!drflac__read_int8(bs, `5`, &lpcShift)) {
5072	return DRFLAC_FALSE;
5073	}
5074
5075	/*
5076	From the FLAC specification:
5077
5078	Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
5079
5080	Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
5081	not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
5082	*/
5083	if (lpcShift < `0`) {
5084	return DRFLAC_FALSE;
5085	}
5086
5087	DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5088	for (i = `0`; i < lpcOrder; ++i) {
5089	if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5090	return DRFLAC_FALSE;
5091	}
5092	}
5093
5094	if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, coefficients, pDecodedSamples)) {
5095	return DRFLAC_FALSE;
5096	}
5097
5098	return DRFLAC_TRUE;
5099	}
5100
5101
5102	static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5103	{
5104	const drflac_uint32 sampleRateTable[`12`] = {`0`, `88200`, `176400`, `192000`, `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000`, `96000`};
5105	const drflac_uint8 bitsPerSampleTable[`8`] = {`0`, `8`, `12`, (drflac_uint8)-`1`, `16`, `20`, `24`, (drflac_uint8)-`1`}; / -1 = reserved. /
5106
5107	DRFLAC_ASSERT(bs != NULL);
5108	DRFLAC_ASSERT(header != NULL);
5109
5110	/ Keep looping until we find a valid sync code. /
5111	for (;;) {
5112	drflac_uint8 crc8 = `0xCE`; / 0xCE = drflac_crc8(0, 0x3FFE, 14); /
5113	drflac_uint8 reserved = `0`;
5114	drflac_uint8 blockingStrategy = `0`;
5115	drflac_uint8 blockSize = `0`;
5116	drflac_uint8 sampleRate = `0`;
5117	drflac_uint8 channelAssignment = `0`;
5118	drflac_uint8 bitsPerSample = `0`;
5119	drflac_bool32 isVariableBlockSize;
5120
5121	if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5122	return DRFLAC_FALSE;
5123	}
5124
5125	if (!drflac__read_uint8(bs, `1`, &reserved)) {
5126	return DRFLAC_FALSE;
5127	}
5128	if (reserved == `1`) {
5129	continue;
5130	}
5131	crc8 = drflac_crc8(crc8, reserved, `1`);
5132
5133	if (!drflac__read_uint8(bs, `1`, &blockingStrategy)) {
5134	return DRFLAC_FALSE;
5135	}
5136	crc8 = drflac_crc8(crc8, blockingStrategy, `1`);
5137
5138	if (!drflac__read_uint8(bs, `4`, &blockSize)) {
5139	return DRFLAC_FALSE;
5140	}
5141	if (blockSize == `0`) {
5142	continue;
5143	}
5144	crc8 = drflac_crc8(crc8, blockSize, `4`);
5145
5146	if (!drflac__read_uint8(bs, `4`, &sampleRate)) {
5147	return DRFLAC_FALSE;
5148	}
5149	crc8 = drflac_crc8(crc8, sampleRate, `4`);
5150
5151	if (!drflac__read_uint8(bs, `4`, &channelAssignment)) {
5152	return DRFLAC_FALSE;
5153	}
5154	if (channelAssignment > `10`) {
5155	continue;
5156	}
5157	crc8 = drflac_crc8(crc8, channelAssignment, `4`);
5158
5159	if (!drflac__read_uint8(bs, `3`, &bitsPerSample)) {
5160	return DRFLAC_FALSE;
5161	}
5162	if (bitsPerSample == `3` \|\| bitsPerSample == `7`) {
5163	continue;
5164	}
5165	crc8 = drflac_crc8(crc8, bitsPerSample, `3`);
5166
5167
5168	if (!drflac__read_uint8(bs, `1`, &reserved)) {
5169	return DRFLAC_FALSE;
5170	}
5171	if (reserved == `1`) {
5172	continue;
5173	}
5174	crc8 = drflac_crc8(crc8, reserved, `1`);
5175
5176
5177	isVariableBlockSize = blockingStrategy == `1`;
5178	if (isVariableBlockSize) {
5179	drflac_uint64 pcmFrameNumber;
5180	drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5181	if (result != DRFLAC_SUCCESS) {
5182	if (result == DRFLAC_AT_END) {
5183	return DRFLAC_FALSE;
5184	} else {
5185	continue;
5186	}
5187	}
5188	header->flacFrameNumber = `0`;
5189	header->pcmFrameNumber = pcmFrameNumber;
5190	} else {
5191	drflac_uint64 flacFrameNumber = `0`;
5192	drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5193	if (result != DRFLAC_SUCCESS) {
5194	if (result == DRFLAC_AT_END) {
5195	return DRFLAC_FALSE;
5196	} else {
5197	continue;
5198	}
5199	}
5200	header->flacFrameNumber = (drflac_uint32)flacFrameNumber; / <-- Safe cast. /
5201	header->pcmFrameNumber = `0`;
5202	}
5203
5204
5205	DRFLAC_ASSERT(blockSize > `0`);
5206	if (blockSize == `1`) {
5207	header->blockSizeInPCMFrames = `192`;
5208	} else if (blockSize <= `5`) {
5209	DRFLAC_ASSERT(blockSize >= `2`);
5210	header->blockSizeInPCMFrames = `576` * (`1` << (blockSize - `2`));
5211	} else if (blockSize == `6`) {
5212	if (!drflac__read_uint16(bs, `8`, &header->blockSizeInPCMFrames)) {
5213	return DRFLAC_FALSE;
5214	}
5215	crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, `8`);
5216	header->blockSizeInPCMFrames += `1`;
5217	} else if (blockSize == `7`) {
5218	if (!drflac__read_uint16(bs, `16`, &header->blockSizeInPCMFrames)) {
5219	return DRFLAC_FALSE;
5220	}
5221	crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, `16`);
5222	header->blockSizeInPCMFrames += `1`;
5223	} else {
5224	DRFLAC_ASSERT(blockSize >= `8`);
5225	header->blockSizeInPCMFrames = `256` * (`1` << (blockSize - `8`));
5226	}
5227
5228
5229	if (sampleRate <= `11`) {
5230	header->sampleRate = sampleRateTable[sampleRate];
5231	} else if (sampleRate == `12`) {
5232	if (!drflac__read_uint32(bs, `8`, &header->sampleRate)) {
5233	return DRFLAC_FALSE;
5234	}
5235	crc8 = drflac_crc8(crc8, header->sampleRate, `8`);
5236	header->sampleRate *= `1000`;
5237	} else if (sampleRate == `13`) {
5238	if (!drflac__read_uint32(bs, `16`, &header->sampleRate)) {
5239	return DRFLAC_FALSE;
5240	}
5241	crc8 = drflac_crc8(crc8, header->sampleRate, `16`);
5242	} else if (sampleRate == `14`) {
5243	if (!drflac__read_uint32(bs, `16`, &header->sampleRate)) {
5244	return DRFLAC_FALSE;
5245	}
5246	crc8 = drflac_crc8(crc8, header->sampleRate, `16`);
5247	header->sampleRate *= `10`;
5248	} else {
5249	continue; / Invalid. Assume an invalid block. /
5250	}
5251
5252
5253	header->channelAssignment = channelAssignment;
5254
5255	header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5256	if (header->bitsPerSample == `0`) {
5257	header->bitsPerSample = streaminfoBitsPerSample;
5258	}
5259
5260	if (!drflac__read_uint8(bs, `8`, &header->crc8)) {
5261	return DRFLAC_FALSE;
5262	}
5263
5264	#ifndef DR_FLAC_NO_CRC
5265	if (header->crc8 != crc8) {
5266	continue; / CRC mismatch. Loop back to the top and find the next sync code. /
5267	}
5268	#endif
5269	return DRFLAC_TRUE;
5270	}
5271	}
5272
5273	static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5274	{
5275	drflac_uint8 header;
5276	int type;
5277
5278	if (!drflac__read_uint8(bs, `8`, &header)) {
5279	return DRFLAC_FALSE;
5280	}
5281
5282	/ First bit should always be 0. /
5283	if ((header & `0x80`) != `0`) {
5284	return DRFLAC_FALSE;
5285	}
5286
5287	type = (header & `0x7E`) >> `1`;
5288	if (type == `0`) {
5289	pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5290	} else if (type == `1`) {
5291	pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5292	} else {
5293	if ((type & `0x20`) != `0`) {
5294	pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5295	pSubframe->lpcOrder = (drflac_uint8)(type & `0x1F`) + `1`;
5296	} else if ((type & `0x08`) != `0`) {
5297	pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5298	pSubframe->lpcOrder = (drflac_uint8)(type & `0x07`);
5299	if (pSubframe->lpcOrder > `4`) {
5300	pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5301	pSubframe->lpcOrder = `0`;
5302	}
5303	} else {
5304	pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5305	}
5306	}
5307
5308	if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5309	return DRFLAC_FALSE;
5310	}
5311
5312	/ Wasted bits per sample. /
5313	pSubframe->wastedBitsPerSample = `0`;
5314	if ((header & `0x01`) == `1`) {
5315	unsigned int wastedBitsPerSample;
5316	if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5317	return DRFLAC_FALSE;
5318	}
5319	pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + `1`;
5320	}
5321
5322	return DRFLAC_TRUE;
5323	}
5324
5325	static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5326	{
5327	drflac_subframe* pSubframe;
5328	drflac_uint32 subframeBitsPerSample;
5329
5330	DRFLAC_ASSERT(bs != NULL);
5331	DRFLAC_ASSERT(frame != NULL);
5332
5333	pSubframe = frame->subframes + subframeIndex;
5334	if (!drflac__read_subframe_header(bs, pSubframe)) {
5335	return DRFLAC_FALSE;
5336	}
5337
5338	/ Side channels require an extra bit per sample. Took a while to figure that one out... /
5339	subframeBitsPerSample = frame->header.bitsPerSample;
5340	if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE \|\| frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == `1`) {
5341	subframeBitsPerSample += `1`;
5342	} else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == `0`) {
5343	subframeBitsPerSample += `1`;
5344	}
5345
5346	/ Need to handle wasted bits per sample. /
5347	if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5348	return DRFLAC_FALSE;
5349	}
5350	subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5351
5352	pSubframe->pSamplesS32 = pDecodedSamplesOut;
5353
5354	switch (pSubframe->subframeType)
5355	{
5356	case DRFLAC_SUBFRAME_CONSTANT:
5357	{
5358	drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5359	} break;
5360
5361	case DRFLAC_SUBFRAME_VERBATIM:
5362	{
5363	drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5364	} break;
5365
5366	case DRFLAC_SUBFRAME_FIXED:
5367	{
5368	drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5369	} break;
5370
5371	case DRFLAC_SUBFRAME_LPC:
5372	{
5373	drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5374	} break;
5375
5376	default: return DRFLAC_FALSE;
5377	}
5378
5379	return DRFLAC_TRUE;
5380	}
5381
5382	static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5383	{
5384	drflac_subframe* pSubframe;
5385	drflac_uint32 subframeBitsPerSample;
5386
5387	DRFLAC_ASSERT(bs != NULL);
5388	DRFLAC_ASSERT(frame != NULL);
5389
5390	pSubframe = frame->subframes + subframeIndex;
5391	if (!drflac__read_subframe_header(bs, pSubframe)) {
5392	return DRFLAC_FALSE;
5393	}
5394
5395	/ Side channels require an extra bit per sample. Took a while to figure that one out... /
5396	subframeBitsPerSample = frame->header.bitsPerSample;
5397	if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE \|\| frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == `1`) {
5398	subframeBitsPerSample += `1`;
5399	} else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == `0`) {
5400	subframeBitsPerSample += `1`;
5401	}
5402
5403	/ Need to handle wasted bits per sample. /
5404	if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5405	return DRFLAC_FALSE;
5406	}
5407	subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5408
5409	pSubframe->pSamplesS32 = NULL;
5410
5411	switch (pSubframe->subframeType)
5412	{
5413	case DRFLAC_SUBFRAME_CONSTANT:
5414	{
5415	if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5416	return DRFLAC_FALSE;
5417	}
5418	} break;
5419
5420	case DRFLAC_SUBFRAME_VERBATIM:
5421	{
5422	unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5423	if (!drflac__seek_bits(bs, bitsToSeek)) {
5424	return DRFLAC_FALSE;
5425	}
5426	} break;
5427
5428	case DRFLAC_SUBFRAME_FIXED:
5429	{
5430	unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5431	if (!drflac__seek_bits(bs, bitsToSeek)) {
5432	return DRFLAC_FALSE;
5433	}
5434
5435	if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5436	return DRFLAC_FALSE;
5437	}
5438	} break;
5439
5440	case DRFLAC_SUBFRAME_LPC:
5441	{
5442	drflac_uint8 lpcPrecision;
5443
5444	unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5445	if (!drflac__seek_bits(bs, bitsToSeek)) {
5446	return DRFLAC_FALSE;
5447	}
5448
5449	if (!drflac__read_uint8(bs, `4`, &lpcPrecision)) {
5450	return DRFLAC_FALSE;
5451	}
5452	if (lpcPrecision == `15`) {
5453	return DRFLAC_FALSE; / Invalid. /
5454	}
5455	lpcPrecision += `1`;
5456
5457
5458	bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + `5`; / +5 for shift. /
5459	if (!drflac__seek_bits(bs, bitsToSeek)) {
5460	return DRFLAC_FALSE;
5461	}
5462
5463	if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5464	return DRFLAC_FALSE;
5465	}
5466	} break;
5467
5468	default: return DRFLAC_FALSE;
5469	}
5470
5471	return DRFLAC_TRUE;
5472	}
5473
5474
5475	static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5476	{
5477	drflac_uint8 lookup[] = {`1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `2`, `2`, `2`};
5478
5479	DRFLAC_ASSERT(channelAssignment <= `10`);
5480	return lookup[channelAssignment];
5481	}
5482
5483	static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5484	{
5485	int channelCount;
5486	int i;
5487	drflac_uint8 paddingSizeInBits;
5488	drflac_uint16 desiredCRC16;
5489	#ifndef DR_FLAC_NO_CRC
5490	drflac_uint16 actualCRC16;
5491	#endif
5492
5493	/ This function should be called while the stream is sitting on the first byte after the frame header. /
5494	DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5495
5496	/ The frame block size must never be larger than the maximum block size defined by the FLAC stream. /
5497	if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5498	return DRFLAC_ERROR;
5499	}
5500
5501	/ The number of channels in the frame must match the channel count from the STREAMINFO block. /
5502	channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5503	if (channelCount != (int)pFlac->channels) {
5504	return DRFLAC_ERROR;
5505	}
5506
5507	for (i = `0`; i < channelCount; ++i) {
5508	if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5509	return DRFLAC_ERROR;
5510	}
5511	}
5512
5513	paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & `7`);
5514	if (paddingSizeInBits > `0`) {
5515	drflac_uint8 padding = `0`;
5516	if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5517	return DRFLAC_AT_END;
5518	}
5519	}
5520
5521	#ifndef DR_FLAC_NO_CRC
5522	actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5523	#endif
5524	if (!drflac__read_uint16(&pFlac->bs, `16`, &desiredCRC16)) {
5525	return DRFLAC_AT_END;
5526	}
5527
5528	#ifndef DR_FLAC_NO_CRC
5529	if (actualCRC16 != desiredCRC16) {
5530	return DRFLAC_CRC_MISMATCH; / CRC mismatch. /
5531	}
5532	#endif
5533
5534	pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5535
5536	return DRFLAC_SUCCESS;
5537	}
5538
5539	static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5540	{
5541	int channelCount;
5542	int i;
5543	drflac_uint16 desiredCRC16;
5544	#ifndef DR_FLAC_NO_CRC
5545	drflac_uint16 actualCRC16;
5546	#endif
5547
5548	channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5549	for (i = `0`; i < channelCount; ++i) {
5550	if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5551	return DRFLAC_ERROR;
5552	}
5553	}
5554
5555	/ Padding. /
5556	if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & `7`)) {
5557	return DRFLAC_ERROR;
5558	}
5559
5560	/ CRC. /
5561	#ifndef DR_FLAC_NO_CRC
5562	actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5563	#endif
5564	if (!drflac__read_uint16(&pFlac->bs, `16`, &desiredCRC16)) {
5565	return DRFLAC_AT_END;
5566	}
5567
5568	#ifndef DR_FLAC_NO_CRC
5569	if (actualCRC16 != desiredCRC16) {
5570	return DRFLAC_CRC_MISMATCH; / CRC mismatch. /
5571	}
5572	#endif
5573
5574	return DRFLAC_SUCCESS;
5575	}
5576
5577	static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5578	{
5579	DRFLAC_ASSERT(pFlac != NULL);
5580
5581	for (;;) {
5582	drflac_result result;
5583
5584	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5585	return DRFLAC_FALSE;
5586	}
5587
5588	result = drflac__decode_flac_frame(pFlac);
5589	if (result != DRFLAC_SUCCESS) {
5590	if (result == DRFLAC_CRC_MISMATCH) {
5591	continue; / CRC mismatch. Skip to the next frame. /
5592	} else {
5593	return DRFLAC_FALSE;
5594	}
5595	}
5596
5597	return DRFLAC_TRUE;
5598	}
5599	}
5600
5601	static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5602	{
5603	drflac_uint64 firstPCMFrame;
5604	drflac_uint64 lastPCMFrame;
5605
5606	DRFLAC_ASSERT(pFlac != NULL);
5607
5608	firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5609	if (firstPCMFrame == `0`) {
5610	firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5611	}
5612
5613	lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5614	if (lastPCMFrame > `0`) {
5615	lastPCMFrame -= `1`; / Needs to be zero based. /
5616	}
5617
5618	if (pFirstPCMFrame) {
5619	*pFirstPCMFrame = firstPCMFrame;
5620	}
5621	if (pLastPCMFrame) {
5622	*pLastPCMFrame = lastPCMFrame;
5623	}
5624	}
5625
5626	static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5627	{
5628	drflac_bool32 result;
5629
5630	DRFLAC_ASSERT(pFlac != NULL);
5631
5632	result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5633
5634	DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5635	pFlac->currentPCMFrame = `0`;
5636
5637	return result;
5638	}
5639
5640	static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5641	{
5642	/ This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. /
5643	DRFLAC_ASSERT(pFlac != NULL);
5644	return drflac__seek_flac_frame(pFlac);
5645	}
5646
5647
5648	static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5649	{
5650	drflac_uint64 pcmFramesRead = `0`;
5651	while (pcmFramesToSeek > `0`) {
5652	if (pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
5653	if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5654	break; / Couldn't read the next frame, so just break from the loop and return. /
5655	}
5656	} else {
5657	if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5658	pcmFramesRead += pcmFramesToSeek;
5659	pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; / <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. /
5660	pcmFramesToSeek = `0`;
5661	} else {
5662	pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5663	pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5664	pFlac->currentFLACFrame.pcmFramesRemaining = `0`;
5665	}
5666	}
5667	}
5668
5669	pFlac->currentPCMFrame += pcmFramesRead;
5670	return pcmFramesRead;
5671	}
5672
5673
5674	static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5675	{
5676	drflac_bool32 isMidFrame = DRFLAC_FALSE;
5677	drflac_uint64 runningPCMFrameCount;
5678
5679	DRFLAC_ASSERT(pFlac != NULL);
5680
5681	/ If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. /
5682	if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5683	/ Seeking forward. Need to seek from the current position. /
5684	runningPCMFrameCount = pFlac->currentPCMFrame;
5685
5686	/ The frame header for the first frame may not yet have been read. We need to do that if necessary. /
5687	if (pFlac->currentPCMFrame == `0` && pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
5688	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5689	return DRFLAC_FALSE;
5690	}
5691	} else {
5692	isMidFrame = DRFLAC_TRUE;
5693	}
5694	} else {
5695	/ Seeking backwards. Need to seek from the start of the file. /
5696	runningPCMFrameCount = `0`;
5697
5698	/ Move back to the start. /
5699	if (!drflac__seek_to_first_frame(pFlac)) {
5700	return DRFLAC_FALSE;
5701	}
5702
5703	/ Decode the first frame in preparation for sample-exact seeking below. /
5704	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5705	return DRFLAC_FALSE;
5706	}
5707	}
5708
5709	/*
5710	We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
5711	header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
5712	*/
5713	for (;;) {
5714	drflac_uint64 pcmFrameCountInThisFLACFrame;
5715	drflac_uint64 firstPCMFrameInFLACFrame = `0`;
5716	drflac_uint64 lastPCMFrameInFLACFrame = `0`;
5717
5718	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5719
5720	pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + `1`;
5721	if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5722	/*
5723	The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5724	it never existed and keep iterating.
5725	*/
5726	drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5727
5728	if (!isMidFrame) {
5729	drflac_result result = drflac__decode_flac_frame(pFlac);
5730	if (result == DRFLAC_SUCCESS) {
5731	/ The frame is valid. We just need to skip over some samples to ensure it's sample-exact. /
5732	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; / <-- If this fails, something bad has happened (it should never fail). /
5733	} else {
5734	if (result == DRFLAC_CRC_MISMATCH) {
5735	goto next_iteration; / CRC mismatch. Pretend this frame never existed. /
5736	} else {
5737	return DRFLAC_FALSE;
5738	}
5739	}
5740	} else {
5741	/ We started seeking mid-frame which means we need to skip the frame decoding part. /
5742	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5743	}
5744	} else {
5745	/*
5746	It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5747	frame never existed and leave the running sample count untouched.
5748	*/
5749	if (!isMidFrame) {
5750	drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5751	if (result == DRFLAC_SUCCESS) {
5752	runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5753	} else {
5754	if (result == DRFLAC_CRC_MISMATCH) {
5755	goto next_iteration; / CRC mismatch. Pretend this frame never existed. /
5756	} else {
5757	return DRFLAC_FALSE;
5758	}
5759	}
5760	} else {
5761	/*
5762	We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5763	drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5764	*/
5765	runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5766	pFlac->currentFLACFrame.pcmFramesRemaining = `0`;
5767	isMidFrame = DRFLAC_FALSE;
5768	}
5769
5770	/ If we are seeking to the end of the file and we've just hit it, we're done. /
5771	if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5772	return DRFLAC_TRUE;
5773	}
5774	}
5775
5776	next_iteration:
5777	/ Grab the next frame in preparation for the next iteration. /
5778	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5779	return DRFLAC_FALSE;
5780	}
5781	}
5782	}
5783
5784
5785	#if !defined(DR_FLAC_NO_CRC)
5786	/*
5787	We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5788	uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
5789	location.
5790	*/
5791	#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
5792
5793	static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5794	{
5795	DRFLAC_ASSERT(pFlac != NULL);
5796	DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5797	DRFLAC_ASSERT(targetByte >= rangeLo);
5798	DRFLAC_ASSERT(targetByte <= rangeHi);
5799
5800	*pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5801
5802	for (;;) {
5803	/ After rangeLo == rangeHi == targetByte fails, we need to break out. /
5804	drflac_uint64 lastTargetByte = targetByte;
5805
5806	/ When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. /
5807	if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5808	/ If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. /
5809	if (targetByte == `0`) {
5810	drflac__seek_to_first_frame(pFlac); / Try to recover. /
5811	return DRFLAC_FALSE;
5812	}
5813
5814	/ Halve the byte location and continue. /
5815	targetByte = rangeLo + ((rangeHi - rangeLo)/`2`);
5816	rangeHi = targetByte;
5817	} else {
5818	/ Getting here should mean that we have seeked to an appropriate byte. /
5819
5820	/ Clear the details of the FLAC frame so we don't misreport data. /
5821	DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5822
5823	/*
5824	Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5825	CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5826	so it needs to stay this way for now.
5827	*/
5828	#if 1
5829	if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5830	/ Halve the byte location and continue. /
5831	targetByte = rangeLo + ((rangeHi - rangeLo)/`2`);
5832	rangeHi = targetByte;
5833	} else {
5834	break;
5835	}
5836	#else
5837	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5838	/ Halve the byte location and continue. /
5839	targetByte = rangeLo + ((rangeHi - rangeLo)/`2`);
5840	rangeHi = targetByte;
5841	} else {
5842	break;
5843	}
5844	#endif
5845	}
5846
5847	/ We already tried this byte and there are no more to try, break out. /
5848	if(targetByte == lastTargetByte) {
5849	return DRFLAC_FALSE;
5850	}
5851	}
5852
5853	/ The current PCM frame needs to be updated based on the frame we just seeked to. /
5854	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5855
5856	DRFLAC_ASSERT(targetByte <= rangeHi);
5857
5858	*pLastSuccessfulSeekOffset = targetByte;
5859	return DRFLAC_TRUE;
5860	}
5861
5862	static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5863	{
5864	/ This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). /
5865	#if 0
5866	if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5867	/ We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. /
5868	if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5869	return DRFLAC_FALSE;
5870	}
5871	}
5872	#endif
5873
5874	return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5875	}
5876
5877
5878	static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5879	{
5880	/ This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. /
5881
5882	drflac_uint64 targetByte;
5883	drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5884	drflac_uint64 pcmRangeHi = `0`;
5885	drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-`1`;
5886	drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5887	drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != `0`) ? pFlac->maxBlockSizeInPCMFrames*`2` : `4096`;
5888
5889	targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/`8.0f`) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5890	if (targetByte > byteRangeHi) {
5891	targetByte = byteRangeHi;
5892	}
5893
5894	for (;;) {
5895	if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5896	/ We found a FLAC frame. We need to check if it contains the sample we're looking for. /
5897	drflac_uint64 newPCMRangeLo;
5898	drflac_uint64 newPCMRangeHi;
5899	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
5900
5901	/ If we selected the same frame, it means we should be pretty close. Just decode the rest. /
5902	if (pcmRangeLo == newPCMRangeLo) {
5903	if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
5904	break; / Failed to seek to closest frame. /
5905	}
5906
5907	if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5908	return DRFLAC_TRUE;
5909	} else {
5910	break; / Failed to seek forward. /
5911	}
5912	}
5913
5914	pcmRangeLo = newPCMRangeLo;
5915	pcmRangeHi = newPCMRangeHi;
5916
5917	if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
5918	/ The target PCM frame is in this FLAC frame. /
5919	if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
5920	return DRFLAC_TRUE;
5921	} else {
5922	break; / Failed to seek to FLAC frame. /
5923	}
5924	} else {
5925	const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/`8.0f`);
5926
5927	if (pcmRangeLo > pcmFrameIndex) {
5928	/ We seeked too far forward. We need to move our target byte backward and try again. /
5929	byteRangeHi = lastSuccessfulSeekOffset;
5930	if (byteRangeLo > byteRangeHi) {
5931	byteRangeLo = byteRangeHi;
5932	}
5933
5934	targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / `2`);
5935	if (targetByte < byteRangeLo) {
5936	targetByte = byteRangeLo;
5937	}
5938	} else /if (pcmRangeHi < pcmFrameIndex)/ {
5939	/ We didn't seek far enough. We need to move our target byte forward and try again. /
5940
5941	/ If we're close enough we can just seek forward. /
5942	if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
5943	if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5944	return DRFLAC_TRUE;
5945	} else {
5946	break; / Failed to seek to FLAC frame. /
5947	}
5948	} else {
5949	byteRangeLo = lastSuccessfulSeekOffset;
5950	if (byteRangeHi < byteRangeLo) {
5951	byteRangeHi = byteRangeLo;
5952	}
5953
5954	targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/`8.0f`) * approxCompressionRatio);
5955	if (targetByte > byteRangeHi) {
5956	targetByte = byteRangeHi;
5957	}
5958
5959	if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
5960	closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
5961	}
5962	}
5963	}
5964	}
5965	} else {
5966	/ Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. /
5967	break;
5968	}
5969	}
5970
5971	drflac__seek_to_first_frame(pFlac); / <-- Try to recover. /
5972	return DRFLAC_FALSE;
5973	}
5974
5975	static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5976	{
5977	drflac_uint64 byteRangeLo;
5978	drflac_uint64 byteRangeHi;
5979	drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != `0`) ? pFlac->maxBlockSizeInPCMFrames*`2` : `4096`;
5980
5981	/ Our algorithm currently assumes the FLAC stream is currently sitting at the start. /
5982	if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
5983	return DRFLAC_FALSE;
5984	}
5985
5986	/ If we're close enough to the start, just move to the start and seek forward. /
5987	if (pcmFrameIndex < seekForwardThreshold) {
5988	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
5989	}
5990
5991	/*
5992	Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
5993	the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
5994	*/
5995	byteRangeLo = pFlac->firstFLACFramePosInBytes;
5996	byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/`8.0f`);
5997
5998	return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
5999	}
6000	#endif /* !DR_FLAC_NO_CRC */
6001
6002	static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6003	{
6004	drflac_uint32 iClosestSeekpoint = `0`;
6005	drflac_bool32 isMidFrame = DRFLAC_FALSE;
6006	drflac_uint64 runningPCMFrameCount;
6007	drflac_uint32 iSeekpoint;
6008
6009
6010	DRFLAC_ASSERT(pFlac != NULL);
6011
6012	if (pFlac->pSeekpoints == NULL \|\| pFlac->seekpointCount == `0`) {
6013	return DRFLAC_FALSE;
6014	}
6015
6016	/ Do not use the seektable if pcmFramIndex is not coverd by it. /
6017	if (pFlac->pSeekpoints[`0`].firstPCMFrame > pcmFrameIndex) {
6018	return DRFLAC_FALSE;
6019	}
6020
6021	for (iSeekpoint = `0`; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6022	if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6023	break;
6024	}
6025
6026	iClosestSeekpoint = iSeekpoint;
6027	}
6028
6029	/ There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. /
6030	if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == `0` \|\| pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6031	return DRFLAC_FALSE;
6032	}
6033	if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > `0`) {
6034	return DRFLAC_FALSE;
6035	}
6036
6037	#if !defined(DR_FLAC_NO_CRC)
6038	/ At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. /
6039	if (pFlac->totalPCMFrameCount > `0`) {
6040	drflac_uint64 byteRangeLo;
6041	drflac_uint64 byteRangeHi;
6042
6043	byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/`8.0f`);
6044	byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6045
6046	/*
6047	If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6048	value for byteRangeHi which will clamp it appropriately.
6049
6050	Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6051	have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6052	*/
6053	if (iClosestSeekpoint < pFlac->seekpointCount-`1`) {
6054	drflac_uint32 iNextSeekpoint = iClosestSeekpoint + `1`;
6055
6056	/ Basic validation on the seekpoints to ensure they're usable. /
6057	if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset \|\| pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == `0`) {
6058	return DRFLAC_FALSE; / The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. /
6059	}
6060
6061	if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)`0xFFFFFFFF` << `32`) \| `0xFFFFFFFF`)) { / Make sure it's not a placeholder seekpoint. /
6062	byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - `1`; / byteRangeHi must be zero based. /
6063	}
6064	}
6065
6066	if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6067	if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6068	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6069
6070	if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6071	return DRFLAC_TRUE;
6072	}
6073	}
6074	}
6075	}
6076	#endif /* !DR_FLAC_NO_CRC */
6077
6078	/ Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. /
6079
6080	/*
6081	If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6082	from the seekpoint's first sample.
6083	*/
6084	if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6085	/ Optimized case. Just seek forward from where we are. /
6086	runningPCMFrameCount = pFlac->currentPCMFrame;
6087
6088	/ The frame header for the first frame may not yet have been read. We need to do that if necessary. /
6089	if (pFlac->currentPCMFrame == `0` && pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
6090	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6091	return DRFLAC_FALSE;
6092	}
6093	} else {
6094	isMidFrame = DRFLAC_TRUE;
6095	}
6096	} else {
6097	/ Slower case. Seek to the start of the seekpoint and then seek forward from there. /
6098	runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6099
6100	if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6101	return DRFLAC_FALSE;
6102	}
6103
6104	/ Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. /
6105	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6106	return DRFLAC_FALSE;
6107	}
6108	}
6109
6110	for (;;) {
6111	drflac_uint64 pcmFrameCountInThisFLACFrame;
6112	drflac_uint64 firstPCMFrameInFLACFrame = `0`;
6113	drflac_uint64 lastPCMFrameInFLACFrame = `0`;
6114
6115	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6116
6117	pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + `1`;
6118	if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6119	/*
6120	The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6121	it never existed and keep iterating.
6122	*/
6123	drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6124
6125	if (!isMidFrame) {
6126	drflac_result result = drflac__decode_flac_frame(pFlac);
6127	if (result == DRFLAC_SUCCESS) {
6128	/ The frame is valid. We just need to skip over some samples to ensure it's sample-exact. /
6129	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; / <-- If this fails, something bad has happened (it should never fail). /
6130	} else {
6131	if (result == DRFLAC_CRC_MISMATCH) {
6132	goto next_iteration; / CRC mismatch. Pretend this frame never existed. /
6133	} else {
6134	return DRFLAC_FALSE;
6135	}
6136	}
6137	} else {
6138	/ We started seeking mid-frame which means we need to skip the frame decoding part. /
6139	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6140	}
6141	} else {
6142	/*
6143	It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6144	frame never existed and leave the running sample count untouched.
6145	*/
6146	if (!isMidFrame) {
6147	drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6148	if (result == DRFLAC_SUCCESS) {
6149	runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6150	} else {
6151	if (result == DRFLAC_CRC_MISMATCH) {
6152	goto next_iteration; / CRC mismatch. Pretend this frame never existed. /
6153	} else {
6154	return DRFLAC_FALSE;
6155	}
6156	}
6157	} else {
6158	/*
6159	We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6160	drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6161	*/
6162	runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6163	pFlac->currentFLACFrame.pcmFramesRemaining = `0`;
6164	isMidFrame = DRFLAC_FALSE;
6165	}
6166
6167	/ If we are seeking to the end of the file and we've just hit it, we're done. /
6168	if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6169	return DRFLAC_TRUE;
6170	}
6171	}
6172
6173	next_iteration:
6174	/ Grab the next frame in preparation for the next iteration. /
6175	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6176	return DRFLAC_FALSE;
6177	}
6178	}
6179	}
6180
6181
6182	#ifndef DR_FLAC_NO_OGG
6183	typedef struct
6184	{
6185	drflac_uint8 capturePattern[`4`]; / Should be "OggS" /
6186	drflac_uint8 structureVersion; / Always 0. /
6187	drflac_uint8 headerType;
6188	drflac_uint64 granulePosition;
6189	drflac_uint32 serialNumber;
6190	drflac_uint32 sequenceNumber;
6191	drflac_uint32 checksum;
6192	drflac_uint8 segmentCount;
6193	drflac_uint8 segmentTable[`255`];
6194	} drflac_ogg_page_header;
6195	#endif
6196
6197	typedef struct
6198	{
6199	drflac_read_proc onRead;
6200	drflac_seek_proc onSeek;
6201	drflac_meta_proc onMeta;
6202	drflac_container container;
6203	void* pUserData;
6204	void* pUserDataMD;
6205	drflac_uint32 sampleRate;
6206	drflac_uint8 channels;
6207	drflac_uint8 bitsPerSample;
6208	drflac_uint64 totalPCMFrameCount;
6209	drflac_uint16 maxBlockSizeInPCMFrames;
6210	drflac_uint64 runningFilePos;
6211	drflac_bool32 hasStreamInfoBlock;
6212	drflac_bool32 hasMetadataBlocks;
6213	drflac_bs bs; / <-- A bit streamer is required for loading data during initialization. /
6214	drflac_frame_header firstFrameHeader; / <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. /
6215
6216	#ifndef DR_FLAC_NO_OGG
6217	drflac_uint32 oggSerial;
6218	drflac_uint64 oggFirstBytePos;
6219	drflac_ogg_page_header oggBosHeader;
6220	#endif
6221	} drflac_init_info;
6222
6223	static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6224	{
6225	blockHeader = drflac__be2host_32(blockHeader);
6226	*isLastBlock = (drflac_uint8)((blockHeader & `0x80000000UL`) >> `31`);
6227	*blockType = (drflac_uint8)((blockHeader & `0x7F000000UL`) >> `24`);
6228	*blockSize = (blockHeader & `0x00FFFFFFUL`);
6229	}
6230
6231	static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6232	{
6233	drflac_uint32 blockHeader;
6234
6235	*blockSize = `0`;
6236	if (onRead(pUserData, &blockHeader, `4`) != `4`) {
6237	return DRFLAC_FALSE;
6238	}
6239
6240	drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6241	return DRFLAC_TRUE;
6242	}
6243
6244	static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6245	{
6246	drflac_uint32 blockSizes;
6247	drflac_uint64 frameSizes = `0`;
6248	drflac_uint64 importantProps;
6249	drflac_uint8 md5[`16`];
6250
6251	/ min/max block size. /
6252	if (onRead(pUserData, &blockSizes, `4`) != `4`) {
6253	return DRFLAC_FALSE;
6254	}
6255
6256	/ min/max frame size. /
6257	if (onRead(pUserData, &frameSizes, `6`) != `6`) {
6258	return DRFLAC_FALSE;
6259	}
6260
6261	/ Sample rate, channels, bits per sample and total sample count. /
6262	if (onRead(pUserData, &importantProps, `8`) != `8`) {
6263	return DRFLAC_FALSE;
6264	}
6265
6266	/ MD5 /
6267	if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6268	return DRFLAC_FALSE;
6269	}
6270
6271	blockSizes = drflac__be2host_32(blockSizes);
6272	frameSizes = drflac__be2host_64(frameSizes);
6273	importantProps = drflac__be2host_64(importantProps);
6274
6275	pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & `0xFFFF0000`) >> `16`);
6276	pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & `0x0000FFFF`);
6277	pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)`0x00FFFFFF` << `16`) << `24`)) >> `40`);
6278	pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)`0x00FFFFFF` << `16`) << `0`)) >> `16`);
6279	pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)`0x000FFFFF` << `16`) << `28`)) >> `44`);
6280	pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)`0x0000000E` << `16`) << `24`)) >> `41`) + `1`;
6281	pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)`0x0000001F` << `16`) << `20`)) >> `36`) + `1`;
6282	pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)`0x0000000F` << `16`) << `16`) \| `0xFFFFFFFF`)));
6283	DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6284
6285	return DRFLAC_TRUE;
6286	}
6287
6288
6289	static void* drflac__malloc_default(size_t sz, void* pUserData)
6290	{
6291	(void)pUserData;
6292	return DRFLAC_MALLOC(sz);
6293	}
6294
6295	static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6296	{
6297	(void)pUserData;
6298	return DRFLAC_REALLOC(p, sz);
6299	}
6300
6301	static void drflac__free_default(void* p, void* pUserData)
6302	{
6303	(void)pUserData;
6304	DRFLAC_FREE(p);
6305	}
6306
6307
6308	static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6309	{
6310	if (pAllocationCallbacks == NULL) {
6311	return NULL;
6312	}
6313
6314	if (pAllocationCallbacks->onMalloc != NULL) {
6315	return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6316	}
6317
6318	/ Try using realloc(). /
6319	if (pAllocationCallbacks->onRealloc != NULL) {
6320	return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6321	}
6322
6323	return NULL;
6324	}
6325
6326	static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6327	{
6328	if (pAllocationCallbacks == NULL) {
6329	return NULL;
6330	}
6331
6332	if (pAllocationCallbacks->onRealloc != NULL) {
6333	return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6334	}
6335
6336	/ Try emulating realloc() in terms of malloc()/free(). /
6337	if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6338	void* p2;
6339
6340	p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6341	if (p2 == NULL) {
6342	return NULL;
6343	}
6344
6345	if (p != NULL) {
6346	DRFLAC_COPY_MEMORY(p2, p, szOld);
6347	pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6348	}
6349
6350	return p2;
6351	}
6352
6353	return NULL;
6354	}
6355
6356	static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6357	{
6358	if (p == NULL \|\| pAllocationCallbacks == NULL) {
6359	return;
6360	}
6361
6362	if (pAllocationCallbacks->onFree != NULL) {
6363	pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6364	}
6365	}
6366
6367
6368	static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeektableSize, drflac_allocation_callbacks* pAllocationCallbacks)
6369	{
6370	/*
6371	We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6372	we'll be sitting on byte 42.
6373	*/
6374	drflac_uint64 runningFilePos = `42`;
6375	drflac_uint64 seektablePos = `0`;
6376	drflac_uint32 seektableSize = `0`;
6377
6378	for (;;) {
6379	drflac_metadata metadata;
6380	drflac_uint8 isLastBlock = `0`;
6381	drflac_uint8 blockType;
6382	drflac_uint32 blockSize;
6383	if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6384	return DRFLAC_FALSE;
6385	}
6386	runningFilePos += `4`;
6387
6388	metadata.type = blockType;
6389	metadata.pRawData = NULL;
6390	metadata.rawDataSize = `0`;
6391
6392	switch (blockType)
6393	{
6394	case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6395	{
6396	if (blockSize < `4`) {
6397	return DRFLAC_FALSE;
6398	}
6399
6400	if (onMeta) {
6401	void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6402	if (pRawData == NULL) {
6403	return DRFLAC_FALSE;
6404	}
6405
6406	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6407	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6408	return DRFLAC_FALSE;
6409	}
6410
6411	metadata.pRawData = pRawData;
6412	metadata.rawDataSize = blockSize;
6413	metadata.data.application.id = drflac__be2host_32((drflac_uint32)pRawData);
6414	metadata.data.application.pData = (const void)((drflac_uint8)pRawData + sizeof(drflac_uint32));
6415	metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6416	onMeta(pUserDataMD, &metadata);
6417
6418	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6419	}
6420	} break;
6421
6422	case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6423	{
6424	seektablePos = runningFilePos;
6425	seektableSize = blockSize;
6426
6427	if (onMeta) {
6428	drflac_uint32 iSeekpoint;
6429	void* pRawData;
6430
6431	pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6432	if (pRawData == NULL) {
6433	return DRFLAC_FALSE;
6434	}
6435
6436	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6437	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6438	return DRFLAC_FALSE;
6439	}
6440
6441	metadata.pRawData = pRawData;
6442	metadata.rawDataSize = blockSize;
6443	metadata.data.seektable.seekpointCount = blockSize/sizeof(drflac_seekpoint);
6444	metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6445
6446	/ Endian swap. /
6447	for (iSeekpoint = `0`; iSeekpoint < metadata.data.seektable.seekpointCount; ++iSeekpoint) {
6448	drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6449	pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6450	pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6451	pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6452	}
6453
6454	onMeta(pUserDataMD, &metadata);
6455
6456	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6457	}
6458	} break;
6459
6460	case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6461	{
6462	if (blockSize < `8`) {
6463	return DRFLAC_FALSE;
6464	}
6465
6466	if (onMeta) {
6467	void* pRawData;
6468	const char* pRunningData;
6469	const char* pRunningDataEnd;
6470	drflac_uint32 i;
6471
6472	pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6473	if (pRawData == NULL) {
6474	return DRFLAC_FALSE;
6475	}
6476
6477	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6478	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6479	return DRFLAC_FALSE;
6480	}
6481
6482	metadata.pRawData = pRawData;
6483	metadata.rawDataSize = blockSize;
6484
6485	pRunningData = (const char*)pRawData;
6486	pRunningDataEnd = (const char*)pRawData + blockSize;
6487
6488	metadata.data.vorbis_comment.vendorLength = drflac__le2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6489
6490	/ Need space for the rest of the block /
6491	if ((pRunningDataEnd - pRunningData) - `4` < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { / <-- Note the order of operations to avoid overflow to a valid value /
6492	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6493	return DRFLAC_FALSE;
6494	}
6495	metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6496	metadata.data.vorbis_comment.commentCount = drflac__le2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6497
6498	/ Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment /
6499	if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { / <-- Note the order of operations to avoid overflow to a valid value /
6500	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6501	return DRFLAC_FALSE;
6502	}
6503	metadata.data.vorbis_comment.pComments = pRunningData;
6504
6505	/ Check that the comments section is valid before passing it to the callback /
6506	for (i = `0`; i < metadata.data.vorbis_comment.commentCount; ++i) {
6507	drflac_uint32 commentLength;
6508
6509	if (pRunningDataEnd - pRunningData < `4`) {
6510	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6511	return DRFLAC_FALSE;
6512	}
6513
6514	commentLength = drflac__le2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6515	if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { / <-- Note the order of operations to avoid overflow to a valid value /
6516	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6517	return DRFLAC_FALSE;
6518	}
6519	pRunningData += commentLength;
6520	}
6521
6522	onMeta(pUserDataMD, &metadata);
6523
6524	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6525	}
6526	} break;
6527
6528	case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6529	{
6530	if (blockSize < `396`) {
6531	return DRFLAC_FALSE;
6532	}
6533
6534	if (onMeta) {
6535	void* pRawData;
6536	const char* pRunningData;
6537	const char* pRunningDataEnd;
6538	drflac_uint8 iTrack;
6539	drflac_uint8 iIndex;
6540
6541	pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6542	if (pRawData == NULL) {
6543	return DRFLAC_FALSE;
6544	}
6545
6546	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6547	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6548	return DRFLAC_FALSE;
6549	}
6550
6551	metadata.pRawData = pRawData;
6552	metadata.rawDataSize = blockSize;
6553
6554	pRunningData = (const char*)pRawData;
6555	pRunningDataEnd = (const char*)pRawData + blockSize;
6556
6557	DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, `128`); pRunningData += `128`;
6558	metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64((const* drflac_uint64*)pRunningData); pRunningData += `8`;
6559	metadata.data.cuesheet.isCD = (pRunningData[`0`] & `0x80`) != `0`; pRunningData += `259`;
6560	metadata.data.cuesheet.trackCount = pRunningData[`0`]; pRunningData += `1`;
6561	metadata.data.cuesheet.pTrackData = pRunningData;
6562
6563	/ Check that the cuesheet tracks are valid before passing it to the callback /
6564	for (iTrack = `0`; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6565	drflac_uint8 indexCount;
6566	drflac_uint32 indexPointSize;
6567
6568	if (pRunningDataEnd - pRunningData < `36`) {
6569	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6570	return DRFLAC_FALSE;
6571	}
6572
6573	/ Skip to the index point count /
6574	pRunningData += `35`;
6575	indexCount = pRunningData[`0`]; pRunningData += `1`;
6576	indexPointSize = indexCount * sizeof(drflac_cuesheet_track_index);
6577	if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6578	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6579	return DRFLAC_FALSE;
6580	}
6581
6582	/ Endian swap. /
6583	for (iIndex = `0`; iIndex < indexCount; ++iIndex) {
6584	drflac_cuesheet_track_index* pTrack = (drflac_cuesheet_track_index*)pRunningData;
6585	pRunningData += sizeof(drflac_cuesheet_track_index);
6586	pTrack->offset = drflac__be2host_64(pTrack->offset);
6587	}
6588	}
6589
6590	onMeta(pUserDataMD, &metadata);
6591
6592	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6593	}
6594	} break;
6595
6596	case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6597	{
6598	if (blockSize < `32`) {
6599	return DRFLAC_FALSE;
6600	}
6601
6602	if (onMeta) {
6603	void* pRawData;
6604	const char* pRunningData;
6605	const char* pRunningDataEnd;
6606
6607	pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6608	if (pRawData == NULL) {
6609	return DRFLAC_FALSE;
6610	}
6611
6612	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6613	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6614	return DRFLAC_FALSE;
6615	}
6616
6617	metadata.pRawData = pRawData;
6618	metadata.rawDataSize = blockSize;
6619
6620	pRunningData = (const char*)pRawData;
6621	pRunningDataEnd = (const char*)pRawData + blockSize;
6622
6623	metadata.data.picture.type = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6624	metadata.data.picture.mimeLength = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6625
6626	/ Need space for the rest of the block /
6627	if ((pRunningDataEnd - pRunningData) - `24` < (drflac_int64)metadata.data.picture.mimeLength) { / <-- Note the order of operations to avoid overflow to a valid value /
6628	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6629	return DRFLAC_FALSE;
6630	}
6631	metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6632	metadata.data.picture.descriptionLength = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6633
6634	/ Need space for the rest of the block /
6635	if ((pRunningDataEnd - pRunningData) - `20` < (drflac_int64)metadata.data.picture.descriptionLength) { / <-- Note the order of operations to avoid overflow to a valid value /
6636	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6637	return DRFLAC_FALSE;
6638	}
6639	metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6640	metadata.data.picture.width = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6641	metadata.data.picture.height = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6642	metadata.data.picture.colorDepth = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6643	metadata.data.picture.indexColorCount = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6644	metadata.data.picture.pictureDataSize = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
6645	metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6646
6647	/ Need space for the picture after the block /
6648	if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { / <-- Note the order of operations to avoid overflow to a valid value /
6649	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6650	return DRFLAC_FALSE;
6651	}
6652
6653	onMeta(pUserDataMD, &metadata);
6654
6655	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6656	}
6657	} break;
6658
6659	case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6660	{
6661	if (onMeta) {
6662	metadata.data.padding.unused = `0`;
6663
6664	/ Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. /
6665	if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6666	isLastBlock = DRFLAC_TRUE; / An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. /
6667	} else {
6668	onMeta(pUserDataMD, &metadata);
6669	}
6670	}
6671	} break;
6672
6673	case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6674	{
6675	/ Invalid chunk. Just skip over this one. /
6676	if (onMeta) {
6677	if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6678	isLastBlock = DRFLAC_TRUE; / An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. /
6679	}
6680	}
6681	} break;
6682
6683	default:
6684	{
6685	/*
6686	It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6687	can at the very least report the chunk to the application and let it look at the raw data.
6688	*/
6689	if (onMeta) {
6690	void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6691	if (pRawData == NULL) {
6692	return DRFLAC_FALSE;
6693	}
6694
6695	if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6696	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6697	return DRFLAC_FALSE;
6698	}
6699
6700	metadata.pRawData = pRawData;
6701	metadata.rawDataSize = blockSize;
6702	onMeta(pUserDataMD, &metadata);
6703
6704	drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6705	}
6706	} break;
6707	}
6708
6709	/ If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. /
6710	if (onMeta == NULL && blockSize > `0`) {
6711	if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6712	isLastBlock = DRFLAC_TRUE;
6713	}
6714	}
6715
6716	runningFilePos += blockSize;
6717	if (isLastBlock) {
6718	break;
6719	}
6720	}
6721
6722	*pSeektablePos = seektablePos;
6723	*pSeektableSize = seektableSize;
6724	*pFirstFramePos = runningFilePos;
6725
6726	return DRFLAC_TRUE;
6727	}
6728
6729	static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6730	{
6731	/ Pre Condition: The bit stream should be sitting just past the 4-byte id header. /
6732
6733	drflac_uint8 isLastBlock;
6734	drflac_uint8 blockType;
6735	drflac_uint32 blockSize;
6736
6737	(void)onSeek;
6738
6739	pInit->container = drflac_container_native;
6740
6741	/ The first metadata block should be the STREAMINFO block. /
6742	if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6743	return DRFLAC_FALSE;
6744	}
6745
6746	if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO \|\| blockSize != `34`) {
6747	if (!relaxed) {
6748	/ We're opening in strict mode and the first block is not the STREAMINFO block. Error. /
6749	return DRFLAC_FALSE;
6750	} else {
6751	/*
6752	Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6753	for that frame.
6754	*/
6755	pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6756	pInit->hasMetadataBlocks = DRFLAC_FALSE;
6757
6758	if (!drflac__read_next_flac_frame_header(&pInit->bs, `0`, &pInit->firstFrameHeader)) {
6759	return DRFLAC_FALSE; / Couldn't find a frame. /
6760	}
6761
6762	if (pInit->firstFrameHeader.bitsPerSample == `0`) {
6763	return DRFLAC_FALSE; / Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. /
6764	}
6765
6766	pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6767	pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6768	pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6769	pInit->maxBlockSizeInPCMFrames = `65535`; / <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo /
6770	return DRFLAC_TRUE;
6771	}
6772	} else {
6773	drflac_streaminfo streaminfo;
6774	if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6775	return DRFLAC_FALSE;
6776	}
6777
6778	pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6779	pInit->sampleRate = streaminfo.sampleRate;
6780	pInit->channels = streaminfo.channels;
6781	pInit->bitsPerSample = streaminfo.bitsPerSample;
6782	pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6783	pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; / Don't care about the min block size - only the max (used for determining the size of the memory allocation). /
6784	pInit->hasMetadataBlocks = !isLastBlock;
6785
6786	if (onMeta) {
6787	drflac_metadata metadata;
6788	metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6789	metadata.pRawData = NULL;
6790	metadata.rawDataSize = `0`;
6791	metadata.data.streaminfo = streaminfo;
6792	onMeta(pUserDataMD, &metadata);
6793	}
6794
6795	return DRFLAC_TRUE;
6796	}
6797	}
6798
6799	#ifndef DR_FLAC_NO_OGG
6800	#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6801	#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6802
6803	typedef enum
6804	{
6805	drflac_ogg_recover_on_crc_mismatch,
6806	drflac_ogg_fail_on_crc_mismatch
6807	} drflac_ogg_crc_mismatch_recovery;
6808
6809	#ifndef DR_FLAC_NO_CRC
6810	static drflac_uint32 drflac__crc32_table[] = {
6811	`0x00000000L`, `0x04C11DB7L`, `0x09823B6EL`, `0x0D4326D9L`,
6812	`0x130476DCL`, `0x17C56B6BL`, `0x1A864DB2L`, `0x1E475005L`,
6813	`0x2608EDB8L`, `0x22C9F00FL`, `0x2F8AD6D6L`, `0x2B4BCB61L`,
6814	`0x350C9B64L`, `0x31CD86D3L`, `0x3C8EA00AL`, `0x384FBDBDL`,
6815	`0x4C11DB70L`, `0x48D0C6C7L`, `0x4593E01EL`, `0x4152FDA9L`,
6816	`0x5F15ADACL`, `0x5BD4B01BL`, `0x569796C2L`, `0x52568B75L`,
6817	`0x6A1936C8L`, `0x6ED82B7FL`, `0x639B0DA6L`, `0x675A1011L`,
6818	`0x791D4014L`, `0x7DDC5DA3L`, `0x709F7B7AL`, `0x745E66CDL`,
6819	`0x9823B6E0L`, `0x9CE2AB57L`, `0x91A18D8EL`, `0x95609039L`,
6820	`0x8B27C03CL`, `0x8FE6DD8BL`, `0x82A5FB52L`, `0x8664E6E5L`,
6821	`0xBE2B5B58L`, `0xBAEA46EFL`, `0xB7A96036L`, `0xB3687D81L`,
6822	`0xAD2F2D84L`, `0xA9EE3033L`, `0xA4AD16EAL`, `0xA06C0B5DL`,
6823	`0xD4326D90L`, `0xD0F37027L`, `0xDDB056FEL`, `0xD9714B49L`,
6824	`0xC7361B4CL`, `0xC3F706FBL`, `0xCEB42022L`, `0xCA753D95L`,
6825	`0xF23A8028L`, `0xF6FB9D9FL`, `0xFBB8BB46L`, `0xFF79A6F1L`,
6826	`0xE13EF6F4L`, `0xE5FFEB43L`, `0xE8BCCD9AL`, `0xEC7DD02DL`,
6827	`0x34867077L`, `0x30476DC0L`, `0x3D044B19L`, `0x39C556AEL`,
6828	`0x278206ABL`, `0x23431B1CL`, `0x2E003DC5L`, `0x2AC12072L`,
6829	`0x128E9DCFL`, `0x164F8078L`, `0x1B0CA6A1L`, `0x1FCDBB16L`,
6830	`0x018AEB13L`, `0x054BF6A4L`, `0x0808D07DL`, `0x0CC9CDCAL`,
6831	`0x7897AB07L`, `0x7C56B6B0L`, `0x71159069L`, `0x75D48DDEL`,
6832	`0x6B93DDDBL`, `0x6F52C06CL`, `0x6211E6B5L`, `0x66D0FB02L`,
6833	`0x5E9F46BFL`, `0x5A5E5B08L`, `0x571D7DD1L`, `0x53DC6066L`,
6834	`0x4D9B3063L`, `0x495A2DD4L`, `0x44190B0DL`, `0x40D816BAL`,
6835	`0xACA5C697L`, `0xA864DB20L`, `0xA527FDF9L`, `0xA1E6E04EL`,
6836	`0xBFA1B04BL`, `0xBB60ADFCL`, `0xB6238B25L`, `0xB2E29692L`,
6837	`0x8AAD2B2FL`, `0x8E6C3698L`, `0x832F1041L`, `0x87EE0DF6L`,
6838	`0x99A95DF3L`, `0x9D684044L`, `0x902B669DL`, `0x94EA7B2AL`,
6839	`0xE0B41DE7L`, `0xE4750050L`, `0xE9362689L`, `0xEDF73B3EL`,
6840	`0xF3B06B3BL`, `0xF771768CL`, `0xFA325055L`, `0xFEF34DE2L`,
6841	`0xC6BCF05FL`, `0xC27DEDE8L`, `0xCF3ECB31L`, `0xCBFFD686L`,
6842	`0xD5B88683L`, `0xD1799B34L`, `0xDC3ABDEDL`, `0xD8FBA05AL`,
6843	`0x690CE0EEL`, `0x6DCDFD59L`, `0x608EDB80L`, `0x644FC637L`,
6844	`0x7A089632L`, `0x7EC98B85L`, `0x738AAD5CL`, `0x774BB0EBL`,
6845	`0x4F040D56L`, `0x4BC510E1L`, `0x46863638L`, `0x42472B8FL`,
6846	`0x5C007B8AL`, `0x58C1663DL`, `0x558240E4L`, `0x51435D53L`,
6847	`0x251D3B9EL`, `0x21DC2629L`, `0x2C9F00F0L`, `0x285E1D47L`,
6848	`0x36194D42L`, `0x32D850F5L`, `0x3F9B762CL`, `0x3B5A6B9BL`,
6849	`0x0315D626L`, `0x07D4CB91L`, `0x0A97ED48L`, `0x0E56F0FFL`,
6850	`0x1011A0FAL`, `0x14D0BD4DL`, `0x19939B94L`, `0x1D528623L`,
6851	`0xF12F560EL`, `0xF5EE4BB9L`, `0xF8AD6D60L`, `0xFC6C70D7L`,
6852	`0xE22B20D2L`, `0xE6EA3D65L`, `0xEBA91BBCL`, `0xEF68060BL`,
6853	`0xD727BBB6L`, `0xD3E6A601L`, `0xDEA580D8L`, `0xDA649D6FL`,
6854	`0xC423CD6AL`, `0xC0E2D0DDL`, `0xCDA1F604L`, `0xC960EBB3L`,
6855	`0xBD3E8D7EL`, `0xB9FF90C9L`, `0xB4BCB610L`, `0xB07DABA7L`,
6856	`0xAE3AFBA2L`, `0xAAFBE615L`, `0xA7B8C0CCL`, `0xA379DD7BL`,
6857	`0x9B3660C6L`, `0x9FF77D71L`, `0x92B45BA8L`, `0x9675461FL`,
6858	`0x8832161AL`, `0x8CF30BADL`, `0x81B02D74L`, `0x857130C3L`,
6859	`0x5D8A9099L`, `0x594B8D2EL`, `0x5408ABF7L`, `0x50C9B640L`,
6860	`0x4E8EE645L`, `0x4A4FFBF2L`, `0x470CDD2BL`, `0x43CDC09CL`,
6861	`0x7B827D21L`, `0x7F436096L`, `0x7200464FL`, `0x76C15BF8L`,
6862	`0x68860BFDL`, `0x6C47164AL`, `0x61043093L`, `0x65C52D24L`,
6863	`0x119B4BE9L`, `0x155A565EL`, `0x18197087L`, `0x1CD86D30L`,
6864	`0x029F3D35L`, `0x065E2082L`, `0x0B1D065BL`, `0x0FDC1BECL`,
6865	`0x3793A651L`, `0x3352BBE6L`, `0x3E119D3FL`, `0x3AD08088L`,
6866	`0x2497D08DL`, `0x2056CD3AL`, `0x2D15EBE3L`, `0x29D4F654L`,
6867	`0xC5A92679L`, `0xC1683BCEL`, `0xCC2B1D17L`, `0xC8EA00A0L`,
6868	`0xD6AD50A5L`, `0xD26C4D12L`, `0xDF2F6BCBL`, `0xDBEE767CL`,
6869	`0xE3A1CBC1L`, `0xE760D676L`, `0xEA23F0AFL`, `0xEEE2ED18L`,
6870	`0xF0A5BD1DL`, `0xF464A0AAL`, `0xF9278673L`, `0xFDE69BC4L`,
6871	`0x89B8FD09L`, `0x8D79E0BEL`, `0x803AC667L`, `0x84FBDBD0L`,
6872	`0x9ABC8BD5L`, `0x9E7D9662L`, `0x933EB0BBL`, `0x97FFAD0CL`,
6873	`0xAFB010B1L`, `0xAB710D06L`, `0xA6322BDFL`, `0xA2F33668L`,
6874	`0xBCB4666DL`, `0xB8757BDAL`, `0xB5365D03L`, `0xB1F740B4L`
6875	};
6876	#endif
6877
6878	static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
6879	{
6880	#ifndef DR_FLAC_NO_CRC
6881	return (crc32 << `8`) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> `24`) & `0xFF`) ^ data];
6882	#else
6883	(void)data;
6884	return crc32;
6885	#endif
6886	}
6887
6888	#if 0
6889	static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
6890	{
6891	crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> `24`) & `0xFF`));
6892	crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> `16`) & `0xFF`));
6893	crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> `8`) & `0xFF`));
6894	crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> `0`) & `0xFF`));
6895	return crc32;
6896	}
6897
6898	static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
6899	{
6900	crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> `32`) & `0xFFFFFFFF`));
6901	crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> `0`) & `0xFFFFFFFF`));
6902	return crc32;
6903	}
6904	#endif
6905
6906	static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
6907	{
6908	/ This can be optimized. /
6909	drflac_uint32 i;
6910	for (i = `0`; i < dataSize; ++i) {
6911	crc32 = drflac_crc32_byte(crc32, pData[i]);
6912	}
6913	return crc32;
6914	}
6915
6916
6917	static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[`4`])
6918	{
6919	return pattern[`0`] == `'O'` && pattern[`1`] == `'g'` && pattern[`2`] == `'g'` && pattern[`3`] == `'S'`;
6920	}
6921
6922	static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
6923	{
6924	return `27` + pHeader->segmentCount;
6925	}
6926
6927	static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
6928	{
6929	drflac_uint32 pageBodySize = `0`;
6930	int i;
6931
6932	for (i = `0`; i < pHeader->segmentCount; ++i) {
6933	pageBodySize += pHeader->segmentTable[i];
6934	}
6935
6936	return pageBodySize;
6937	}
6938
6939	static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6940	{
6941	drflac_uint8 data[`23`];
6942	drflac_uint32 i;
6943
6944	DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
6945
6946	if (onRead(pUserData, data, `23`) != `23`) {
6947	return DRFLAC_AT_END;
6948	}
6949	*pBytesRead += `23`;
6950
6951	/*
6952	It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
6953	us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
6954	like to have it map to the structure of the underlying data.
6955	*/
6956	pHeader->capturePattern[`0`] = `'O'`;
6957	pHeader->capturePattern[`1`] = `'g'`;
6958	pHeader->capturePattern[`2`] = `'g'`;
6959	pHeader->capturePattern[`3`] = `'S'`;
6960
6961	pHeader->structureVersion = data[`0`];
6962	pHeader->headerType = data[`1`];
6963	DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ `2`], `8`);
6964	DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[`10`], `4`);
6965	DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[`14`], `4`);
6966	DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[`18`], `4`);
6967	pHeader->segmentCount = data[`22`];
6968
6969	/ Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. /
6970	data[`18`] = `0`;
6971	data[`19`] = `0`;
6972	data[`20`] = `0`;
6973	data[`21`] = `0`;
6974
6975	for (i = `0`; i < `23`; ++i) {
6976	pCRC32 = drflac_crc32_byte(pCRC32, data[i]);
6977	}
6978
6979
6980	if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
6981	return DRFLAC_AT_END;
6982	}
6983	*pBytesRead += pHeader->segmentCount;
6984
6985	for (i = `0`; i < pHeader->segmentCount; ++i) {
6986	pCRC32 = drflac_crc32_byte(pCRC32, pHeader->segmentTable[i]);
6987	}
6988
6989	return DRFLAC_SUCCESS;
6990	}
6991
6992	static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6993	{
6994	drflac_uint8 id[`4`];
6995
6996	*pBytesRead = `0`;
6997
6998	if (onRead(pUserData, id, `4`) != `4`) {
6999	return DRFLAC_AT_END;
7000	}
7001	*pBytesRead += `4`;
7002
7003	/ We need to read byte-by-byte until we find the OggS capture pattern. /
7004	for (;;) {
7005	if (drflac_ogg__is_capture_pattern(id)) {
7006	drflac_result result;
7007
7008	*pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7009
7010	result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7011	if (result == DRFLAC_SUCCESS) {
7012	return DRFLAC_SUCCESS;
7013	} else {
7014	if (result == DRFLAC_CRC_MISMATCH) {
7015	continue;
7016	} else {
7017	return result;
7018	}
7019	}
7020	} else {
7021	/ The first 4 bytes did not equal the capture pattern. Read the next byte and try again. /
7022	id[`0`] = id[`1`];
7023	id[`1`] = id[`2`];
7024	id[`2`] = id[`3`];
7025	if (onRead(pUserData, &id[`3`], `1`) != `1`) {
7026	return DRFLAC_AT_END;
7027	}
7028	*pBytesRead += `1`;
7029	}
7030	}
7031	}
7032
7033
7034	/*
7035	The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7036	in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7037	in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
7038	dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7039	the physical Ogg bitstream are converted and delivered in native FLAC format.
7040	*/
7041	typedef struct
7042	{
7043	drflac_read_proc onRead; / The original onRead callback from drflac_open() and family. /
7044	drflac_seek_proc onSeek; / The original onSeek callback from drflac_open() and family. /
7045	void* pUserData; / The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. /
7046	drflac_uint64 currentBytePos; / The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. /
7047	drflac_uint64 firstBytePos; / The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. /
7048	drflac_uint32 serialNumber; / The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. /
7049	drflac_ogg_page_header bosPageHeader; / Used for seeking. /
7050	drflac_ogg_page_header currentPageHeader;
7051	drflac_uint32 bytesRemainingInPage;
7052	drflac_uint32 pageDataSize;
7053	drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7054	} drflac_oggbs; / oggbs = Ogg Bitstream /
7055
7056	static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7057	{
7058	size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7059	oggbs->currentBytePos += bytesActuallyRead;
7060
7061	return bytesActuallyRead;
7062	}
7063
7064	static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7065	{
7066	if (origin == drflac_seek_origin_start) {
7067	if (offset <= `0x7FFFFFFF`) {
7068	if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7069	return DRFLAC_FALSE;
7070	}
7071	oggbs->currentBytePos = offset;
7072
7073	return DRFLAC_TRUE;
7074	} else {
7075	if (!oggbs->onSeek(oggbs->pUserData, `0x7FFFFFFF`, drflac_seek_origin_start)) {
7076	return DRFLAC_FALSE;
7077	}
7078	oggbs->currentBytePos = offset;
7079
7080	return drflac_oggbs__seek_physical(oggbs, offset - `0x7FFFFFFF`, drflac_seek_origin_current);
7081	}
7082	} else {
7083	while (offset > `0x7FFFFFFF`) {
7084	if (!oggbs->onSeek(oggbs->pUserData, `0x7FFFFFFF`, drflac_seek_origin_current)) {
7085	return DRFLAC_FALSE;
7086	}
7087	oggbs->currentBytePos += `0x7FFFFFFF`;
7088	offset -= `0x7FFFFFFF`;
7089	}
7090
7091	if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { / <-- Safe cast thanks to the loop above. /
7092	return DRFLAC_FALSE;
7093	}
7094	oggbs->currentBytePos += offset;
7095
7096	return DRFLAC_TRUE;
7097	}
7098	}
7099
7100	static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7101	{
7102	drflac_ogg_page_header header;
7103	for (;;) {
7104	drflac_uint32 crc32 = `0`;
7105	drflac_uint32 bytesRead;
7106	drflac_uint32 pageBodySize;
7107	#ifndef DR_FLAC_NO_CRC
7108	drflac_uint32 actualCRC32;
7109	#endif
7110
7111	if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7112	return DRFLAC_FALSE;
7113	}
7114	oggbs->currentBytePos += bytesRead;
7115
7116	pageBodySize = drflac_ogg__get_page_body_size(&header);
7117	if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7118	continue; / Invalid page size. Assume it's corrupted and just move to the next page. /
7119	}
7120
7121	if (header.serialNumber != oggbs->serialNumber) {
7122	/ It's not a FLAC page. Skip it. /
7123	if (pageBodySize > `0` && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7124	return DRFLAC_FALSE;
7125	}
7126	continue;
7127	}
7128
7129
7130	/ We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. /
7131	if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7132	return DRFLAC_FALSE;
7133	}
7134	oggbs->pageDataSize = pageBodySize;
7135
7136	#ifndef DR_FLAC_NO_CRC
7137	actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7138	if (actualCRC32 != header.checksum) {
7139	if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7140	continue; / CRC mismatch. Skip this page. /
7141	} else {
7142	/*
7143	Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7144	go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7145	seek did not fully complete.
7146	*/
7147	drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7148	return DRFLAC_FALSE;
7149	}
7150	}
7151	#else
7152	(void)recoveryMethod; / <-- Silence a warning. /
7153	#endif
7154
7155	oggbs->currentPageHeader = header;
7156	oggbs->bytesRemainingInPage = pageBodySize;
7157	return DRFLAC_TRUE;
7158	}
7159	}
7160
7161	/ Function below is unused at the moment, but I might be re-adding it later. /
7162	#if 0
7163	static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7164	{
7165	drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7166	drflac_uint8 iSeg = `0`;
7167	drflac_uint32 iByte = `0`;
7168	while (iByte < bytesConsumedInPage) {
7169	drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7170	if (iByte + segmentSize > bytesConsumedInPage) {
7171	break;
7172	} else {
7173	iSeg += `1`;
7174	iByte += segmentSize;
7175	}
7176	}
7177
7178	*pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7179	return iSeg;
7180	}
7181
7182	static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7183	{
7184	/ The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. /
7185	for (;;) {
7186	drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7187
7188	drflac_uint8 bytesRemainingInSeg;
7189	drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7190
7191	drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7192	for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7193	drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7194	if (segmentSize < `255`) {
7195	if (iSeg == oggbs->currentPageHeader.segmentCount-`1`) {
7196	atEndOfPage = DRFLAC_TRUE;
7197	}
7198
7199	break;
7200	}
7201
7202	bytesToEndOfPacketOrPage += segmentSize;
7203	}
7204
7205	/*
7206	At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
7207	want to load the next page and keep searching for the end of the packet.
7208	*/
7209	drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7210	oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7211
7212	if (atEndOfPage) {
7213	/*
7214	We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7215	straddle pages.
7216	*/
7217	if (!drflac_oggbs__goto_next_page(oggbs)) {
7218	return DRFLAC_FALSE;
7219	}
7220
7221	/ If it's a fresh packet it most likely means we're at the next packet. /
7222	if ((oggbs->currentPageHeader.headerType & `0x01`) == `0`) {
7223	return DRFLAC_TRUE;
7224	}
7225	} else {
7226	/ We're at the next packet. /
7227	return DRFLAC_TRUE;
7228	}
7229	}
7230	}
7231
7232	static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7233	{
7234	/ The bitstream should be sitting on the first byte just after the header of the frame. /
7235
7236	/ What we're actually doing here is seeking to the start of the next packet. /
7237	return drflac_oggbs__seek_to_next_packet(oggbs);
7238	}
7239	#endif
7240
7241	static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7242	{
7243	drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7244	drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7245	size_t bytesRead = `0`;
7246
7247	DRFLAC_ASSERT(oggbs != NULL);
7248	DRFLAC_ASSERT(pRunningBufferOut != NULL);
7249
7250	/ Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. /
7251	while (bytesRead < bytesToRead) {
7252	size_t bytesRemainingToRead = bytesToRead - bytesRead;
7253
7254	if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7255	DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7256	bytesRead += bytesRemainingToRead;
7257	oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7258	break;
7259	}
7260
7261	/ If we get here it means some of the requested data is contained in the next pages. /
7262	if (oggbs->bytesRemainingInPage > `0`) {
7263	DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7264	bytesRead += oggbs->bytesRemainingInPage;
7265	pRunningBufferOut += oggbs->bytesRemainingInPage;
7266	oggbs->bytesRemainingInPage = `0`;
7267	}
7268
7269	DRFLAC_ASSERT(bytesRemainingToRead > `0`);
7270	if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7271	break; / Failed to go to the next page. Might have simply hit the end of the stream. /
7272	}
7273	}
7274
7275	return bytesRead;
7276	}
7277
7278	static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7279	{
7280	drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7281	int bytesSeeked = `0`;
7282
7283	DRFLAC_ASSERT(oggbs != NULL);
7284	DRFLAC_ASSERT(offset >= `0`); / <-- Never seek backwards. /
7285
7286	/ Seeking is always forward which makes things a lot simpler. /
7287	if (origin == drflac_seek_origin_start) {
7288	if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
7289	return DRFLAC_FALSE;
7290	}
7291
7292	if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7293	return DRFLAC_FALSE;
7294	}
7295
7296	return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7297	}
7298
7299	DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7300
7301	while (bytesSeeked < offset) {
7302	int bytesRemainingToSeek = offset - bytesSeeked;
7303	DRFLAC_ASSERT(bytesRemainingToSeek >= `0`);
7304
7305	if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7306	bytesSeeked += bytesRemainingToSeek;
7307	(void)bytesSeeked; / <-- Silence a dead store warning emitted by Clang Static Analyzer. /
7308	oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7309	break;
7310	}
7311
7312	/ If we get here it means some of the requested data is contained in the next pages. /
7313	if (oggbs->bytesRemainingInPage > `0`) {
7314	bytesSeeked += (int)oggbs->bytesRemainingInPage;
7315	oggbs->bytesRemainingInPage = `0`;
7316	}
7317
7318	DRFLAC_ASSERT(bytesRemainingToSeek > `0`);
7319	if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7320	/ Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. /
7321	return DRFLAC_FALSE;
7322	}
7323	}
7324
7325	return DRFLAC_TRUE;
7326	}
7327
7328
7329	static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7330	{
7331	drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7332	drflac_uint64 originalBytePos;
7333	drflac_uint64 runningGranulePosition;
7334	drflac_uint64 runningFrameBytePos;
7335	drflac_uint64 runningPCMFrameCount;
7336
7337	DRFLAC_ASSERT(oggbs != NULL);
7338
7339	originalBytePos = oggbs->currentBytePos; / For recovery. Points to the OggS identifier. /
7340
7341	/ First seek to the first frame. /
7342	if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7343	return DRFLAC_FALSE;
7344	}
7345	oggbs->bytesRemainingInPage = `0`;
7346
7347	runningGranulePosition = `0`;
7348	for (;;) {
7349	if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7350	drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7351	return DRFLAC_FALSE; / Never did find that sample... /
7352	}
7353
7354	runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7355	if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7356	break; / The sample is somewhere in the previous page. /
7357	}
7358
7359	/*
7360	At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7361	disregard any pages that do not begin a fresh packet.
7362	*/
7363	if ((oggbs->currentPageHeader.headerType & `0x01`) == `0`) { / <-- Is it a fresh page? /
7364	if (oggbs->currentPageHeader.segmentTable[`0`] >= `2`) {
7365	drflac_uint8 firstBytesInPage[`2`];
7366	firstBytesInPage[`0`] = oggbs->pageData[`0`];
7367	firstBytesInPage[`1`] = oggbs->pageData[`1`];
7368
7369	if ((firstBytesInPage[`0`] == `0xFF`) && (firstBytesInPage[`1`] & `0xFC`) == `0xF8`) { / <-- Does the page begin with a frame's sync code? /
7370	runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7371	}
7372
7373	continue;
7374	}
7375	}
7376	}
7377
7378	/*
7379	We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
7380	start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7381	a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7382	we find the one containing the target sample.
7383	*/
7384	if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7385	return DRFLAC_FALSE;
7386	}
7387	if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7388	return DRFLAC_FALSE;
7389	}
7390
7391	/*
7392	At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7393	looping over these frames until we find the one containing the sample we're after.
7394	*/
7395	runningPCMFrameCount = runningGranulePosition;
7396	for (;;) {
7397	/*
7398	There are two ways to find the sample and seek past irrelevant frames:
7399	1) Use the native FLAC decoder.
7400	2) Use Ogg's framing system.
7401
7402	Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7403	do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7404	duplication for the decoding of frame headers.
7405
7406	Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7407	bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7408	standard drflac__() APIs because that will read in extra data for its own internal caching which in turn breaks*
7409	the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7410	using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7411	avoid the use of the drflac_bs object.
7412
7413	Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7414	1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7415	2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7416	3) Simplicity.
7417	*/
7418	drflac_uint64 firstPCMFrameInFLACFrame = `0`;
7419	drflac_uint64 lastPCMFrameInFLACFrame = `0`;
7420	drflac_uint64 pcmFrameCountInThisFrame;
7421
7422	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7423	return DRFLAC_FALSE;
7424	}
7425
7426	drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7427
7428	pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + `1`;
7429
7430	/ If we are seeking to the end of the file and we've just hit it, we're done. /
7431	if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7432	drflac_result result = drflac__decode_flac_frame(pFlac);
7433	if (result == DRFLAC_SUCCESS) {
7434	pFlac->currentPCMFrame = pcmFrameIndex;
7435	pFlac->currentFLACFrame.pcmFramesRemaining = `0`;
7436	return DRFLAC_TRUE;
7437	} else {
7438	return DRFLAC_FALSE;
7439	}
7440	}
7441
7442	if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7443	/*
7444	The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7445	it never existed and keep iterating.
7446	*/
7447	drflac_result result = drflac__decode_flac_frame(pFlac);
7448	if (result == DRFLAC_SUCCESS) {
7449	/ The frame is valid. We just need to skip over some samples to ensure it's sample-exact. /
7450	drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); / <-- Safe cast because the maximum number of samples in a frame is 65535. /
7451	if (pcmFramesToDecode == `0`) {
7452	return DRFLAC_TRUE;
7453	}
7454
7455	pFlac->currentPCMFrame = runningPCMFrameCount;
7456
7457	return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; / <-- If this fails, something bad has happened (it should never fail). /
7458	} else {
7459	if (result == DRFLAC_CRC_MISMATCH) {
7460	continue; / CRC mismatch. Pretend this frame never existed. /
7461	} else {
7462	return DRFLAC_FALSE;
7463	}
7464	}
7465	} else {
7466	/*
7467	It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7468	frame never existed and leave the running sample count untouched.
7469	*/
7470	drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7471	if (result == DRFLAC_SUCCESS) {
7472	runningPCMFrameCount += pcmFrameCountInThisFrame;
7473	} else {
7474	if (result == DRFLAC_CRC_MISMATCH) {
7475	continue; / CRC mismatch. Pretend this frame never existed. /
7476	} else {
7477	return DRFLAC_FALSE;
7478	}
7479	}
7480	}
7481	}
7482	}
7483
7484
7485
7486	static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7487	{
7488	drflac_ogg_page_header header;
7489	drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7490	drflac_uint32 bytesRead = `0`;
7491
7492	/ Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. /
7493	(void)relaxed;
7494
7495	pInit->container = drflac_container_ogg;
7496	pInit->oggFirstBytePos = `0`;
7497
7498	/*
7499	We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7500	stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7501	any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7502	*/
7503	if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7504	return DRFLAC_FALSE;
7505	}
7506	pInit->runningFilePos += bytesRead;
7507
7508	for (;;) {
7509	int pageBodySize;
7510
7511	/ Break if we're past the beginning of stream page. /
7512	if ((header.headerType & `0x02`) == `0`) {
7513	return DRFLAC_FALSE;
7514	}
7515
7516	/ Check if it's a FLAC header. /
7517	pageBodySize = drflac_ogg__get_page_body_size(&header);
7518	if (pageBodySize == `51`) { / 51 = the lacing value of the FLAC header packet. /
7519	/ It could be a FLAC page... /
7520	drflac_uint32 bytesRemainingInPage = pageBodySize;
7521	drflac_uint8 packetType;
7522
7523	if (onRead(pUserData, &packetType, `1`) != `1`) {
7524	return DRFLAC_FALSE;
7525	}
7526
7527	bytesRemainingInPage -= `1`;
7528	if (packetType == `0x7F`) {
7529	/ Increasingly more likely to be a FLAC page... /
7530	drflac_uint8 sig[`4`];
7531	if (onRead(pUserData, sig, `4`) != `4`) {
7532	return DRFLAC_FALSE;
7533	}
7534
7535	bytesRemainingInPage -= `4`;
7536	if (sig[`0`] == `'F'` && sig[`1`] == `'L'` && sig[`2`] == `'A'` && sig[`3`] == `'C'`) {
7537	/ Almost certainly a FLAC page... /
7538	drflac_uint8 mappingVersion[`2`];
7539	if (onRead(pUserData, mappingVersion, `2`) != `2`) {
7540	return DRFLAC_FALSE;
7541	}
7542
7543	if (mappingVersion[`0`] != `1`) {
7544	return DRFLAC_FALSE; / Only supporting version 1.x of the Ogg mapping. /
7545	}
7546
7547	/*
7548	The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
7549	be handling it in a generic way based on the serial number and packet types.
7550	*/
7551	if (!onSeek(pUserData, `2`, drflac_seek_origin_current)) {
7552	return DRFLAC_FALSE;
7553	}
7554
7555	/ Expecting the native FLAC signature "fLaC". /
7556	if (onRead(pUserData, sig, `4`) != `4`) {
7557	return DRFLAC_FALSE;
7558	}
7559
7560	if (sig[`0`] == `'f'` && sig[`1`] == `'L'` && sig[`2`] == `'a'` && sig[`3`] == `'C'`) {
7561	/ The remaining data in the page should be the STREAMINFO block. /
7562	drflac_streaminfo streaminfo;
7563	drflac_uint8 isLastBlock;
7564	drflac_uint8 blockType;
7565	drflac_uint32 blockSize;
7566	if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7567	return DRFLAC_FALSE;
7568	}
7569
7570	if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO \|\| blockSize != `34`) {
7571	return DRFLAC_FALSE; / Invalid block type. First block must be the STREAMINFO block. /
7572	}
7573
7574	if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7575	/ Success! /
7576	pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7577	pInit->sampleRate = streaminfo.sampleRate;
7578	pInit->channels = streaminfo.channels;
7579	pInit->bitsPerSample = streaminfo.bitsPerSample;
7580	pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7581	pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7582	pInit->hasMetadataBlocks = !isLastBlock;
7583
7584	if (onMeta) {
7585	drflac_metadata metadata;
7586	metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7587	metadata.pRawData = NULL;
7588	metadata.rawDataSize = `0`;
7589	metadata.data.streaminfo = streaminfo;
7590	onMeta(pUserDataMD, &metadata);
7591	}
7592
7593	pInit->runningFilePos += pageBodySize;
7594	pInit->oggFirstBytePos = pInit->runningFilePos - `79`; / Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. /
7595	pInit->oggSerial = header.serialNumber;
7596	pInit->oggBosHeader = header;
7597	break;
7598	} else {
7599	/ Failed to read STREAMINFO block. Aww, so close... /
7600	return DRFLAC_FALSE;
7601	}
7602	} else {
7603	/ Invalid file. /
7604	return DRFLAC_FALSE;
7605	}
7606	} else {
7607	/ Not a FLAC header. Skip it. /
7608	if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7609	return DRFLAC_FALSE;
7610	}
7611	}
7612	} else {
7613	/ Not a FLAC header. Seek past the entire page and move on to the next. /
7614	if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7615	return DRFLAC_FALSE;
7616	}
7617	}
7618	} else {
7619	if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
7620	return DRFLAC_FALSE;
7621	}
7622	}
7623
7624	pInit->runningFilePos += pageBodySize;
7625
7626
7627	/ Read the header of the next page. /
7628	if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7629	return DRFLAC_FALSE;
7630	}
7631	pInit->runningFilePos += bytesRead;
7632	}
7633
7634	/*
7635	If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7636	packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
7637	Ogg bistream object.
7638	*/
7639	pInit->hasMetadataBlocks = DRFLAC_TRUE; / <-- Always have at least VORBIS_COMMENT metadata block. /
7640	return DRFLAC_TRUE;
7641	}
7642	#endif
7643
7644	static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7645	{
7646	drflac_bool32 relaxed;
7647	drflac_uint8 id[`4`];
7648
7649	if (pInit == NULL \|\| onRead == NULL \|\| onSeek == NULL) {
7650	return DRFLAC_FALSE;
7651	}
7652
7653	DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7654	pInit->onRead = onRead;
7655	pInit->onSeek = onSeek;
7656	pInit->onMeta = onMeta;
7657	pInit->container = container;
7658	pInit->pUserData = pUserData;
7659	pInit->pUserDataMD = pUserDataMD;
7660
7661	pInit->bs.onRead = onRead;
7662	pInit->bs.onSeek = onSeek;
7663	pInit->bs.pUserData = pUserData;
7664	drflac__reset_cache(&pInit->bs);
7665
7666
7667	/ If the container is explicitly defined then we can try opening in relaxed mode. /
7668	relaxed = container != drflac_container_unknown;
7669
7670	/ Skip over any ID3 tags. /
7671	for (;;) {
7672	if (onRead(pUserData, id, `4`) != `4`) {
7673	return DRFLAC_FALSE; / Ran out of data. /
7674	}
7675	pInit->runningFilePos += `4`;
7676
7677	if (id[`0`] == `'I'` && id[`1`] == `'D'` && id[`2`] == `'3'`) {
7678	drflac_uint8 header[`6`];
7679	drflac_uint8 flags;
7680	drflac_uint32 headerSize;
7681
7682	if (onRead(pUserData, header, `6`) != `6`) {
7683	return DRFLAC_FALSE; / Ran out of data. /
7684	}
7685	pInit->runningFilePos += `6`;
7686
7687	flags = header[`1`];
7688
7689	DRFLAC_COPY_MEMORY(&headerSize, header+`2`, `4`);
7690	headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7691	if (flags & `0x10`) {
7692	headerSize += `10`;
7693	}
7694
7695	if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
7696	return DRFLAC_FALSE; / Failed to seek past the tag. /
7697	}
7698	pInit->runningFilePos += headerSize;
7699	} else {
7700	break;
7701	}
7702	}
7703
7704	if (id[`0`] == `'f'` && id[`1`] == `'L'` && id[`2`] == `'a'` && id[`3`] == `'C'`) {
7705	return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7706	}
7707	#ifndef DR_FLAC_NO_OGG
7708	if (id[`0`] == `'O'` && id[`1`] == `'g'` && id[`2`] == `'g'` && id[`3`] == `'S'`) {
7709	return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7710	}
7711	#endif
7712
7713	/ If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. /
7714	if (relaxed) {
7715	if (container == drflac_container_native) {
7716	return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7717	}
7718	#ifndef DR_FLAC_NO_OGG
7719	if (container == drflac_container_ogg) {
7720	return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7721	}
7722	#endif
7723	}
7724
7725	/ Unsupported container. /
7726	return DRFLAC_FALSE;
7727	}
7728
7729	static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7730	{
7731	DRFLAC_ASSERT(pFlac != NULL);
7732	DRFLAC_ASSERT(pInit != NULL);
7733
7734	DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7735	pFlac->bs = pInit->bs;
7736	pFlac->onMeta = pInit->onMeta;
7737	pFlac->pUserDataMD = pInit->pUserDataMD;
7738	pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7739	pFlac->sampleRate = pInit->sampleRate;
7740	pFlac->channels = (drflac_uint8)pInit->channels;
7741	pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7742	pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7743	pFlac->container = pInit->container;
7744	}
7745
7746
7747	static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7748	{
7749	drflac_init_info init;
7750	drflac_uint32 allocationSize;
7751	drflac_uint32 wholeSIMDVectorCountPerChannel;
7752	drflac_uint32 decodedSamplesAllocationSize;
7753	#ifndef DR_FLAC_NO_OGG
7754	drflac_oggbs oggbs;
7755	#endif
7756	drflac_uint64 firstFramePos;
7757	drflac_uint64 seektablePos;
7758	drflac_uint32 seektableSize;
7759	drflac_allocation_callbacks allocationCallbacks;
7760	drflac* pFlac;
7761
7762	/ CPU support first. /
7763	drflac__init_cpu_caps();
7764
7765	if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
7766	return NULL;
7767	}
7768
7769	if (pAllocationCallbacks != NULL) {
7770	allocationCallbacks = *pAllocationCallbacks;
7771	if (allocationCallbacks.onFree == NULL \|\| (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7772	return NULL; / Invalid allocation callbacks. /
7773	}
7774	} else {
7775	allocationCallbacks.pUserData = NULL;
7776	allocationCallbacks.onMalloc = drflac__malloc_default;
7777	allocationCallbacks.onRealloc = drflac__realloc_default;
7778	allocationCallbacks.onFree = drflac__free_default;
7779	}
7780
7781
7782	/*
7783	The size of the allocation for the drflac object needs to be large enough to fit the following:
7784	1) The main members of the drflac structure
7785	2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7786	3) If the container is Ogg, a drflac_oggbs object
7787
7788	The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
7789	the different SIMD instruction sets.
7790	*/
7791	allocationSize = sizeof(drflac);
7792
7793	/*
7794	The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7795	we are supporting.
7796	*/
7797	if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == `0`) {
7798	wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7799	} else {
7800	wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + `1`;
7801	}
7802
7803	decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7804
7805	allocationSize += decodedSamplesAllocationSize;
7806	allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; / Allocate extra bytes to ensure we have enough for alignment. /
7807
7808	#ifndef DR_FLAC_NO_OGG
7809	/ There's additional data required for Ogg streams. /
7810	if (init.container == drflac_container_ogg) {
7811	allocationSize += sizeof(drflac_oggbs);
7812	}
7813
7814	DRFLAC_ZERO_MEMORY(&oggbs, sizeof(oggbs));
7815	if (init.container == drflac_container_ogg) {
7816	oggbs.onRead = onRead;
7817	oggbs.onSeek = onSeek;
7818	oggbs.pUserData = pUserData;
7819	oggbs.currentBytePos = init.oggFirstBytePos;
7820	oggbs.firstBytePos = init.oggFirstBytePos;
7821	oggbs.serialNumber = init.oggSerial;
7822	oggbs.bosPageHeader = init.oggBosHeader;
7823	oggbs.bytesRemainingInPage = `0`;
7824	}
7825	#endif
7826
7827	/*
7828	This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
7829	consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
7830	and decoding the metadata.
7831	*/
7832	firstFramePos = `42`; / <-- We know we are at byte 42 at this point. /
7833	seektablePos = `0`;
7834	seektableSize = `0`;
7835	if (init.hasMetadataBlocks) {
7836	drflac_read_proc onReadOverride = onRead;
7837	drflac_seek_proc onSeekOverride = onSeek;
7838	void* pUserDataOverride = pUserData;
7839
7840	#ifndef DR_FLAC_NO_OGG
7841	if (init.container == drflac_container_ogg) {
7842	onReadOverride = drflac__on_read_ogg;
7843	onSeekOverride = drflac__on_seek_ogg;
7844	pUserDataOverride = (void*)&oggbs;
7845	}
7846	#endif
7847
7848	if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seektableSize, &allocationCallbacks)) {
7849	return NULL;
7850	}
7851
7852	allocationSize += seektableSize;
7853	}
7854
7855
7856	pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
7857	if (pFlac == NULL) {
7858	return NULL;
7859	}
7860
7861	drflac__init_from_info(pFlac, &init);
7862	pFlac->allocationCallbacks = allocationCallbacks;
7863	pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
7864
7865	#ifndef DR_FLAC_NO_OGG
7866	if (init.container == drflac_container_ogg) {
7867	drflac_oggbs* pInternalOggbs = (drflac_oggbs)((drflac_uint8)pFlac->pDecodedSamples + decodedSamplesAllocationSize + seektableSize);
7868	*pInternalOggbs = oggbs;
7869
7870	/ The Ogg bistream needs to be layered on top of the original bitstream. /
7871	pFlac->bs.onRead = drflac__on_read_ogg;
7872	pFlac->bs.onSeek = drflac__on_seek_ogg;
7873	pFlac->bs.pUserData = (void*)pInternalOggbs;
7874	pFlac->_oggbs = (void*)pInternalOggbs;
7875	}
7876	#endif
7877
7878	pFlac->firstFLACFramePosInBytes = firstFramePos;
7879
7880	/ NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. /
7881	#ifndef DR_FLAC_NO_OGG
7882	if (init.container == drflac_container_ogg)
7883	{
7884	pFlac->pSeekpoints = NULL;
7885	pFlac->seekpointCount = `0`;
7886	}
7887	else
7888	#endif
7889	{
7890	/ If we have a seektable we need to load it now, making sure we move back to where we were previously. /
7891	if (seektablePos != `0`) {
7892	pFlac->seekpointCount = seektableSize / sizeof(*pFlac->pSeekpoints);
7893	pFlac->pSeekpoints = (drflac_seekpoint)((drflac_uint8)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
7894
7895	DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
7896	DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
7897
7898	/ Seek to the seektable, then just read directly into our seektable buffer. /
7899	if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
7900	if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints, seektableSize) == seektableSize) {
7901	/ Endian swap. /
7902	drflac_uint32 iSeekpoint;
7903	for (iSeekpoint = `0`; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
7904	pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
7905	pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
7906	pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
7907	}
7908	} else {
7909	/ Failed to read the seektable. Pretend we don't have one. /
7910	pFlac->pSeekpoints = NULL;
7911	pFlac->seekpointCount = `0`;
7912	}
7913
7914	/ We need to seek back to where we were. If this fails it's a critical error. /
7915	if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
7916	drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7917	return NULL;
7918	}
7919	} else {
7920	/ Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. /
7921	pFlac->pSeekpoints = NULL;
7922	pFlac->seekpointCount = `0`;
7923	}
7924	}
7925	}
7926
7927
7928	/*
7929	If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
7930	the first frame.
7931	*/
7932	if (!init.hasStreamInfoBlock) {
7933	pFlac->currentFLACFrame.header = init.firstFrameHeader;
7934	for (;;) {
7935	drflac_result result = drflac__decode_flac_frame(pFlac);
7936	if (result == DRFLAC_SUCCESS) {
7937	break;
7938	} else {
7939	if (result == DRFLAC_CRC_MISMATCH) {
7940	if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7941	drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7942	return NULL;
7943	}
7944	continue;
7945	} else {
7946	drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7947	return NULL;
7948	}
7949	}
7950	}
7951	}
7952
7953	return pFlac;
7954	}
7955
7956
7957
7958	#ifndef DR_FLAC_NO_STDIO
7959	#include <stdio.h>
7960	#include <wchar.h> /* For wcslen(), wcsrtombs() */
7961
7962	/ drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. /
7963	#include <errno.h>
7964	static drflac_result drflac_result_from_errno(int e)
7965	{
7966	switch (e)
7967	{
7968	case `0`: return DRFLAC_SUCCESS;
7969	#ifdef EPERM
7970	case EPERM: return DRFLAC_INVALID_OPERATION;
7971	#endif
7972	#ifdef ENOENT
7973	case ENOENT: return DRFLAC_DOES_NOT_EXIST;
7974	#endif
7975	#ifdef ESRCH
7976	case ESRCH: return DRFLAC_DOES_NOT_EXIST;
7977	#endif
7978	#ifdef EINTR
7979	case EINTR: return DRFLAC_INTERRUPT;
7980	#endif
7981	#ifdef EIO
7982	case EIO: return DRFLAC_IO_ERROR;
7983	#endif
7984	#ifdef ENXIO
7985	case ENXIO: return DRFLAC_DOES_NOT_EXIST;
7986	#endif
7987	#ifdef E2BIG
7988	case E2BIG: return DRFLAC_INVALID_ARGS;
7989	#endif
7990	#ifdef ENOEXEC
7991	case ENOEXEC: return DRFLAC_INVALID_FILE;
7992	#endif
7993	#ifdef EBADF
7994	case EBADF: return DRFLAC_INVALID_FILE;
7995	#endif
7996	#ifdef ECHILD
7997	case ECHILD: return DRFLAC_ERROR;
7998	#endif
7999	#ifdef EAGAIN
8000	case EAGAIN: return DRFLAC_UNAVAILABLE;
8001	#endif
8002	#ifdef ENOMEM
8003	case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8004	#endif
8005	#ifdef EACCES
8006	case EACCES: return DRFLAC_ACCESS_DENIED;
8007	#endif
8008	#ifdef EFAULT
8009	case EFAULT: return DRFLAC_BAD_ADDRESS;
8010	#endif
8011	#ifdef ENOTBLK
8012	case ENOTBLK: return DRFLAC_ERROR;
8013	#endif
8014	#ifdef EBUSY
8015	case EBUSY: return DRFLAC_BUSY;
8016	#endif
8017	#ifdef EEXIST
8018	case EEXIST: return DRFLAC_ALREADY_EXISTS;
8019	#endif
8020	#ifdef EXDEV
8021	case EXDEV: return DRFLAC_ERROR;
8022	#endif
8023	#ifdef ENODEV
8024	case ENODEV: return DRFLAC_DOES_NOT_EXIST;
8025	#endif
8026	#ifdef ENOTDIR
8027	case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
8028	#endif
8029	#ifdef EISDIR
8030	case EISDIR: return DRFLAC_IS_DIRECTORY;
8031	#endif
8032	#ifdef EINVAL
8033	case EINVAL: return DRFLAC_INVALID_ARGS;
8034	#endif
8035	#ifdef ENFILE
8036	case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8037	#endif
8038	#ifdef EMFILE
8039	case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8040	#endif
8041	#ifdef ENOTTY
8042	case ENOTTY: return DRFLAC_INVALID_OPERATION;
8043	#endif
8044	#ifdef ETXTBSY
8045	case ETXTBSY: return DRFLAC_BUSY;
8046	#endif
8047	#ifdef EFBIG
8048	case EFBIG: return DRFLAC_TOO_BIG;
8049	#endif
8050	#ifdef ENOSPC
8051	case ENOSPC: return DRFLAC_NO_SPACE;
8052	#endif
8053	#ifdef ESPIPE
8054	case ESPIPE: return DRFLAC_BAD_SEEK;
8055	#endif
8056	#ifdef EROFS
8057	case EROFS: return DRFLAC_ACCESS_DENIED;
8058	#endif
8059	#ifdef EMLINK
8060	case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8061	#endif
8062	#ifdef EPIPE
8063	case EPIPE: return DRFLAC_BAD_PIPE;
8064	#endif
8065	#ifdef EDOM
8066	case EDOM: return DRFLAC_OUT_OF_RANGE;
8067	#endif
8068	#ifdef ERANGE
8069	case ERANGE: return DRFLAC_OUT_OF_RANGE;
8070	#endif
8071	#ifdef EDEADLK
8072	case EDEADLK: return DRFLAC_DEADLOCK;
8073	#endif
8074	#ifdef ENAMETOOLONG
8075	case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8076	#endif
8077	#ifdef ENOLCK
8078	case ENOLCK: return DRFLAC_ERROR;
8079	#endif
8080	#ifdef ENOSYS
8081	case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8082	#endif
8083	#ifdef ENOTEMPTY
8084	case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8085	#endif
8086	#ifdef ELOOP
8087	case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8088	#endif
8089	#ifdef ENOMSG
8090	case ENOMSG: return DRFLAC_NO_MESSAGE;
8091	#endif
8092	#ifdef EIDRM
8093	case EIDRM: return DRFLAC_ERROR;
8094	#endif
8095	#ifdef ECHRNG
8096	case ECHRNG: return DRFLAC_ERROR;
8097	#endif
8098	#ifdef EL2NSYNC
8099	case EL2NSYNC: return DRFLAC_ERROR;
8100	#endif
8101	#ifdef EL3HLT
8102	case EL3HLT: return DRFLAC_ERROR;
8103	#endif
8104	#ifdef EL3RST
8105	case EL3RST: return DRFLAC_ERROR;
8106	#endif
8107	#ifdef ELNRNG
8108	case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8109	#endif
8110	#ifdef EUNATCH
8111	case EUNATCH: return DRFLAC_ERROR;
8112	#endif
8113	#ifdef ENOCSI
8114	case ENOCSI: return DRFLAC_ERROR;
8115	#endif
8116	#ifdef EL2HLT
8117	case EL2HLT: return DRFLAC_ERROR;
8118	#endif
8119	#ifdef EBADE
8120	case EBADE: return DRFLAC_ERROR;
8121	#endif
8122	#ifdef EBADR
8123	case EBADR: return DRFLAC_ERROR;
8124	#endif
8125	#ifdef EXFULL
8126	case EXFULL: return DRFLAC_ERROR;
8127	#endif
8128	#ifdef ENOANO
8129	case ENOANO: return DRFLAC_ERROR;
8130	#endif
8131	#ifdef EBADRQC
8132	case EBADRQC: return DRFLAC_ERROR;
8133	#endif
8134	#ifdef EBADSLT
8135	case EBADSLT: return DRFLAC_ERROR;
8136	#endif
8137	#ifdef EBFONT
8138	case EBFONT: return DRFLAC_INVALID_FILE;
8139	#endif
8140	#ifdef ENOSTR
8141	case ENOSTR: return DRFLAC_ERROR;
8142	#endif
8143	#ifdef ENODATA
8144	case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8145	#endif
8146	#ifdef ETIME
8147	case ETIME: return DRFLAC_TIMEOUT;
8148	#endif
8149	#ifdef ENOSR
8150	case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8151	#endif
8152	#ifdef ENONET
8153	case ENONET: return DRFLAC_NO_NETWORK;
8154	#endif
8155	#ifdef ENOPKG
8156	case ENOPKG: return DRFLAC_ERROR;
8157	#endif
8158	#ifdef EREMOTE
8159	case EREMOTE: return DRFLAC_ERROR;
8160	#endif
8161	#ifdef ENOLINK
8162	case ENOLINK: return DRFLAC_ERROR;
8163	#endif
8164	#ifdef EADV
8165	case EADV: return DRFLAC_ERROR;
8166	#endif
8167	#ifdef ESRMNT
8168	case ESRMNT: return DRFLAC_ERROR;
8169	#endif
8170	#ifdef ECOMM
8171	case ECOMM: return DRFLAC_ERROR;
8172	#endif
8173	#ifdef EPROTO
8174	case EPROTO: return DRFLAC_ERROR;
8175	#endif
8176	#ifdef EMULTIHOP
8177	case EMULTIHOP: return DRFLAC_ERROR;
8178	#endif
8179	#ifdef EDOTDOT
8180	case EDOTDOT: return DRFLAC_ERROR;
8181	#endif
8182	#ifdef EBADMSG
8183	case EBADMSG: return DRFLAC_BAD_MESSAGE;
8184	#endif
8185	#ifdef EOVERFLOW
8186	case EOVERFLOW: return DRFLAC_TOO_BIG;
8187	#endif
8188	#ifdef ENOTUNIQ
8189	case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8190	#endif
8191	#ifdef EBADFD
8192	case EBADFD: return DRFLAC_ERROR;
8193	#endif
8194	#ifdef EREMCHG
8195	case EREMCHG: return DRFLAC_ERROR;
8196	#endif
8197	#ifdef ELIBACC
8198	case ELIBACC: return DRFLAC_ACCESS_DENIED;
8199	#endif
8200	#ifdef ELIBBAD
8201	case ELIBBAD: return DRFLAC_INVALID_FILE;
8202	#endif
8203	#ifdef ELIBSCN
8204	case ELIBSCN: return DRFLAC_INVALID_FILE;
8205	#endif
8206	#ifdef ELIBMAX
8207	case ELIBMAX: return DRFLAC_ERROR;
8208	#endif
8209	#ifdef ELIBEXEC
8210	case ELIBEXEC: return DRFLAC_ERROR;
8211	#endif
8212	#ifdef EILSEQ
8213	case EILSEQ: return DRFLAC_INVALID_DATA;
8214	#endif
8215	#ifdef ERESTART
8216	case ERESTART: return DRFLAC_ERROR;
8217	#endif
8218	#ifdef ESTRPIPE
8219	case ESTRPIPE: return DRFLAC_ERROR;
8220	#endif
8221	#ifdef EUSERS
8222	case EUSERS: return DRFLAC_ERROR;
8223	#endif
8224	#ifdef ENOTSOCK
8225	case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8226	#endif
8227	#ifdef EDESTADDRREQ
8228	case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8229	#endif
8230	#ifdef EMSGSIZE
8231	case EMSGSIZE: return DRFLAC_TOO_BIG;
8232	#endif
8233	#ifdef EPROTOTYPE
8234	case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8235	#endif
8236	#ifdef ENOPROTOOPT
8237	case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8238	#endif
8239	#ifdef EPROTONOSUPPORT
8240	case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8241	#endif
8242	#ifdef ESOCKTNOSUPPORT
8243	case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8244	#endif
8245	#ifdef EOPNOTSUPP
8246	case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8247	#endif
8248	#ifdef EPFNOSUPPORT
8249	case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8250	#endif
8251	#ifdef EAFNOSUPPORT
8252	case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8253	#endif
8254	#ifdef EADDRINUSE
8255	case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8256	#endif
8257	#ifdef EADDRNOTAVAIL
8258	case EADDRNOTAVAIL: return DRFLAC_ERROR;
8259	#endif
8260	#ifdef ENETDOWN
8261	case ENETDOWN: return DRFLAC_NO_NETWORK;
8262	#endif
8263	#ifdef ENETUNREACH
8264	case ENETUNREACH: return DRFLAC_NO_NETWORK;
8265	#endif
8266	#ifdef ENETRESET
8267	case ENETRESET: return DRFLAC_NO_NETWORK;
8268	#endif
8269	#ifdef ECONNABORTED
8270	case ECONNABORTED: return DRFLAC_NO_NETWORK;
8271	#endif
8272	#ifdef ECONNRESET
8273	case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8274	#endif
8275	#ifdef ENOBUFS
8276	case ENOBUFS: return DRFLAC_NO_SPACE;
8277	#endif
8278	#ifdef EISCONN
8279	case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8280	#endif
8281	#ifdef ENOTCONN
8282	case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8283	#endif
8284	#ifdef ESHUTDOWN
8285	case ESHUTDOWN: return DRFLAC_ERROR;
8286	#endif
8287	#ifdef ETOOMANYREFS
8288	case ETOOMANYREFS: return DRFLAC_ERROR;
8289	#endif
8290	#ifdef ETIMEDOUT
8291	case ETIMEDOUT: return DRFLAC_TIMEOUT;
8292	#endif
8293	#ifdef ECONNREFUSED
8294	case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8295	#endif
8296	#ifdef EHOSTDOWN
8297	case EHOSTDOWN: return DRFLAC_NO_HOST;
8298	#endif
8299	#ifdef EHOSTUNREACH
8300	case EHOSTUNREACH: return DRFLAC_NO_HOST;
8301	#endif
8302	#ifdef EALREADY
8303	case EALREADY: return DRFLAC_IN_PROGRESS;
8304	#endif
8305	#ifdef EINPROGRESS
8306	case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8307	#endif
8308	#ifdef ESTALE
8309	case ESTALE: return DRFLAC_INVALID_FILE;
8310	#endif
8311	#ifdef EUCLEAN
8312	case EUCLEAN: return DRFLAC_ERROR;
8313	#endif
8314	#ifdef ENOTNAM
8315	case ENOTNAM: return DRFLAC_ERROR;
8316	#endif
8317	#ifdef ENAVAIL
8318	case ENAVAIL: return DRFLAC_ERROR;
8319	#endif
8320	#ifdef EISNAM
8321	case EISNAM: return DRFLAC_ERROR;
8322	#endif
8323	#ifdef EREMOTEIO
8324	case EREMOTEIO: return DRFLAC_IO_ERROR;
8325	#endif
8326	#ifdef EDQUOT
8327	case EDQUOT: return DRFLAC_NO_SPACE;
8328	#endif
8329	#ifdef ENOMEDIUM
8330	case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8331	#endif
8332	#ifdef EMEDIUMTYPE
8333	case EMEDIUMTYPE: return DRFLAC_ERROR;
8334	#endif
8335	#ifdef ECANCELED
8336	case ECANCELED: return DRFLAC_CANCELLED;
8337	#endif
8338	#ifdef ENOKEY
8339	case ENOKEY: return DRFLAC_ERROR;
8340	#endif
8341	#ifdef EKEYEXPIRED
8342	case EKEYEXPIRED: return DRFLAC_ERROR;
8343	#endif
8344	#ifdef EKEYREVOKED
8345	case EKEYREVOKED: return DRFLAC_ERROR;
8346	#endif
8347	#ifdef EKEYREJECTED
8348	case EKEYREJECTED: return DRFLAC_ERROR;
8349	#endif
8350	#ifdef EOWNERDEAD
8351	case EOWNERDEAD: return DRFLAC_ERROR;
8352	#endif
8353	#ifdef ENOTRECOVERABLE
8354	case ENOTRECOVERABLE: return DRFLAC_ERROR;
8355	#endif
8356	#ifdef ERFKILL
8357	case ERFKILL: return DRFLAC_ERROR;
8358	#endif
8359	#ifdef EHWPOISON
8360	case EHWPOISON: return DRFLAC_ERROR;
8361	#endif
8362	default: return DRFLAC_ERROR;
8363	}
8364	}
8365
8366	static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8367	{
8368	#if defined(_MSC_VER) && _MSC_VER >= 1400
8369	errno_t err;
8370	#endif
8371
8372	if (ppFile != NULL) {
8373	ppFile = NULL; /* Safety. /
8374	}
8375
8376	if (pFilePath == NULL \|\| pOpenMode == NULL \|\| ppFile == NULL) {
8377	return DRFLAC_INVALID_ARGS;
8378	}
8379
8380	#if defined(_MSC_VER) && _MSC_VER >= 1400
8381	err = fopen_s(ppFile, pFilePath, pOpenMode);
8382	if (err != `0`) {
8383	return drflac_result_from_errno(err);
8384	}
8385	#else
8386	#if defined(_WIN32) \|\| defined(__APPLE__)
8387	*ppFile = fopen(pFilePath, pOpenMode);
8388	#else
8389	#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8390	*ppFile = fopen64(pFilePath, pOpenMode);
8391	#else
8392	*ppFile = fopen(pFilePath, pOpenMode);
8393	#endif
8394	#endif
8395	if (*ppFile == NULL) {
8396	drflac_result result = drflac_result_from_errno(errno);
8397	if (result == DRFLAC_SUCCESS) {
8398	result = DRFLAC_ERROR; / Just a safety check to make sure we never ever return success when pFile == NULL. /
8399	}
8400
8401	return result;
8402	}
8403	#endif
8404
8405	return DRFLAC_SUCCESS;
8406	}
8407
8408	/*
8409	_wfopen() isn't always available in all compilation environments.
8410
8411	* Windows only.
8412	* MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
8413	* MinGW-64 (both 32- and 64-bit) seems to support it.
8414	* MinGW wraps it in !defined(__STRICT_ANSI__).
8415	* OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
8416
8417	This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
8418	fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
8419	*/
8420	#if defined(_WIN32)
8421	#if defined(_MSC_VER) \|\| defined(__MINGW64__) \|\| (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8422	#define DRFLAC_HAS_WFOPEN
8423	#endif
8424	#endif
8425
8426	static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8427	{
8428	if (ppFile != NULL) {
8429	ppFile = NULL; /* Safety. /
8430	}
8431
8432	if (pFilePath == NULL \|\| pOpenMode == NULL \|\| ppFile == NULL) {
8433	return DRFLAC_INVALID_ARGS;
8434	}
8435
8436	#if defined(DRFLAC_HAS_WFOPEN)
8437	{
8438	/ Use _wfopen() on Windows. /
8439	#if defined(_MSC_VER) && _MSC_VER >= 1400
8440	errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8441	if (err != `0`) {
8442	return drflac_result_from_errno(err);
8443	}
8444	#else
8445	*ppFile = _wfopen(pFilePath, pOpenMode);
8446	if (*ppFile == NULL) {
8447	return drflac_result_from_errno(errno);
8448	}
8449	#endif
8450	(void)pAllocationCallbacks;
8451	}
8452	#else
8453	/*
8454	Use fopen() on anything other than Windows. Requires a conversion. This is annoying because fopen() is locale specific. The only real way I can
8455	think of to do this is with wcsrtombs(). Note that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
8456	maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler error I'll look into improving compatibility.
8457	*/
8458	{
8459	mbstate_t mbs;
8460	size_t lenMB;
8461	const wchar_t* pFilePathTemp = pFilePath;
8462	char* pFilePathMB = NULL;
8463	char pOpenModeMB[`32`] = {`0`};
8464
8465	/ Get the length first. /
8466	DRFLAC_ZERO_OBJECT(&mbs);
8467	lenMB = wcsrtombs(NULL, &pFilePathTemp, `0`, &mbs);
8468	if (lenMB == (size_t)-`1`) {
8469	return drflac_result_from_errno(errno);
8470	}
8471
8472	pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + `1`, pAllocationCallbacks);
8473	if (pFilePathMB == NULL) {
8474	return DRFLAC_OUT_OF_MEMORY;
8475	}
8476
8477	pFilePathTemp = pFilePath;
8478	DRFLAC_ZERO_OBJECT(&mbs);
8479	wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + `1`, &mbs);
8480
8481	/ The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. /
8482	{
8483	size_t i = `0`;
8484	for (;;) {
8485	if (pOpenMode[i] == `0`) {
8486	pOpenModeMB[i] = `'\0'`;
8487	break;
8488	}
8489
8490	pOpenModeMB[i] = (char)pOpenMode[i];
8491	i += `1`;
8492	}
8493	}
8494
8495	*ppFile = fopen(pFilePathMB, pOpenModeMB);
8496
8497	drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8498	}
8499
8500	if (*ppFile == NULL) {
8501	return DRFLAC_ERROR;
8502	}
8503	#endif
8504
8505	return DRFLAC_SUCCESS;
8506	}
8507
8508	static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8509	{
8510	return fread(bufferOut, `1`, bytesToRead, (FILE*)pUserData);
8511	}
8512
8513	static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8514	{
8515	DRFLAC_ASSERT(offset >= `0`); / <-- Never seek backwards. /
8516
8517	return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == `0`;
8518	}
8519
8520
8521	DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8522	{
8523	drflac* pFlac;
8524	FILE* pFile;
8525
8526	if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8527	return NULL;
8528	}
8529
8530	pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8531	if (pFlac == NULL) {
8532	fclose(pFile);
8533	return NULL;
8534	}
8535
8536	return pFlac;
8537	}
8538
8539	DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8540	{
8541	drflac* pFlac;
8542	FILE* pFile;
8543
8544	if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8545	return NULL;
8546	}
8547
8548	pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8549	if (pFlac == NULL) {
8550	fclose(pFile);
8551	return NULL;
8552	}
8553
8554	return pFlac;
8555	}
8556
8557	DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8558	{
8559	drflac* pFlac;
8560	FILE* pFile;
8561
8562	if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8563	return NULL;
8564	}
8565
8566	pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8567	if (pFlac == NULL) {
8568	fclose(pFile);
8569	return pFlac;
8570	}
8571
8572	return pFlac;
8573	}
8574
8575	DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8576	{
8577	drflac* pFlac;
8578	FILE* pFile;
8579
8580	if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8581	return NULL;
8582	}
8583
8584	pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8585	if (pFlac == NULL) {
8586	fclose(pFile);
8587	return pFlac;
8588	}
8589
8590	return pFlac;
8591	}
8592	#endif /* DR_FLAC_NO_STDIO */
8593
8594	static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8595	{
8596	drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8597	size_t bytesRemaining;
8598
8599	DRFLAC_ASSERT(memoryStream != NULL);
8600	DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8601
8602	bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8603	if (bytesToRead > bytesRemaining) {
8604	bytesToRead = bytesRemaining;
8605	}
8606
8607	if (bytesToRead > `0`) {
8608	DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8609	memoryStream->currentReadPos += bytesToRead;
8610	}
8611
8612	return bytesToRead;
8613	}
8614
8615	static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8616	{
8617	drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8618
8619	DRFLAC_ASSERT(memoryStream != NULL);
8620	DRFLAC_ASSERT(offset >= `0`); / <-- Never seek backwards. /
8621
8622	if (offset > (drflac_int64)memoryStream->dataSize) {
8623	return DRFLAC_FALSE;
8624	}
8625
8626	if (origin == drflac_seek_origin_current) {
8627	if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
8628	memoryStream->currentReadPos += offset;
8629	} else {
8630	return DRFLAC_FALSE; / Trying to seek too far forward. /
8631	}
8632	} else {
8633	if ((drflac_uint32)offset <= memoryStream->dataSize) {
8634	memoryStream->currentReadPos = offset;
8635	} else {
8636	return DRFLAC_FALSE; / Trying to seek too far forward. /
8637	}
8638	}
8639
8640	return DRFLAC_TRUE;
8641	}
8642
8643	DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8644	{
8645	drflac__memory_stream memoryStream;
8646	drflac* pFlac;
8647
8648	memoryStream.data = (const drflac_uint8*)pData;
8649	memoryStream.dataSize = dataSize;
8650	memoryStream.currentReadPos = `0`;
8651	pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
8652	if (pFlac == NULL) {
8653	return NULL;
8654	}
8655
8656	pFlac->memoryStream = memoryStream;
8657
8658	/ This is an awful hack... /
8659	#ifndef DR_FLAC_NO_OGG
8660	if (pFlac->container == drflac_container_ogg)
8661	{
8662	drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8663	oggbs->pUserData = &pFlac->memoryStream;
8664	}
8665	else
8666	#endif
8667	{
8668	pFlac->bs.pUserData = &pFlac->memoryStream;
8669	}
8670
8671	return pFlac;
8672	}
8673
8674	DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8675	{
8676	drflac__memory_stream memoryStream;
8677	drflac* pFlac;
8678
8679	memoryStream.data = (const drflac_uint8*)pData;
8680	memoryStream.dataSize = dataSize;
8681	memoryStream.currentReadPos = `0`;
8682	pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8683	if (pFlac == NULL) {
8684	return NULL;
8685	}
8686
8687	pFlac->memoryStream = memoryStream;
8688
8689	/ This is an awful hack... /
8690	#ifndef DR_FLAC_NO_OGG
8691	if (pFlac->container == drflac_container_ogg)
8692	{
8693	drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8694	oggbs->pUserData = &pFlac->memoryStream;
8695	}
8696	else
8697	#endif
8698	{
8699	pFlac->bs.pUserData = &pFlac->memoryStream;
8700	}
8701
8702	return pFlac;
8703	}
8704
8705
8706
8707	DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8708	{
8709	return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8710	}
8711	DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8712	{
8713	return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8714	}
8715
8716	DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8717	{
8718	return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8719	}
8720	DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8721	{
8722	return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8723	}
8724
8725	DRFLAC_API void drflac_close(drflac* pFlac)
8726	{
8727	if (pFlac == NULL) {
8728	return;
8729	}
8730
8731	#ifndef DR_FLAC_NO_STDIO
8732	/*
8733	If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
8734	was used by looking at the callbacks.
8735	*/
8736	if (pFlac->bs.onRead == drflac__on_read_stdio) {
8737	fclose((FILE*)pFlac->bs.pUserData);
8738	}
8739
8740	#ifndef DR_FLAC_NO_OGG
8741	/ Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. /
8742	if (pFlac->container == drflac_container_ogg) {
8743	drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8744	DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8745
8746	if (oggbs->onRead == drflac__on_read_stdio) {
8747	fclose((FILE*)oggbs->pUserData);
8748	}
8749	}
8750	#endif
8751	#endif
8752
8753	drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8754	}
8755
8756
8757	#if 0
8758	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8759	{
8760	drflac_uint64 i;
8761	for (i = `0`; i < frameCount; ++i) {
8762	drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
8763	drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
8764	drflac_uint32 right = left - side;
8765
8766	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8767	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8768	}
8769	}
8770	#endif
8771
8772	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8773	{
8774	drflac_uint64 i;
8775	drflac_uint64 frameCount4 = frameCount >> `2`;
8776	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8777	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8778	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
8779	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
8780
8781	for (i = `0`; i < frameCount4; ++i) {
8782	drflac_uint32 left0 = pInputSamples0U32[i*`4`+`0`] << shift0;
8783	drflac_uint32 left1 = pInputSamples0U32[i*`4`+`1`] << shift0;
8784	drflac_uint32 left2 = pInputSamples0U32[i*`4`+`2`] << shift0;
8785	drflac_uint32 left3 = pInputSamples0U32[i*`4`+`3`] << shift0;
8786
8787	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << shift1;
8788	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << shift1;
8789	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << shift1;
8790	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << shift1;
8791
8792	drflac_uint32 right0 = left0 - side0;
8793	drflac_uint32 right1 = left1 - side1;
8794	drflac_uint32 right2 = left2 - side2;
8795	drflac_uint32 right3 = left3 - side3;
8796
8797	pOutputSamples[i*`8`+`0`] = (drflac_int32)left0;
8798	pOutputSamples[i*`8`+`1`] = (drflac_int32)right0;
8799	pOutputSamples[i*`8`+`2`] = (drflac_int32)left1;
8800	pOutputSamples[i*`8`+`3`] = (drflac_int32)right1;
8801	pOutputSamples[i*`8`+`4`] = (drflac_int32)left2;
8802	pOutputSamples[i*`8`+`5`] = (drflac_int32)right2;
8803	pOutputSamples[i*`8`+`6`] = (drflac_int32)left3;
8804	pOutputSamples[i*`8`+`7`] = (drflac_int32)right3;
8805	}
8806
8807	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
8808	drflac_uint32 left = pInputSamples0U32[i] << shift0;
8809	drflac_uint32 side = pInputSamples1U32[i] << shift1;
8810	drflac_uint32 right = left - side;
8811
8812	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8813	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8814	}
8815	}
8816
8817	#if defined(DRFLAC_SUPPORT_SSE2)
8818	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8819	{
8820	drflac_uint64 i;
8821	drflac_uint64 frameCount4 = frameCount >> `2`;
8822	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8823	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8824	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
8825	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
8826
8827	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
8828
8829	for (i = `0`; i < frameCount4; ++i) {
8830	__m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
8831	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
8832	__m128i right = _mm_sub_epi32(left, side);
8833
8834	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `0`), _mm_unpacklo_epi32(left, right));
8835	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `4`), _mm_unpackhi_epi32(left, right));
8836	}
8837
8838	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
8839	drflac_uint32 left = pInputSamples0U32[i] << shift0;
8840	drflac_uint32 side = pInputSamples1U32[i] << shift1;
8841	drflac_uint32 right = left - side;
8842
8843	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8844	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8845	}
8846	}
8847	#endif
8848
8849	#if defined(DRFLAC_SUPPORT_NEON)
8850	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8851	{
8852	drflac_uint64 i;
8853	drflac_uint64 frameCount4 = frameCount >> `2`;
8854	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8855	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8856	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
8857	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
8858	int32x4_t shift0_4;
8859	int32x4_t shift1_4;
8860
8861	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
8862
8863	shift0_4 = vdupq_n_s32(shift0);
8864	shift1_4 = vdupq_n_s32(shift1);
8865
8866	for (i = `0`; i < frameCount4; ++i) {
8867	uint32x4_t left;
8868	uint32x4_t side;
8869	uint32x4_t right;
8870
8871	left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
8872	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
8873	right = vsubq_u32(left, side);
8874
8875	drflac__vst2q_u32((drflac_uint32)pOutputSamples + i`8`, vzipq_u32(left, right));
8876	}
8877
8878	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
8879	drflac_uint32 left = pInputSamples0U32[i] << shift0;
8880	drflac_uint32 side = pInputSamples1U32[i] << shift1;
8881	drflac_uint32 right = left - side;
8882
8883	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8884	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8885	}
8886	}
8887	#endif
8888
8889	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8890	{
8891	#if defined(DRFLAC_SUPPORT_SSE2)
8892	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
8893	drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8894	} else
8895	#elif defined(DRFLAC_SUPPORT_NEON)
8896	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
8897	drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8898	} else
8899	#endif
8900	{
8901	/ Scalar fallback. /
8902	#if 0
8903	drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8904	#else
8905	drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8906	#endif
8907	}
8908	}
8909
8910
8911	#if 0
8912	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8913	{
8914	drflac_uint64 i;
8915	for (i = `0`; i < frameCount; ++i) {
8916	drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
8917	drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
8918	drflac_uint32 left = right + side;
8919
8920	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8921	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8922	}
8923	}
8924	#endif
8925
8926	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8927	{
8928	drflac_uint64 i;
8929	drflac_uint64 frameCount4 = frameCount >> `2`;
8930	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8931	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8932	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
8933	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
8934
8935	for (i = `0`; i < frameCount4; ++i) {
8936	drflac_uint32 side0 = pInputSamples0U32[i*`4`+`0`] << shift0;
8937	drflac_uint32 side1 = pInputSamples0U32[i*`4`+`1`] << shift0;
8938	drflac_uint32 side2 = pInputSamples0U32[i*`4`+`2`] << shift0;
8939	drflac_uint32 side3 = pInputSamples0U32[i*`4`+`3`] << shift0;
8940
8941	drflac_uint32 right0 = pInputSamples1U32[i*`4`+`0`] << shift1;
8942	drflac_uint32 right1 = pInputSamples1U32[i*`4`+`1`] << shift1;
8943	drflac_uint32 right2 = pInputSamples1U32[i*`4`+`2`] << shift1;
8944	drflac_uint32 right3 = pInputSamples1U32[i*`4`+`3`] << shift1;
8945
8946	drflac_uint32 left0 = right0 + side0;
8947	drflac_uint32 left1 = right1 + side1;
8948	drflac_uint32 left2 = right2 + side2;
8949	drflac_uint32 left3 = right3 + side3;
8950
8951	pOutputSamples[i*`8`+`0`] = (drflac_int32)left0;
8952	pOutputSamples[i*`8`+`1`] = (drflac_int32)right0;
8953	pOutputSamples[i*`8`+`2`] = (drflac_int32)left1;
8954	pOutputSamples[i*`8`+`3`] = (drflac_int32)right1;
8955	pOutputSamples[i*`8`+`4`] = (drflac_int32)left2;
8956	pOutputSamples[i*`8`+`5`] = (drflac_int32)right2;
8957	pOutputSamples[i*`8`+`6`] = (drflac_int32)left3;
8958	pOutputSamples[i*`8`+`7`] = (drflac_int32)right3;
8959	}
8960
8961	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
8962	drflac_uint32 side = pInputSamples0U32[i] << shift0;
8963	drflac_uint32 right = pInputSamples1U32[i] << shift1;
8964	drflac_uint32 left = right + side;
8965
8966	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8967	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8968	}
8969	}
8970
8971	#if defined(DRFLAC_SUPPORT_SSE2)
8972	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8973	{
8974	drflac_uint64 i;
8975	drflac_uint64 frameCount4 = frameCount >> `2`;
8976	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8977	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8978	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
8979	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
8980
8981	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
8982
8983	for (i = `0`; i < frameCount4; ++i) {
8984	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
8985	__m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
8986	__m128i left = _mm_add_epi32(right, side);
8987
8988	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `0`), _mm_unpacklo_epi32(left, right));
8989	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `4`), _mm_unpackhi_epi32(left, right));
8990	}
8991
8992	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
8993	drflac_uint32 side = pInputSamples0U32[i] << shift0;
8994	drflac_uint32 right = pInputSamples1U32[i] << shift1;
8995	drflac_uint32 left = right + side;
8996
8997	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
8998	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
8999	}
9000	}
9001	#endif
9002
9003	#if defined(DRFLAC_SUPPORT_NEON)
9004	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9005	{
9006	drflac_uint64 i;
9007	drflac_uint64 frameCount4 = frameCount >> `2`;
9008	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9009	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9010	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9011	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9012	int32x4_t shift0_4;
9013	int32x4_t shift1_4;
9014
9015	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9016
9017	shift0_4 = vdupq_n_s32(shift0);
9018	shift1_4 = vdupq_n_s32(shift1);
9019
9020	for (i = `0`; i < frameCount4; ++i) {
9021	uint32x4_t side;
9022	uint32x4_t right;
9023	uint32x4_t left;
9024
9025	side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
9026	right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
9027	left = vaddq_u32(right, side);
9028
9029	drflac__vst2q_u32((drflac_uint32)pOutputSamples + i`8`, vzipq_u32(left, right));
9030	}
9031
9032	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9033	drflac_uint32 side = pInputSamples0U32[i] << shift0;
9034	drflac_uint32 right = pInputSamples1U32[i] << shift1;
9035	drflac_uint32 left = right + side;
9036
9037	pOutputSamples[i*`2`+`0`] = (drflac_int32)left;
9038	pOutputSamples[i*`2`+`1`] = (drflac_int32)right;
9039	}
9040	}
9041	#endif
9042
9043	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9044	{
9045	#if defined(DRFLAC_SUPPORT_SSE2)
9046	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
9047	drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9048	} else
9049	#elif defined(DRFLAC_SUPPORT_NEON)
9050	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
9051	drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9052	} else
9053	#endif
9054	{
9055	/ Scalar fallback. /
9056	#if 0
9057	drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9058	#else
9059	drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9060	#endif
9061	}
9062	}
9063
9064
9065	#if 0
9066	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9067	{
9068	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
9069	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9070	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9071
9072	mid = (mid << `1`) \| (side & `0x01`);
9073
9074	pOutputSamples[i*`2`+`0`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> `1`) << unusedBitsPerSample);
9075	pOutputSamples[i*`2`+`1`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> `1`) << unusedBitsPerSample);
9076	}
9077	}
9078	#endif
9079
9080	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9081	{
9082	drflac_uint64 i;
9083	drflac_uint64 frameCount4 = frameCount >> `2`;
9084	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9085	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9086	drflac_int32 shift = unusedBitsPerSample;
9087
9088	if (shift > `0`) {
9089	shift -= `1`;
9090	for (i = `0`; i < frameCount4; ++i) {
9091	drflac_uint32 temp0L;
9092	drflac_uint32 temp1L;
9093	drflac_uint32 temp2L;
9094	drflac_uint32 temp3L;
9095	drflac_uint32 temp0R;
9096	drflac_uint32 temp1R;
9097	drflac_uint32 temp2R;
9098	drflac_uint32 temp3R;
9099
9100	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9101	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9102	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9103	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9104
9105	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9106	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9107	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9108	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9109
9110	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
9111	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
9112	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
9113	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
9114
9115	temp0L = (mid0 + side0) << shift;
9116	temp1L = (mid1 + side1) << shift;
9117	temp2L = (mid2 + side2) << shift;
9118	temp3L = (mid3 + side3) << shift;
9119
9120	temp0R = (mid0 - side0) << shift;
9121	temp1R = (mid1 - side1) << shift;
9122	temp2R = (mid2 - side2) << shift;
9123	temp3R = (mid3 - side3) << shift;
9124
9125	pOutputSamples[i*`8`+`0`] = (drflac_int32)temp0L;
9126	pOutputSamples[i*`8`+`1`] = (drflac_int32)temp0R;
9127	pOutputSamples[i*`8`+`2`] = (drflac_int32)temp1L;
9128	pOutputSamples[i*`8`+`3`] = (drflac_int32)temp1R;
9129	pOutputSamples[i*`8`+`4`] = (drflac_int32)temp2L;
9130	pOutputSamples[i*`8`+`5`] = (drflac_int32)temp2R;
9131	pOutputSamples[i*`8`+`6`] = (drflac_int32)temp3L;
9132	pOutputSamples[i*`8`+`7`] = (drflac_int32)temp3R;
9133	}
9134	} else {
9135	for (i = `0`; i < frameCount4; ++i) {
9136	drflac_uint32 temp0L;
9137	drflac_uint32 temp1L;
9138	drflac_uint32 temp2L;
9139	drflac_uint32 temp3L;
9140	drflac_uint32 temp0R;
9141	drflac_uint32 temp1R;
9142	drflac_uint32 temp2R;
9143	drflac_uint32 temp3R;
9144
9145	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9146	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9147	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9148	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9149
9150	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9151	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9152	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9153	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9154
9155	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
9156	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
9157	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
9158	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
9159
9160	temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> `1`);
9161	temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> `1`);
9162	temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> `1`);
9163	temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> `1`);
9164
9165	temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> `1`);
9166	temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> `1`);
9167	temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> `1`);
9168	temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> `1`);
9169
9170	pOutputSamples[i*`8`+`0`] = (drflac_int32)temp0L;
9171	pOutputSamples[i*`8`+`1`] = (drflac_int32)temp0R;
9172	pOutputSamples[i*`8`+`2`] = (drflac_int32)temp1L;
9173	pOutputSamples[i*`8`+`3`] = (drflac_int32)temp1R;
9174	pOutputSamples[i*`8`+`4`] = (drflac_int32)temp2L;
9175	pOutputSamples[i*`8`+`5`] = (drflac_int32)temp2R;
9176	pOutputSamples[i*`8`+`6`] = (drflac_int32)temp3L;
9177	pOutputSamples[i*`8`+`7`] = (drflac_int32)temp3R;
9178	}
9179	}
9180
9181	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9182	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9183	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9184
9185	mid = (mid << `1`) \| (side & `0x01`);
9186
9187	pOutputSamples[i*`2`+`0`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> `1`) << unusedBitsPerSample);
9188	pOutputSamples[i*`2`+`1`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> `1`) << unusedBitsPerSample);
9189	}
9190	}
9191
9192	#if defined(DRFLAC_SUPPORT_SSE2)
9193	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9194	{
9195	drflac_uint64 i;
9196	drflac_uint64 frameCount4 = frameCount >> `2`;
9197	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9198	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9199	drflac_int32 shift = unusedBitsPerSample;
9200
9201	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9202
9203	if (shift == `0`) {
9204	for (i = `0`; i < frameCount4; ++i) {
9205	__m128i mid;
9206	__m128i side;
9207	__m128i left;
9208	__m128i right;
9209
9210	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
9211	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
9212
9213	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
9214
9215	left = _mm_srai_epi32(_mm_add_epi32(mid, side), `1`);
9216	right = _mm_srai_epi32(_mm_sub_epi32(mid, side), `1`);
9217
9218	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `0`), _mm_unpacklo_epi32(left, right));
9219	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `4`), _mm_unpackhi_epi32(left, right));
9220	}
9221
9222	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9223	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9224	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9225
9226	mid = (mid << `1`) \| (side & `0x01`);
9227
9228	pOutputSamples[i*`2`+`0`] = (drflac_int32)(mid + side) >> `1`;
9229	pOutputSamples[i*`2`+`1`] = (drflac_int32)(mid - side) >> `1`;
9230	}
9231	} else {
9232	shift -= `1`;
9233	for (i = `0`; i < frameCount4; ++i) {
9234	__m128i mid;
9235	__m128i side;
9236	__m128i left;
9237	__m128i right;
9238
9239	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
9240	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
9241
9242	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
9243
9244	left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
9245	right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
9246
9247	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `0`), _mm_unpacklo_epi32(left, right));
9248	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `4`), _mm_unpackhi_epi32(left, right));
9249	}
9250
9251	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9252	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9253	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9254
9255	mid = (mid << `1`) \| (side & `0x01`);
9256
9257	pOutputSamples[i*`2`+`0`] = (drflac_int32)((mid + side) << shift);
9258	pOutputSamples[i*`2`+`1`] = (drflac_int32)((mid - side) << shift);
9259	}
9260	}
9261	}
9262	#endif
9263
9264	#if defined(DRFLAC_SUPPORT_NEON)
9265	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9266	{
9267	drflac_uint64 i;
9268	drflac_uint64 frameCount4 = frameCount >> `2`;
9269	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9270	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9271	drflac_int32 shift = unusedBitsPerSample;
9272	int32x4_t wbpsShift0_4; / wbps = Wasted Bits Per Sample /
9273	int32x4_t wbpsShift1_4; / wbps = Wasted Bits Per Sample /
9274	uint32x4_t one4;
9275
9276	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9277
9278	wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
9279	wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
9280	one4 = vdupq_n_u32(`1`);
9281
9282	if (shift == `0`) {
9283	for (i = `0`; i < frameCount4; ++i) {
9284	uint32x4_t mid;
9285	uint32x4_t side;
9286	int32x4_t left;
9287	int32x4_t right;
9288
9289	mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbpsShift0_4);
9290	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbpsShift1_4);
9291
9292	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, one4));
9293
9294	left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), `1`);
9295	right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), `1`);
9296
9297	drflac__vst2q_s32(pOutputSamples + i*`8`, vzipq_s32(left, right));
9298	}
9299
9300	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9301	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9302	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9303
9304	mid = (mid << `1`) \| (side & `0x01`);
9305
9306	pOutputSamples[i*`2`+`0`] = (drflac_int32)(mid + side) >> `1`;
9307	pOutputSamples[i*`2`+`1`] = (drflac_int32)(mid - side) >> `1`;
9308	}
9309	} else {
9310	int32x4_t shift4;
9311
9312	shift -= `1`;
9313	shift4 = vdupq_n_s32(shift);
9314
9315	for (i = `0`; i < frameCount4; ++i) {
9316	uint32x4_t mid;
9317	uint32x4_t side;
9318	int32x4_t left;
9319	int32x4_t right;
9320
9321	mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbpsShift0_4);
9322	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbpsShift1_4);
9323
9324	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, one4));
9325
9326	left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
9327	right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
9328
9329	drflac__vst2q_s32(pOutputSamples + i*`8`, vzipq_s32(left, right));
9330	}
9331
9332	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9333	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9334	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9335
9336	mid = (mid << `1`) \| (side & `0x01`);
9337
9338	pOutputSamples[i*`2`+`0`] = (drflac_int32)((mid + side) << shift);
9339	pOutputSamples[i*`2`+`1`] = (drflac_int32)((mid - side) << shift);
9340	}
9341	}
9342	}
9343	#endif
9344
9345	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9346	{
9347	#if defined(DRFLAC_SUPPORT_SSE2)
9348	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
9349	drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9350	} else
9351	#elif defined(DRFLAC_SUPPORT_NEON)
9352	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
9353	drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9354	} else
9355	#endif
9356	{
9357	/ Scalar fallback. /
9358	#if 0
9359	drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9360	#else
9361	drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9362	#endif
9363	}
9364	}
9365
9366
9367	#if 0
9368	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9369	{
9370	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
9371	pOutputSamples[i*`2`+`0`] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample));
9372	pOutputSamples[i*`2`+`1`] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample));
9373	}
9374	}
9375	#endif
9376
9377	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9378	{
9379	drflac_uint64 i;
9380	drflac_uint64 frameCount4 = frameCount >> `2`;
9381	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9382	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9383	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9384	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9385
9386	for (i = `0`; i < frameCount4; ++i) {
9387	drflac_uint32 tempL0 = pInputSamples0U32[i*`4`+`0`] << shift0;
9388	drflac_uint32 tempL1 = pInputSamples0U32[i*`4`+`1`] << shift0;
9389	drflac_uint32 tempL2 = pInputSamples0U32[i*`4`+`2`] << shift0;
9390	drflac_uint32 tempL3 = pInputSamples0U32[i*`4`+`3`] << shift0;
9391
9392	drflac_uint32 tempR0 = pInputSamples1U32[i*`4`+`0`] << shift1;
9393	drflac_uint32 tempR1 = pInputSamples1U32[i*`4`+`1`] << shift1;
9394	drflac_uint32 tempR2 = pInputSamples1U32[i*`4`+`2`] << shift1;
9395	drflac_uint32 tempR3 = pInputSamples1U32[i*`4`+`3`] << shift1;
9396
9397	pOutputSamples[i*`8`+`0`] = (drflac_int32)tempL0;
9398	pOutputSamples[i*`8`+`1`] = (drflac_int32)tempR0;
9399	pOutputSamples[i*`8`+`2`] = (drflac_int32)tempL1;
9400	pOutputSamples[i*`8`+`3`] = (drflac_int32)tempR1;
9401	pOutputSamples[i*`8`+`4`] = (drflac_int32)tempL2;
9402	pOutputSamples[i*`8`+`5`] = (drflac_int32)tempR2;
9403	pOutputSamples[i*`8`+`6`] = (drflac_int32)tempL3;
9404	pOutputSamples[i*`8`+`7`] = (drflac_int32)tempR3;
9405	}
9406
9407	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9408	pOutputSamples[i*`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9409	pOutputSamples[i*`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9410	}
9411	}
9412
9413	#if defined(DRFLAC_SUPPORT_SSE2)
9414	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9415	{
9416	drflac_uint64 i;
9417	drflac_uint64 frameCount4 = frameCount >> `2`;
9418	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9419	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9420	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9421	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9422
9423	for (i = `0`; i < frameCount4; ++i) {
9424	__m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9425	__m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9426
9427	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `0`), _mm_unpacklo_epi32(left, right));
9428	_mm_storeu_si128((__m128i)(pOutputSamples + i`8` + `4`), _mm_unpackhi_epi32(left, right));
9429	}
9430
9431	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9432	pOutputSamples[i*`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9433	pOutputSamples[i*`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9434	}
9435	}
9436	#endif
9437
9438	#if defined(DRFLAC_SUPPORT_NEON)
9439	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9440	{
9441	drflac_uint64 i;
9442	drflac_uint64 frameCount4 = frameCount >> `2`;
9443	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9444	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9445	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9446	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9447
9448	int32x4_t shift4_0 = vdupq_n_s32(shift0);
9449	int32x4_t shift4_1 = vdupq_n_s32(shift1);
9450
9451	for (i = `0`; i < frameCount4; ++i) {
9452	int32x4_t left;
9453	int32x4_t right;
9454
9455	left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift4_0));
9456	right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift4_1));
9457
9458	drflac__vst2q_s32(pOutputSamples + i*`8`, vzipq_s32(left, right));
9459	}
9460
9461	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9462	pOutputSamples[i*`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9463	pOutputSamples[i*`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9464	}
9465	}
9466	#endif
9467
9468	static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9469	{
9470	#if defined(DRFLAC_SUPPORT_SSE2)
9471	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
9472	drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9473	} else
9474	#elif defined(DRFLAC_SUPPORT_NEON)
9475	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
9476	drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9477	} else
9478	#endif
9479	{
9480	/ Scalar fallback. /
9481	#if 0
9482	drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9483	#else
9484	drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9485	#endif
9486	}
9487	}
9488
9489
9490	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
9491	{
9492	drflac_uint64 framesRead;
9493	drflac_uint32 unusedBitsPerSample;
9494
9495	if (pFlac == NULL \|\| framesToRead == `0`) {
9496	return `0`;
9497	}
9498
9499	if (pBufferOut == NULL) {
9500	return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
9501	}
9502
9503	DRFLAC_ASSERT(pFlac->bitsPerSample <= `32`);
9504	unusedBitsPerSample = `32` - pFlac->bitsPerSample;
9505
9506	framesRead = `0`;
9507	while (framesToRead > `0`) {
9508	/ If we've run out of samples in this frame, go to the next. /
9509	if (pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
9510	if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
9511	break; / Couldn't read the next frame, so just break from the loop and return. /
9512	}
9513	} else {
9514	unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
9515	drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
9516	drflac_uint64 frameCountThisIteration = framesToRead;
9517
9518	if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
9519	frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
9520	}
9521
9522	if (channelCount == `2`) {
9523	const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[`0`].pSamplesS32 + iFirstPCMFrame;
9524	const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[`1`].pSamplesS32 + iFirstPCMFrame;
9525
9526	switch (pFlac->currentFLACFrame.header.channelAssignment)
9527	{
9528	case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
9529	{
9530	drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9531	} break;
9532
9533	case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
9534	{
9535	drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9536	} break;
9537
9538	case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
9539	{
9540	drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9541	} break;
9542
9543	case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
9544	default:
9545	{
9546	drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9547	} break;
9548	}
9549	} else {
9550	/ Generic interleaving. /
9551	drflac_uint64 i;
9552	for (i = `0`; i < frameCountThisIteration; ++i) {
9553	unsigned int j;
9554	for (j = `0`; j < channelCount; ++j) {
9555	pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
9556	}
9557	}
9558	}
9559
9560	framesRead += frameCountThisIteration;
9561	pBufferOut += frameCountThisIteration * channelCount;
9562	framesToRead -= frameCountThisIteration;
9563	pFlac->currentPCMFrame += frameCountThisIteration;
9564	pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
9565	}
9566	}
9567
9568	return framesRead;
9569	}
9570
9571
9572	#if 0
9573	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9574	{
9575	drflac_uint64 i;
9576	for (i = `0`; i < frameCount; ++i) {
9577	drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
9578	drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
9579	drflac_uint32 right = left - side;
9580
9581	left >>= `16`;
9582	right >>= `16`;
9583
9584	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9585	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9586	}
9587	}
9588	#endif
9589
9590	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9591	{
9592	drflac_uint64 i;
9593	drflac_uint64 frameCount4 = frameCount >> `2`;
9594	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9595	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9596	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9597	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9598
9599	for (i = `0`; i < frameCount4; ++i) {
9600	drflac_uint32 left0 = pInputSamples0U32[i*`4`+`0`] << shift0;
9601	drflac_uint32 left1 = pInputSamples0U32[i*`4`+`1`] << shift0;
9602	drflac_uint32 left2 = pInputSamples0U32[i*`4`+`2`] << shift0;
9603	drflac_uint32 left3 = pInputSamples0U32[i*`4`+`3`] << shift0;
9604
9605	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << shift1;
9606	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << shift1;
9607	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << shift1;
9608	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << shift1;
9609
9610	drflac_uint32 right0 = left0 - side0;
9611	drflac_uint32 right1 = left1 - side1;
9612	drflac_uint32 right2 = left2 - side2;
9613	drflac_uint32 right3 = left3 - side3;
9614
9615	left0 >>= `16`;
9616	left1 >>= `16`;
9617	left2 >>= `16`;
9618	left3 >>= `16`;
9619
9620	right0 >>= `16`;
9621	right1 >>= `16`;
9622	right2 >>= `16`;
9623	right3 >>= `16`;
9624
9625	pOutputSamples[i*`8`+`0`] = (drflac_int16)left0;
9626	pOutputSamples[i*`8`+`1`] = (drflac_int16)right0;
9627	pOutputSamples[i*`8`+`2`] = (drflac_int16)left1;
9628	pOutputSamples[i*`8`+`3`] = (drflac_int16)right1;
9629	pOutputSamples[i*`8`+`4`] = (drflac_int16)left2;
9630	pOutputSamples[i*`8`+`5`] = (drflac_int16)right2;
9631	pOutputSamples[i*`8`+`6`] = (drflac_int16)left3;
9632	pOutputSamples[i*`8`+`7`] = (drflac_int16)right3;
9633	}
9634
9635	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9636	drflac_uint32 left = pInputSamples0U32[i] << shift0;
9637	drflac_uint32 side = pInputSamples1U32[i] << shift1;
9638	drflac_uint32 right = left - side;
9639
9640	left >>= `16`;
9641	right >>= `16`;
9642
9643	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9644	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9645	}
9646	}
9647
9648	#if defined(DRFLAC_SUPPORT_SSE2)
9649	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9650	{
9651	drflac_uint64 i;
9652	drflac_uint64 frameCount4 = frameCount >> `2`;
9653	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9654	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9655	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9656	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9657
9658	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9659
9660	for (i = `0`; i < frameCount4; ++i) {
9661	__m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9662	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9663	__m128i right = _mm_sub_epi32(left, side);
9664
9665	left = _mm_srai_epi32(left, `16`);
9666	right = _mm_srai_epi32(right, `16`);
9667
9668	_mm_storeu_si128((__m128i)(pOutputSamples + i`8`), drflac__mm_packs_interleaved_epi32(left, right));
9669	}
9670
9671	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9672	drflac_uint32 left = pInputSamples0U32[i] << shift0;
9673	drflac_uint32 side = pInputSamples1U32[i] << shift1;
9674	drflac_uint32 right = left - side;
9675
9676	left >>= `16`;
9677	right >>= `16`;
9678
9679	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9680	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9681	}
9682	}
9683	#endif
9684
9685	#if defined(DRFLAC_SUPPORT_NEON)
9686	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9687	{
9688	drflac_uint64 i;
9689	drflac_uint64 frameCount4 = frameCount >> `2`;
9690	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9691	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9692	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9693	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9694	int32x4_t shift0_4;
9695	int32x4_t shift1_4;
9696
9697	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9698
9699	shift0_4 = vdupq_n_s32(shift0);
9700	shift1_4 = vdupq_n_s32(shift1);
9701
9702	for (i = `0`; i < frameCount4; ++i) {
9703	uint32x4_t left;
9704	uint32x4_t side;
9705	uint32x4_t right;
9706
9707	left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
9708	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
9709	right = vsubq_u32(left, side);
9710
9711	left = vshrq_n_u32(left, `16`);
9712	right = vshrq_n_u32(right, `16`);
9713
9714	drflac__vst2q_u16((drflac_uint16)pOutputSamples + i`8`, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9715	}
9716
9717	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9718	drflac_uint32 left = pInputSamples0U32[i] << shift0;
9719	drflac_uint32 side = pInputSamples1U32[i] << shift1;
9720	drflac_uint32 right = left - side;
9721
9722	left >>= `16`;
9723	right >>= `16`;
9724
9725	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9726	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9727	}
9728	}
9729	#endif
9730
9731	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9732	{
9733	#if defined(DRFLAC_SUPPORT_SSE2)
9734	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
9735	drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9736	} else
9737	#elif defined(DRFLAC_SUPPORT_NEON)
9738	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
9739	drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9740	} else
9741	#endif
9742	{
9743	/ Scalar fallback. /
9744	#if 0
9745	drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9746	#else
9747	drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9748	#endif
9749	}
9750	}
9751
9752
9753	#if 0
9754	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9755	{
9756	drflac_uint64 i;
9757	for (i = `0`; i < frameCount; ++i) {
9758	drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
9759	drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
9760	drflac_uint32 left = right + side;
9761
9762	left >>= `16`;
9763	right >>= `16`;
9764
9765	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9766	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9767	}
9768	}
9769	#endif
9770
9771	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9772	{
9773	drflac_uint64 i;
9774	drflac_uint64 frameCount4 = frameCount >> `2`;
9775	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9776	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9777	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9778	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9779
9780	for (i = `0`; i < frameCount4; ++i) {
9781	drflac_uint32 side0 = pInputSamples0U32[i*`4`+`0`] << shift0;
9782	drflac_uint32 side1 = pInputSamples0U32[i*`4`+`1`] << shift0;
9783	drflac_uint32 side2 = pInputSamples0U32[i*`4`+`2`] << shift0;
9784	drflac_uint32 side3 = pInputSamples0U32[i*`4`+`3`] << shift0;
9785
9786	drflac_uint32 right0 = pInputSamples1U32[i*`4`+`0`] << shift1;
9787	drflac_uint32 right1 = pInputSamples1U32[i*`4`+`1`] << shift1;
9788	drflac_uint32 right2 = pInputSamples1U32[i*`4`+`2`] << shift1;
9789	drflac_uint32 right3 = pInputSamples1U32[i*`4`+`3`] << shift1;
9790
9791	drflac_uint32 left0 = right0 + side0;
9792	drflac_uint32 left1 = right1 + side1;
9793	drflac_uint32 left2 = right2 + side2;
9794	drflac_uint32 left3 = right3 + side3;
9795
9796	left0 >>= `16`;
9797	left1 >>= `16`;
9798	left2 >>= `16`;
9799	left3 >>= `16`;
9800
9801	right0 >>= `16`;
9802	right1 >>= `16`;
9803	right2 >>= `16`;
9804	right3 >>= `16`;
9805
9806	pOutputSamples[i*`8`+`0`] = (drflac_int16)left0;
9807	pOutputSamples[i*`8`+`1`] = (drflac_int16)right0;
9808	pOutputSamples[i*`8`+`2`] = (drflac_int16)left1;
9809	pOutputSamples[i*`8`+`3`] = (drflac_int16)right1;
9810	pOutputSamples[i*`8`+`4`] = (drflac_int16)left2;
9811	pOutputSamples[i*`8`+`5`] = (drflac_int16)right2;
9812	pOutputSamples[i*`8`+`6`] = (drflac_int16)left3;
9813	pOutputSamples[i*`8`+`7`] = (drflac_int16)right3;
9814	}
9815
9816	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9817	drflac_uint32 side = pInputSamples0U32[i] << shift0;
9818	drflac_uint32 right = pInputSamples1U32[i] << shift1;
9819	drflac_uint32 left = right + side;
9820
9821	left >>= `16`;
9822	right >>= `16`;
9823
9824	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9825	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9826	}
9827	}
9828
9829	#if defined(DRFLAC_SUPPORT_SSE2)
9830	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9831	{
9832	drflac_uint64 i;
9833	drflac_uint64 frameCount4 = frameCount >> `2`;
9834	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9835	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9836	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9837	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9838
9839	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9840
9841	for (i = `0`; i < frameCount4; ++i) {
9842	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9843	__m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9844	__m128i left = _mm_add_epi32(right, side);
9845
9846	left = _mm_srai_epi32(left, `16`);
9847	right = _mm_srai_epi32(right, `16`);
9848
9849	_mm_storeu_si128((__m128i)(pOutputSamples + i`8`), drflac__mm_packs_interleaved_epi32(left, right));
9850	}
9851
9852	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9853	drflac_uint32 side = pInputSamples0U32[i] << shift0;
9854	drflac_uint32 right = pInputSamples1U32[i] << shift1;
9855	drflac_uint32 left = right + side;
9856
9857	left >>= `16`;
9858	right >>= `16`;
9859
9860	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9861	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9862	}
9863	}
9864	#endif
9865
9866	#if defined(DRFLAC_SUPPORT_NEON)
9867	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9868	{
9869	drflac_uint64 i;
9870	drflac_uint64 frameCount4 = frameCount >> `2`;
9871	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9872	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9873	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9874	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9875	int32x4_t shift0_4;
9876	int32x4_t shift1_4;
9877
9878	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
9879
9880	shift0_4 = vdupq_n_s32(shift0);
9881	shift1_4 = vdupq_n_s32(shift1);
9882
9883	for (i = `0`; i < frameCount4; ++i) {
9884	uint32x4_t side;
9885	uint32x4_t right;
9886	uint32x4_t left;
9887
9888	side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
9889	right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
9890	left = vaddq_u32(right, side);
9891
9892	left = vshrq_n_u32(left, `16`);
9893	right = vshrq_n_u32(right, `16`);
9894
9895	drflac__vst2q_u16((drflac_uint16)pOutputSamples + i`8`, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9896	}
9897
9898	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
9899	drflac_uint32 side = pInputSamples0U32[i] << shift0;
9900	drflac_uint32 right = pInputSamples1U32[i] << shift1;
9901	drflac_uint32 left = right + side;
9902
9903	left >>= `16`;
9904	right >>= `16`;
9905
9906	pOutputSamples[i*`2`+`0`] = (drflac_int16)left;
9907	pOutputSamples[i*`2`+`1`] = (drflac_int16)right;
9908	}
9909	}
9910	#endif
9911
9912	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9913	{
9914	#if defined(DRFLAC_SUPPORT_SSE2)
9915	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
9916	drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9917	} else
9918	#elif defined(DRFLAC_SUPPORT_NEON)
9919	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
9920	drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9921	} else
9922	#endif
9923	{
9924	/ Scalar fallback. /
9925	#if 0
9926	drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9927	#else
9928	drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9929	#endif
9930	}
9931	}
9932
9933
9934	#if 0
9935	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9936	{
9937	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
9938	drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9939	drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9940
9941	mid = (mid << `1`) \| (side & `0x01`);
9942
9943	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> `1`) << unusedBitsPerSample) >> `16`);
9944	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> `1`) << unusedBitsPerSample) >> `16`);
9945	}
9946	}
9947	#endif
9948
9949	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9950	{
9951	drflac_uint64 i;
9952	drflac_uint64 frameCount4 = frameCount >> `2`;
9953	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9954	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9955	drflac_uint32 shift = unusedBitsPerSample;
9956
9957	if (shift > `0`) {
9958	shift -= `1`;
9959	for (i = `0`; i < frameCount4; ++i) {
9960	drflac_uint32 temp0L;
9961	drflac_uint32 temp1L;
9962	drflac_uint32 temp2L;
9963	drflac_uint32 temp3L;
9964	drflac_uint32 temp0R;
9965	drflac_uint32 temp1R;
9966	drflac_uint32 temp2R;
9967	drflac_uint32 temp3R;
9968
9969	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9970	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9971	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9972	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
9973
9974	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9975	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9976	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9977	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
9978
9979	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
9980	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
9981	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
9982	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
9983
9984	temp0L = (mid0 + side0) << shift;
9985	temp1L = (mid1 + side1) << shift;
9986	temp2L = (mid2 + side2) << shift;
9987	temp3L = (mid3 + side3) << shift;
9988
9989	temp0R = (mid0 - side0) << shift;
9990	temp1R = (mid1 - side1) << shift;
9991	temp2R = (mid2 - side2) << shift;
9992	temp3R = (mid3 - side3) << shift;
9993
9994	temp0L >>= `16`;
9995	temp1L >>= `16`;
9996	temp2L >>= `16`;
9997	temp3L >>= `16`;
9998
9999	temp0R >>= `16`;
10000	temp1R >>= `16`;
10001	temp2R >>= `16`;
10002	temp3R >>= `16`;
10003
10004	pOutputSamples[i*`8`+`0`] = (drflac_int16)temp0L;
10005	pOutputSamples[i*`8`+`1`] = (drflac_int16)temp0R;
10006	pOutputSamples[i*`8`+`2`] = (drflac_int16)temp1L;
10007	pOutputSamples[i*`8`+`3`] = (drflac_int16)temp1R;
10008	pOutputSamples[i*`8`+`4`] = (drflac_int16)temp2L;
10009	pOutputSamples[i*`8`+`5`] = (drflac_int16)temp2R;
10010	pOutputSamples[i*`8`+`6`] = (drflac_int16)temp3L;
10011	pOutputSamples[i*`8`+`7`] = (drflac_int16)temp3R;
10012	}
10013	} else {
10014	for (i = `0`; i < frameCount4; ++i) {
10015	drflac_uint32 temp0L;
10016	drflac_uint32 temp1L;
10017	drflac_uint32 temp2L;
10018	drflac_uint32 temp3L;
10019	drflac_uint32 temp0R;
10020	drflac_uint32 temp1R;
10021	drflac_uint32 temp2R;
10022	drflac_uint32 temp3R;
10023
10024	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10025	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10026	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10027	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10028
10029	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10030	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10031	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10032	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10033
10034	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
10035	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
10036	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
10037	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
10038
10039	temp0L = ((drflac_int32)(mid0 + side0) >> `1`);
10040	temp1L = ((drflac_int32)(mid1 + side1) >> `1`);
10041	temp2L = ((drflac_int32)(mid2 + side2) >> `1`);
10042	temp3L = ((drflac_int32)(mid3 + side3) >> `1`);
10043
10044	temp0R = ((drflac_int32)(mid0 - side0) >> `1`);
10045	temp1R = ((drflac_int32)(mid1 - side1) >> `1`);
10046	temp2R = ((drflac_int32)(mid2 - side2) >> `1`);
10047	temp3R = ((drflac_int32)(mid3 - side3) >> `1`);
10048
10049	temp0L >>= `16`;
10050	temp1L >>= `16`;
10051	temp2L >>= `16`;
10052	temp3L >>= `16`;
10053
10054	temp0R >>= `16`;
10055	temp1R >>= `16`;
10056	temp2R >>= `16`;
10057	temp3R >>= `16`;
10058
10059	pOutputSamples[i*`8`+`0`] = (drflac_int16)temp0L;
10060	pOutputSamples[i*`8`+`1`] = (drflac_int16)temp0R;
10061	pOutputSamples[i*`8`+`2`] = (drflac_int16)temp1L;
10062	pOutputSamples[i*`8`+`3`] = (drflac_int16)temp1R;
10063	pOutputSamples[i*`8`+`4`] = (drflac_int16)temp2L;
10064	pOutputSamples[i*`8`+`5`] = (drflac_int16)temp2R;
10065	pOutputSamples[i*`8`+`6`] = (drflac_int16)temp3L;
10066	pOutputSamples[i*`8`+`7`] = (drflac_int16)temp3R;
10067	}
10068	}
10069
10070	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10071	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10072	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10073
10074	mid = (mid << `1`) \| (side & `0x01`);
10075
10076	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> `1`) << unusedBitsPerSample) >> `16`);
10077	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> `1`) << unusedBitsPerSample) >> `16`);
10078	}
10079	}
10080
10081	#if defined(DRFLAC_SUPPORT_SSE2)
10082	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10083	{
10084	drflac_uint64 i;
10085	drflac_uint64 frameCount4 = frameCount >> `2`;
10086	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10087	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10088	drflac_uint32 shift = unusedBitsPerSample;
10089
10090	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10091
10092	if (shift == `0`) {
10093	for (i = `0`; i < frameCount4; ++i) {
10094	__m128i mid;
10095	__m128i side;
10096	__m128i left;
10097	__m128i right;
10098
10099	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10100	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10101
10102	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
10103
10104	left = _mm_srai_epi32(_mm_add_epi32(mid, side), `1`);
10105	right = _mm_srai_epi32(_mm_sub_epi32(mid, side), `1`);
10106
10107	left = _mm_srai_epi32(left, `16`);
10108	right = _mm_srai_epi32(right, `16`);
10109
10110	_mm_storeu_si128((__m128i)(pOutputSamples + i`8`), drflac__mm_packs_interleaved_epi32(left, right));
10111	}
10112
10113	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10114	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10115	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10116
10117	mid = (mid << `1`) \| (side & `0x01`);
10118
10119	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((drflac_int32)(mid + side) >> `1`) >> `16`);
10120	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((drflac_int32)(mid - side) >> `1`) >> `16`);
10121	}
10122	} else {
10123	shift -= `1`;
10124	for (i = `0`; i < frameCount4; ++i) {
10125	__m128i mid;
10126	__m128i side;
10127	__m128i left;
10128	__m128i right;
10129
10130	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10131	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10132
10133	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
10134
10135	left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10136	right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10137
10138	left = _mm_srai_epi32(left, `16`);
10139	right = _mm_srai_epi32(right, `16`);
10140
10141	_mm_storeu_si128((__m128i)(pOutputSamples + i`8`), drflac__mm_packs_interleaved_epi32(left, right));
10142	}
10143
10144	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10145	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10146	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10147
10148	mid = (mid << `1`) \| (side & `0x01`);
10149
10150	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((mid + side) << shift) >> `16`);
10151	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((mid - side) << shift) >> `16`);
10152	}
10153	}
10154	}
10155	#endif
10156
10157	#if defined(DRFLAC_SUPPORT_NEON)
10158	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10159	{
10160	drflac_uint64 i;
10161	drflac_uint64 frameCount4 = frameCount >> `2`;
10162	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10163	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10164	drflac_uint32 shift = unusedBitsPerSample;
10165	int32x4_t wbpsShift0_4; / wbps = Wasted Bits Per Sample /
10166	int32x4_t wbpsShift1_4; / wbps = Wasted Bits Per Sample /
10167
10168	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10169
10170	wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10171	wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10172
10173	if (shift == `0`) {
10174	for (i = `0`; i < frameCount4; ++i) {
10175	uint32x4_t mid;
10176	uint32x4_t side;
10177	int32x4_t left;
10178	int32x4_t right;
10179
10180	mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbpsShift0_4);
10181	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbpsShift1_4);
10182
10183	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, vdupq_n_u32(`1`)));
10184
10185	left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), `1`);
10186	right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), `1`);
10187
10188	left = vshrq_n_s32(left, `16`);
10189	right = vshrq_n_s32(right, `16`);
10190
10191	drflac__vst2q_s16(pOutputSamples + i*`8`, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10192	}
10193
10194	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10195	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10196	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10197
10198	mid = (mid << `1`) \| (side & `0x01`);
10199
10200	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((drflac_int32)(mid + side) >> `1`) >> `16`);
10201	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((drflac_int32)(mid - side) >> `1`) >> `16`);
10202	}
10203	} else {
10204	int32x4_t shift4;
10205
10206	shift -= `1`;
10207	shift4 = vdupq_n_s32(shift);
10208
10209	for (i = `0`; i < frameCount4; ++i) {
10210	uint32x4_t mid;
10211	uint32x4_t side;
10212	int32x4_t left;
10213	int32x4_t right;
10214
10215	mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbpsShift0_4);
10216	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbpsShift1_4);
10217
10218	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, vdupq_n_u32(`1`)));
10219
10220	left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10221	right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10222
10223	left = vshrq_n_s32(left, `16`);
10224	right = vshrq_n_s32(right, `16`);
10225
10226	drflac__vst2q_s16(pOutputSamples + i*`8`, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10227	}
10228
10229	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10230	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10231	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10232
10233	mid = (mid << `1`) \| (side & `0x01`);
10234
10235	pOutputSamples[i*`2`+`0`] = (drflac_int16)(((mid + side) << shift) >> `16`);
10236	pOutputSamples[i*`2`+`1`] = (drflac_int16)(((mid - side) << shift) >> `16`);
10237	}
10238	}
10239	}
10240	#endif
10241
10242	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10243	{
10244	#if defined(DRFLAC_SUPPORT_SSE2)
10245	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
10246	drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10247	} else
10248	#elif defined(DRFLAC_SUPPORT_NEON)
10249	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
10250	drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10251	} else
10252	#endif
10253	{
10254	/ Scalar fallback. /
10255	#if 0
10256	drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10257	#else
10258	drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10259	#endif
10260	}
10261	}
10262
10263
10264	#if 0
10265	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10266	{
10267	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
10268	pOutputSamples[i*`2`+`0`] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample)) >> `16`);
10269	pOutputSamples[i*`2`+`1`] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample)) >> `16`);
10270	}
10271	}
10272	#endif
10273
10274	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10275	{
10276	drflac_uint64 i;
10277	drflac_uint64 frameCount4 = frameCount >> `2`;
10278	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10279	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10280	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10281	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10282
10283	for (i = `0`; i < frameCount4; ++i) {
10284	drflac_uint32 tempL0 = pInputSamples0U32[i*`4`+`0`] << shift0;
10285	drflac_uint32 tempL1 = pInputSamples0U32[i*`4`+`1`] << shift0;
10286	drflac_uint32 tempL2 = pInputSamples0U32[i*`4`+`2`] << shift0;
10287	drflac_uint32 tempL3 = pInputSamples0U32[i*`4`+`3`] << shift0;
10288
10289	drflac_uint32 tempR0 = pInputSamples1U32[i*`4`+`0`] << shift1;
10290	drflac_uint32 tempR1 = pInputSamples1U32[i*`4`+`1`] << shift1;
10291	drflac_uint32 tempR2 = pInputSamples1U32[i*`4`+`2`] << shift1;
10292	drflac_uint32 tempR3 = pInputSamples1U32[i*`4`+`3`] << shift1;
10293
10294	tempL0 >>= `16`;
10295	tempL1 >>= `16`;
10296	tempL2 >>= `16`;
10297	tempL3 >>= `16`;
10298
10299	tempR0 >>= `16`;
10300	tempR1 >>= `16`;
10301	tempR2 >>= `16`;
10302	tempR3 >>= `16`;
10303
10304	pOutputSamples[i*`8`+`0`] = (drflac_int16)tempL0;
10305	pOutputSamples[i*`8`+`1`] = (drflac_int16)tempR0;
10306	pOutputSamples[i*`8`+`2`] = (drflac_int16)tempL1;
10307	pOutputSamples[i*`8`+`3`] = (drflac_int16)tempR1;
10308	pOutputSamples[i*`8`+`4`] = (drflac_int16)tempL2;
10309	pOutputSamples[i*`8`+`5`] = (drflac_int16)tempR2;
10310	pOutputSamples[i*`8`+`6`] = (drflac_int16)tempL3;
10311	pOutputSamples[i*`8`+`7`] = (drflac_int16)tempR3;
10312	}
10313
10314	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10315	pOutputSamples[i*`2`+`0`] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> `16`);
10316	pOutputSamples[i*`2`+`1`] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> `16`);
10317	}
10318	}
10319
10320	#if defined(DRFLAC_SUPPORT_SSE2)
10321	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10322	{
10323	drflac_uint64 i;
10324	drflac_uint64 frameCount4 = frameCount >> `2`;
10325	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10326	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10327	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10328	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10329
10330	for (i = `0`; i < frameCount4; ++i) {
10331	__m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10332	__m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10333
10334	left = _mm_srai_epi32(left, `16`);
10335	right = _mm_srai_epi32(right, `16`);
10336
10337	/ At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. /
10338	_mm_storeu_si128((__m128i)(pOutputSamples + i`8`), drflac__mm_packs_interleaved_epi32(left, right));
10339	}
10340
10341	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10342	pOutputSamples[i*`2`+`0`] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> `16`);
10343	pOutputSamples[i*`2`+`1`] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> `16`);
10344	}
10345	}
10346	#endif
10347
10348	#if defined(DRFLAC_SUPPORT_NEON)
10349	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10350	{
10351	drflac_uint64 i;
10352	drflac_uint64 frameCount4 = frameCount >> `2`;
10353	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10354	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10355	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10356	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10357
10358	int32x4_t shift0_4 = vdupq_n_s32(shift0);
10359	int32x4_t shift1_4 = vdupq_n_s32(shift1);
10360
10361	for (i = `0`; i < frameCount4; ++i) {
10362	int32x4_t left;
10363	int32x4_t right;
10364
10365	left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4));
10366	right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4));
10367
10368	left = vshrq_n_s32(left, `16`);
10369	right = vshrq_n_s32(right, `16`);
10370
10371	drflac__vst2q_s16(pOutputSamples + i*`8`, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10372	}
10373
10374	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10375	pOutputSamples[i*`2`+`0`] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> `16`);
10376	pOutputSamples[i*`2`+`1`] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> `16`);
10377	}
10378	}
10379	#endif
10380
10381	static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10382	{
10383	#if defined(DRFLAC_SUPPORT_SSE2)
10384	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
10385	drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10386	} else
10387	#elif defined(DRFLAC_SUPPORT_NEON)
10388	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
10389	drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10390	} else
10391	#endif
10392	{
10393	/ Scalar fallback. /
10394	#if 0
10395	drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10396	#else
10397	drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10398	#endif
10399	}
10400	}
10401
10402	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10403	{
10404	drflac_uint64 framesRead;
10405	drflac_uint32 unusedBitsPerSample;
10406
10407	if (pFlac == NULL \|\| framesToRead == `0`) {
10408	return `0`;
10409	}
10410
10411	if (pBufferOut == NULL) {
10412	return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10413	}
10414
10415	DRFLAC_ASSERT(pFlac->bitsPerSample <= `32`);
10416	unusedBitsPerSample = `32` - pFlac->bitsPerSample;
10417
10418	framesRead = `0`;
10419	while (framesToRead > `0`) {
10420	/ If we've run out of samples in this frame, go to the next. /
10421	if (pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
10422	if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10423	break; / Couldn't read the next frame, so just break from the loop and return. /
10424	}
10425	} else {
10426	unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10427	drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10428	drflac_uint64 frameCountThisIteration = framesToRead;
10429
10430	if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10431	frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10432	}
10433
10434	if (channelCount == `2`) {
10435	const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[`0`].pSamplesS32 + iFirstPCMFrame;
10436	const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[`1`].pSamplesS32 + iFirstPCMFrame;
10437
10438	switch (pFlac->currentFLACFrame.header.channelAssignment)
10439	{
10440	case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10441	{
10442	drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10443	} break;
10444
10445	case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10446	{
10447	drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10448	} break;
10449
10450	case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10451	{
10452	drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10453	} break;
10454
10455	case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10456	default:
10457	{
10458	drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10459	} break;
10460	}
10461	} else {
10462	/ Generic interleaving. /
10463	drflac_uint64 i;
10464	for (i = `0`; i < frameCountThisIteration; ++i) {
10465	unsigned int j;
10466	for (j = `0`; j < channelCount; ++j) {
10467	drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10468	pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> `16`);
10469	}
10470	}
10471	}
10472
10473	framesRead += frameCountThisIteration;
10474	pBufferOut += frameCountThisIteration * channelCount;
10475	framesToRead -= frameCountThisIteration;
10476	pFlac->currentPCMFrame += frameCountThisIteration;
10477	pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10478	}
10479	}
10480
10481	return framesRead;
10482	}
10483
10484
10485	#if 0
10486	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10487	{
10488	drflac_uint64 i;
10489	for (i = `0`; i < frameCount; ++i) {
10490	drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10491	drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10492	drflac_uint32 right = left - side;
10493
10494	pOutputSamples[i`2`+`0`] = (float*)((drflac_int32)left / `2147483648.0`);
10495	pOutputSamples[i`2`+`1`] = (float*)((drflac_int32)right / `2147483648.0`);
10496	}
10497	}
10498	#endif
10499
10500	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10501	{
10502	drflac_uint64 i;
10503	drflac_uint64 frameCount4 = frameCount >> `2`;
10504	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10505	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10506	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10507	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10508
10509	float factor = `1` / `2147483648.0`;
10510
10511	for (i = `0`; i < frameCount4; ++i) {
10512	drflac_uint32 left0 = pInputSamples0U32[i*`4`+`0`] << shift0;
10513	drflac_uint32 left1 = pInputSamples0U32[i*`4`+`1`] << shift0;
10514	drflac_uint32 left2 = pInputSamples0U32[i*`4`+`2`] << shift0;
10515	drflac_uint32 left3 = pInputSamples0U32[i*`4`+`3`] << shift0;
10516
10517	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << shift1;
10518	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << shift1;
10519	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << shift1;
10520	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << shift1;
10521
10522	drflac_uint32 right0 = left0 - side0;
10523	drflac_uint32 right1 = left1 - side1;
10524	drflac_uint32 right2 = left2 - side2;
10525	drflac_uint32 right3 = left3 - side3;
10526
10527	pOutputSamples[i`8`+`0`] = (drflac_int32)left0 factor;
10528	pOutputSamples[i`8`+`1`] = (drflac_int32)right0 factor;
10529	pOutputSamples[i`8`+`2`] = (drflac_int32)left1 factor;
10530	pOutputSamples[i`8`+`3`] = (drflac_int32)right1 factor;
10531	pOutputSamples[i`8`+`4`] = (drflac_int32)left2 factor;
10532	pOutputSamples[i`8`+`5`] = (drflac_int32)right2 factor;
10533	pOutputSamples[i`8`+`6`] = (drflac_int32)left3 factor;
10534	pOutputSamples[i`8`+`7`] = (drflac_int32)right3 factor;
10535	}
10536
10537	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10538	drflac_uint32 left = pInputSamples0U32[i] << shift0;
10539	drflac_uint32 side = pInputSamples1U32[i] << shift1;
10540	drflac_uint32 right = left - side;
10541
10542	pOutputSamples[i`2`+`0`] = (drflac_int32)left factor;
10543	pOutputSamples[i`2`+`1`] = (drflac_int32)right factor;
10544	}
10545	}
10546
10547	#if defined(DRFLAC_SUPPORT_SSE2)
10548	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10549	{
10550	drflac_uint64 i;
10551	drflac_uint64 frameCount4 = frameCount >> `2`;
10552	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10553	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10554	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
10555	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
10556	__m128 factor;
10557
10558	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10559
10560	factor = _mm_set1_ps(`1.0f` / `8388608.0f`);
10561
10562	for (i = `0`; i < frameCount4; ++i) {
10563	__m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10564	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10565	__m128i right = _mm_sub_epi32(left, side);
10566	__m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10567	__m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10568
10569	_mm_storeu_ps(pOutputSamples + i*`8` + `0`, _mm_unpacklo_ps(leftf, rightf));
10570	_mm_storeu_ps(pOutputSamples + i*`8` + `4`, _mm_unpackhi_ps(leftf, rightf));
10571	}
10572
10573	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10574	drflac_uint32 left = pInputSamples0U32[i] << shift0;
10575	drflac_uint32 side = pInputSamples1U32[i] << shift1;
10576	drflac_uint32 right = left - side;
10577
10578	pOutputSamples[i*`2`+`0`] = (drflac_int32)left / `8388608.0f`;
10579	pOutputSamples[i*`2`+`1`] = (drflac_int32)right / `8388608.0f`;
10580	}
10581	}
10582	#endif
10583
10584	#if defined(DRFLAC_SUPPORT_NEON)
10585	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10586	{
10587	drflac_uint64 i;
10588	drflac_uint64 frameCount4 = frameCount >> `2`;
10589	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10590	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10591	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
10592	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
10593	float32x4_t factor4;
10594	int32x4_t shift0_4;
10595	int32x4_t shift1_4;
10596
10597	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10598
10599	factor4 = vdupq_n_f32(`1.0f` / `8388608.0f`);
10600	shift0_4 = vdupq_n_s32(shift0);
10601	shift1_4 = vdupq_n_s32(shift1);
10602
10603	for (i = `0`; i < frameCount4; ++i) {
10604	uint32x4_t left;
10605	uint32x4_t side;
10606	uint32x4_t right;
10607	float32x4_t leftf;
10608	float32x4_t rightf;
10609
10610	left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
10611	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
10612	right = vsubq_u32(left, side);
10613	leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10614	rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10615
10616	drflac__vst2q_f32(pOutputSamples + i*`8`, vzipq_f32(leftf, rightf));
10617	}
10618
10619	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10620	drflac_uint32 left = pInputSamples0U32[i] << shift0;
10621	drflac_uint32 side = pInputSamples1U32[i] << shift1;
10622	drflac_uint32 right = left - side;
10623
10624	pOutputSamples[i*`2`+`0`] = (drflac_int32)left / `8388608.0f`;
10625	pOutputSamples[i*`2`+`1`] = (drflac_int32)right / `8388608.0f`;
10626	}
10627	}
10628	#endif
10629
10630	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10631	{
10632	#if defined(DRFLAC_SUPPORT_SSE2)
10633	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
10634	drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10635	} else
10636	#elif defined(DRFLAC_SUPPORT_NEON)
10637	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
10638	drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10639	} else
10640	#endif
10641	{
10642	/ Scalar fallback. /
10643	#if 0
10644	drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10645	#else
10646	drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10647	#endif
10648	}
10649	}
10650
10651
10652	#if 0
10653	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10654	{
10655	drflac_uint64 i;
10656	for (i = `0`; i < frameCount; ++i) {
10657	drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10658	drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10659	drflac_uint32 left = right + side;
10660
10661	pOutputSamples[i`2`+`0`] = (float*)((drflac_int32)left / `2147483648.0`);
10662	pOutputSamples[i`2`+`1`] = (float*)((drflac_int32)right / `2147483648.0`);
10663	}
10664	}
10665	#endif
10666
10667	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10668	{
10669	drflac_uint64 i;
10670	drflac_uint64 frameCount4 = frameCount >> `2`;
10671	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10672	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10673	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10674	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10675	float factor = `1` / `2147483648.0`;
10676
10677	for (i = `0`; i < frameCount4; ++i) {
10678	drflac_uint32 side0 = pInputSamples0U32[i*`4`+`0`] << shift0;
10679	drflac_uint32 side1 = pInputSamples0U32[i*`4`+`1`] << shift0;
10680	drflac_uint32 side2 = pInputSamples0U32[i*`4`+`2`] << shift0;
10681	drflac_uint32 side3 = pInputSamples0U32[i*`4`+`3`] << shift0;
10682
10683	drflac_uint32 right0 = pInputSamples1U32[i*`4`+`0`] << shift1;
10684	drflac_uint32 right1 = pInputSamples1U32[i*`4`+`1`] << shift1;
10685	drflac_uint32 right2 = pInputSamples1U32[i*`4`+`2`] << shift1;
10686	drflac_uint32 right3 = pInputSamples1U32[i*`4`+`3`] << shift1;
10687
10688	drflac_uint32 left0 = right0 + side0;
10689	drflac_uint32 left1 = right1 + side1;
10690	drflac_uint32 left2 = right2 + side2;
10691	drflac_uint32 left3 = right3 + side3;
10692
10693	pOutputSamples[i`8`+`0`] = (drflac_int32)left0 factor;
10694	pOutputSamples[i`8`+`1`] = (drflac_int32)right0 factor;
10695	pOutputSamples[i`8`+`2`] = (drflac_int32)left1 factor;
10696	pOutputSamples[i`8`+`3`] = (drflac_int32)right1 factor;
10697	pOutputSamples[i`8`+`4`] = (drflac_int32)left2 factor;
10698	pOutputSamples[i`8`+`5`] = (drflac_int32)right2 factor;
10699	pOutputSamples[i`8`+`6`] = (drflac_int32)left3 factor;
10700	pOutputSamples[i`8`+`7`] = (drflac_int32)right3 factor;
10701	}
10702
10703	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10704	drflac_uint32 side = pInputSamples0U32[i] << shift0;
10705	drflac_uint32 right = pInputSamples1U32[i] << shift1;
10706	drflac_uint32 left = right + side;
10707
10708	pOutputSamples[i`2`+`0`] = (drflac_int32)left factor;
10709	pOutputSamples[i`2`+`1`] = (drflac_int32)right factor;
10710	}
10711	}
10712
10713	#if defined(DRFLAC_SUPPORT_SSE2)
10714	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10715	{
10716	drflac_uint64 i;
10717	drflac_uint64 frameCount4 = frameCount >> `2`;
10718	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10719	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10720	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
10721	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
10722	__m128 factor;
10723
10724	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10725
10726	factor = _mm_set1_ps(`1.0f` / `8388608.0f`);
10727
10728	for (i = `0`; i < frameCount4; ++i) {
10729	__m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10730	__m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10731	__m128i left = _mm_add_epi32(right, side);
10732	__m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10733	__m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10734
10735	_mm_storeu_ps(pOutputSamples + i*`8` + `0`, _mm_unpacklo_ps(leftf, rightf));
10736	_mm_storeu_ps(pOutputSamples + i*`8` + `4`, _mm_unpackhi_ps(leftf, rightf));
10737	}
10738
10739	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10740	drflac_uint32 side = pInputSamples0U32[i] << shift0;
10741	drflac_uint32 right = pInputSamples1U32[i] << shift1;
10742	drflac_uint32 left = right + side;
10743
10744	pOutputSamples[i*`2`+`0`] = (drflac_int32)left / `8388608.0f`;
10745	pOutputSamples[i*`2`+`1`] = (drflac_int32)right / `8388608.0f`;
10746	}
10747	}
10748	#endif
10749
10750	#if defined(DRFLAC_SUPPORT_NEON)
10751	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10752	{
10753	drflac_uint64 i;
10754	drflac_uint64 frameCount4 = frameCount >> `2`;
10755	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10756	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10757	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
10758	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
10759	float32x4_t factor4;
10760	int32x4_t shift0_4;
10761	int32x4_t shift1_4;
10762
10763	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10764
10765	factor4 = vdupq_n_f32(`1.0f` / `8388608.0f`);
10766	shift0_4 = vdupq_n_s32(shift0);
10767	shift1_4 = vdupq_n_s32(shift1);
10768
10769	for (i = `0`; i < frameCount4; ++i) {
10770	uint32x4_t side;
10771	uint32x4_t right;
10772	uint32x4_t left;
10773	float32x4_t leftf;
10774	float32x4_t rightf;
10775
10776	side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4);
10777	right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4);
10778	left = vaddq_u32(right, side);
10779	leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10780	rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10781
10782	drflac__vst2q_f32(pOutputSamples + i*`8`, vzipq_f32(leftf, rightf));
10783	}
10784
10785	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10786	drflac_uint32 side = pInputSamples0U32[i] << shift0;
10787	drflac_uint32 right = pInputSamples1U32[i] << shift1;
10788	drflac_uint32 left = right + side;
10789
10790	pOutputSamples[i*`2`+`0`] = (drflac_int32)left / `8388608.0f`;
10791	pOutputSamples[i*`2`+`1`] = (drflac_int32)right / `8388608.0f`;
10792	}
10793	}
10794	#endif
10795
10796	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10797	{
10798	#if defined(DRFLAC_SUPPORT_SSE2)
10799	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
10800	drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10801	} else
10802	#elif defined(DRFLAC_SUPPORT_NEON)
10803	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
10804	drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10805	} else
10806	#endif
10807	{
10808	/ Scalar fallback. /
10809	#if 0
10810	drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10811	#else
10812	drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10813	#endif
10814	}
10815	}
10816
10817
10818	#if 0
10819	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10820	{
10821	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
10822	drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10823	drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10824
10825	mid = (mid << `1`) \| (side & `0x01`);
10826
10827	pOutputSamples[i`2`+`0`] = (float*)((((drflac_int32)(mid + side) >> `1`) << (unusedBitsPerSample)) / `2147483648.0`);
10828	pOutputSamples[i`2`+`1`] = (float*)((((drflac_int32)(mid - side) >> `1`) << (unusedBitsPerSample)) / `2147483648.0`);
10829	}
10830	}
10831	#endif
10832
10833	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10834	{
10835	drflac_uint64 i;
10836	drflac_uint64 frameCount4 = frameCount >> `2`;
10837	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10838	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10839	drflac_uint32 shift = unusedBitsPerSample;
10840	float factor = `1` / `2147483648.0`;
10841
10842	if (shift > `0`) {
10843	shift -= `1`;
10844	for (i = `0`; i < frameCount4; ++i) {
10845	drflac_uint32 temp0L;
10846	drflac_uint32 temp1L;
10847	drflac_uint32 temp2L;
10848	drflac_uint32 temp3L;
10849	drflac_uint32 temp0R;
10850	drflac_uint32 temp1R;
10851	drflac_uint32 temp2R;
10852	drflac_uint32 temp3R;
10853
10854	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10855	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10856	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10857	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10858
10859	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10860	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10861	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10862	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10863
10864	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
10865	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
10866	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
10867	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
10868
10869	temp0L = (mid0 + side0) << shift;
10870	temp1L = (mid1 + side1) << shift;
10871	temp2L = (mid2 + side2) << shift;
10872	temp3L = (mid3 + side3) << shift;
10873
10874	temp0R = (mid0 - side0) << shift;
10875	temp1R = (mid1 - side1) << shift;
10876	temp2R = (mid2 - side2) << shift;
10877	temp3R = (mid3 - side3) << shift;
10878
10879	pOutputSamples[i`8`+`0`] = (drflac_int32)temp0L factor;
10880	pOutputSamples[i`8`+`1`] = (drflac_int32)temp0R factor;
10881	pOutputSamples[i`8`+`2`] = (drflac_int32)temp1L factor;
10882	pOutputSamples[i`8`+`3`] = (drflac_int32)temp1R factor;
10883	pOutputSamples[i`8`+`4`] = (drflac_int32)temp2L factor;
10884	pOutputSamples[i`8`+`5`] = (drflac_int32)temp2R factor;
10885	pOutputSamples[i`8`+`6`] = (drflac_int32)temp3L factor;
10886	pOutputSamples[i`8`+`7`] = (drflac_int32)temp3R factor;
10887	}
10888	} else {
10889	for (i = `0`; i < frameCount4; ++i) {
10890	drflac_uint32 temp0L;
10891	drflac_uint32 temp1L;
10892	drflac_uint32 temp2L;
10893	drflac_uint32 temp3L;
10894	drflac_uint32 temp0R;
10895	drflac_uint32 temp1R;
10896	drflac_uint32 temp2R;
10897	drflac_uint32 temp3R;
10898
10899	drflac_uint32 mid0 = pInputSamples0U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10900	drflac_uint32 mid1 = pInputSamples0U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10901	drflac_uint32 mid2 = pInputSamples0U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10902	drflac_uint32 mid3 = pInputSamples0U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10903
10904	drflac_uint32 side0 = pInputSamples1U32[i*`4`+`0`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10905	drflac_uint32 side1 = pInputSamples1U32[i*`4`+`1`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10906	drflac_uint32 side2 = pInputSamples1U32[i*`4`+`2`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10907	drflac_uint32 side3 = pInputSamples1U32[i*`4`+`3`] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10908
10909	mid0 = (mid0 << `1`) \| (side0 & `0x01`);
10910	mid1 = (mid1 << `1`) \| (side1 & `0x01`);
10911	mid2 = (mid2 << `1`) \| (side2 & `0x01`);
10912	mid3 = (mid3 << `1`) \| (side3 & `0x01`);
10913
10914	temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> `1`);
10915	temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> `1`);
10916	temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> `1`);
10917	temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> `1`);
10918
10919	temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> `1`);
10920	temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> `1`);
10921	temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> `1`);
10922	temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> `1`);
10923
10924	pOutputSamples[i`8`+`0`] = (drflac_int32)temp0L factor;
10925	pOutputSamples[i`8`+`1`] = (drflac_int32)temp0R factor;
10926	pOutputSamples[i`8`+`2`] = (drflac_int32)temp1L factor;
10927	pOutputSamples[i`8`+`3`] = (drflac_int32)temp1R factor;
10928	pOutputSamples[i`8`+`4`] = (drflac_int32)temp2L factor;
10929	pOutputSamples[i`8`+`5`] = (drflac_int32)temp2R factor;
10930	pOutputSamples[i`8`+`6`] = (drflac_int32)temp3L factor;
10931	pOutputSamples[i`8`+`7`] = (drflac_int32)temp3R factor;
10932	}
10933	}
10934
10935	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10936	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10937	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10938
10939	mid = (mid << `1`) \| (side & `0x01`);
10940
10941	pOutputSamples[i`2`+`0`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> `1`) << unusedBitsPerSample) factor;
10942	pOutputSamples[i`2`+`1`] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> `1`) << unusedBitsPerSample) factor;
10943	}
10944	}
10945
10946	#if defined(DRFLAC_SUPPORT_SSE2)
10947	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10948	{
10949	drflac_uint64 i;
10950	drflac_uint64 frameCount4 = frameCount >> `2`;
10951	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10952	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10953	drflac_uint32 shift = unusedBitsPerSample - `8`;
10954	float factor;
10955	__m128 factor128;
10956
10957	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
10958
10959	factor = `1.0f` / `8388608.0f`;
10960	factor128 = _mm_set1_ps(factor);
10961
10962	if (shift == `0`) {
10963	for (i = `0`; i < frameCount4; ++i) {
10964	__m128i mid;
10965	__m128i side;
10966	__m128i tempL;
10967	__m128i tempR;
10968	__m128 leftf;
10969	__m128 rightf;
10970
10971	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
10972	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
10973
10974	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
10975
10976	tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), `1`);
10977	tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), `1`);
10978
10979	leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
10980	rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
10981
10982	_mm_storeu_ps(pOutputSamples + i*`8` + `0`, _mm_unpacklo_ps(leftf, rightf));
10983	_mm_storeu_ps(pOutputSamples + i*`8` + `4`, _mm_unpackhi_ps(leftf, rightf));
10984	}
10985
10986	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
10987	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
10988	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
10989
10990	mid = (mid << `1`) \| (side & `0x01`);
10991
10992	pOutputSamples[i`2`+`0`] = ((drflac_int32)(mid + side) >> `1`) factor;
10993	pOutputSamples[i`2`+`1`] = ((drflac_int32)(mid - side) >> `1`) factor;
10994	}
10995	} else {
10996	shift -= `1`;
10997	for (i = `0`; i < frameCount4; ++i) {
10998	__m128i mid;
10999	__m128i side;
11000	__m128i tempL;
11001	__m128i tempR;
11002	__m128 leftf;
11003	__m128 rightf;
11004
11005	mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
11006	side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
11007
11008	mid = _mm_or_si128(_mm_slli_epi32(mid, `1`), _mm_and_si128(side, _mm_set1_epi32(`0x01`)));
11009
11010	tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11011	tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11012
11013	leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11014	rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11015
11016	_mm_storeu_ps(pOutputSamples + i*`8` + `0`, _mm_unpacklo_ps(leftf, rightf));
11017	_mm_storeu_ps(pOutputSamples + i*`8` + `4`, _mm_unpackhi_ps(leftf, rightf));
11018	}
11019
11020	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11021	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
11022	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
11023
11024	mid = (mid << `1`) \| (side & `0x01`);
11025
11026	pOutputSamples[i`2`+`0`] = (drflac_int32)((mid + side) << shift) factor;
11027	pOutputSamples[i`2`+`1`] = (drflac_int32)((mid - side) << shift) factor;
11028	}
11029	}
11030	}
11031	#endif
11032
11033	#if defined(DRFLAC_SUPPORT_NEON)
11034	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11035	{
11036	drflac_uint64 i;
11037	drflac_uint64 frameCount4 = frameCount >> `2`;
11038	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11039	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11040	drflac_uint32 shift = unusedBitsPerSample - `8`;
11041	float factor;
11042	float32x4_t factor4;
11043	int32x4_t shift4;
11044	int32x4_t wbps0_4; / Wasted Bits Per Sample /
11045	int32x4_t wbps1_4; / Wasted Bits Per Sample /
11046
11047	DRFLAC_ASSERT(pFlac->bitsPerSample <= `24`);
11048
11049	factor = `1.0f` / `8388608.0f`;
11050	factor4 = vdupq_n_f32(factor);
11051	wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample);
11052	wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample);
11053
11054	if (shift == `0`) {
11055	for (i = `0`; i < frameCount4; ++i) {
11056	int32x4_t lefti;
11057	int32x4_t righti;
11058	float32x4_t leftf;
11059	float32x4_t rightf;
11060
11061	uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbps0_4);
11062	uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbps1_4);
11063
11064	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, vdupq_n_u32(`1`)));
11065
11066	lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), `1`);
11067	righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), `1`);
11068
11069	leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11070	rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11071
11072	drflac__vst2q_f32(pOutputSamples + i*`8`, vzipq_f32(leftf, rightf));
11073	}
11074
11075	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11076	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
11077	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
11078
11079	mid = (mid << `1`) \| (side & `0x01`);
11080
11081	pOutputSamples[i`2`+`0`] = ((drflac_int32)(mid + side) >> `1`) factor;
11082	pOutputSamples[i`2`+`1`] = ((drflac_int32)(mid - side) >> `1`) factor;
11083	}
11084	} else {
11085	shift -= `1`;
11086	shift4 = vdupq_n_s32(shift);
11087	for (i = `0`; i < frameCount4; ++i) {
11088	uint32x4_t mid;
11089	uint32x4_t side;
11090	int32x4_t lefti;
11091	int32x4_t righti;
11092	float32x4_t leftf;
11093	float32x4_t rightf;
11094
11095	mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), wbps0_4);
11096	side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), wbps1_4);
11097
11098	mid = vorrq_u32(vshlq_n_u32(mid, `1`), vandq_u32(side, vdupq_n_u32(`1`)));
11099
11100	lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11101	righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11102
11103	leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11104	rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11105
11106	drflac__vst2q_f32(pOutputSamples + i*`8`, vzipq_f32(leftf, rightf));
11107	}
11108
11109	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11110	drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
11111	drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
11112
11113	mid = (mid << `1`) \| (side & `0x01`);
11114
11115	pOutputSamples[i`2`+`0`] = (drflac_int32)((mid + side) << shift) factor;
11116	pOutputSamples[i`2`+`1`] = (drflac_int32)((mid - side) << shift) factor;
11117	}
11118	}
11119	}
11120	#endif
11121
11122	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11123	{
11124	#if defined(DRFLAC_SUPPORT_SSE2)
11125	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
11126	drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11127	} else
11128	#elif defined(DRFLAC_SUPPORT_NEON)
11129	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
11130	drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11131	} else
11132	#endif
11133	{
11134	/ Scalar fallback. /
11135	#if 0
11136	drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11137	#else
11138	drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11139	#endif
11140	}
11141	}
11142
11143	#if 0
11144	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11145	{
11146	for (drflac_uint64 i = `0`; i < frameCount; ++i) {
11147	pOutputSamples[i`2`+`0`] = (float*)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample)) / `2147483648.0`);
11148	pOutputSamples[i`2`+`1`] = (float*)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample)) / `2147483648.0`);
11149	}
11150	}
11151	#endif
11152
11153	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11154	{
11155	drflac_uint64 i;
11156	drflac_uint64 frameCount4 = frameCount >> `2`;
11157	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11158	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11159	drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample;
11160	drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample;
11161	float factor = `1` / `2147483648.0`;
11162
11163	for (i = `0`; i < frameCount4; ++i) {
11164	drflac_uint32 tempL0 = pInputSamples0U32[i*`4`+`0`] << shift0;
11165	drflac_uint32 tempL1 = pInputSamples0U32[i*`4`+`1`] << shift0;
11166	drflac_uint32 tempL2 = pInputSamples0U32[i*`4`+`2`] << shift0;
11167	drflac_uint32 tempL3 = pInputSamples0U32[i*`4`+`3`] << shift0;
11168
11169	drflac_uint32 tempR0 = pInputSamples1U32[i*`4`+`0`] << shift1;
11170	drflac_uint32 tempR1 = pInputSamples1U32[i*`4`+`1`] << shift1;
11171	drflac_uint32 tempR2 = pInputSamples1U32[i*`4`+`2`] << shift1;
11172	drflac_uint32 tempR3 = pInputSamples1U32[i*`4`+`3`] << shift1;
11173
11174	pOutputSamples[i`8`+`0`] = (drflac_int32)tempL0 factor;
11175	pOutputSamples[i`8`+`1`] = (drflac_int32)tempR0 factor;
11176	pOutputSamples[i`8`+`2`] = (drflac_int32)tempL1 factor;
11177	pOutputSamples[i`8`+`3`] = (drflac_int32)tempR1 factor;
11178	pOutputSamples[i`8`+`4`] = (drflac_int32)tempL2 factor;
11179	pOutputSamples[i`8`+`5`] = (drflac_int32)tempR2 factor;
11180	pOutputSamples[i`8`+`6`] = (drflac_int32)tempL3 factor;
11181	pOutputSamples[i`8`+`7`] = (drflac_int32)tempR3 factor;
11182	}
11183
11184	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11185	pOutputSamples[i`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0) factor;
11186	pOutputSamples[i`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1) factor;
11187	}
11188	}
11189
11190	#if defined(DRFLAC_SUPPORT_SSE2)
11191	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11192	{
11193	drflac_uint64 i;
11194	drflac_uint64 frameCount4 = frameCount >> `2`;
11195	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11196	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11197	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
11198	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
11199
11200	float factor = `1.0f` / `8388608.0f`;
11201	__m128 factor128 = _mm_set1_ps(factor);
11202
11203	for (i = `0`; i < frameCount4; ++i) {
11204	__m128i lefti;
11205	__m128i righti;
11206	__m128 leftf;
11207	__m128 rightf;
11208
11209	lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11210	righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11211
11212	leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11213	rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11214
11215	_mm_storeu_ps(pOutputSamples + i*`8` + `0`, _mm_unpacklo_ps(leftf, rightf));
11216	_mm_storeu_ps(pOutputSamples + i*`8` + `4`, _mm_unpackhi_ps(leftf, rightf));
11217	}
11218
11219	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11220	pOutputSamples[i`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0) factor;
11221	pOutputSamples[i`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1) factor;
11222	}
11223	}
11224	#endif
11225
11226	#if defined(DRFLAC_SUPPORT_NEON)
11227	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11228	{
11229	drflac_uint64 i;
11230	drflac_uint64 frameCount4 = frameCount >> `2`;
11231	const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11232	const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11233	drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`0`].wastedBitsPerSample) - `8`;
11234	drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[`1`].wastedBitsPerSample) - `8`;
11235
11236	float factor = `1.0f` / `8388608.0f`;
11237	float32x4_t factor4 = vdupq_n_f32(factor);
11238	int32x4_t shift0_4 = vdupq_n_s32(shift0);
11239	int32x4_t shift1_4 = vdupq_n_s32(shift1);
11240
11241	for (i = `0`; i < frameCount4; ++i) {
11242	int32x4_t lefti;
11243	int32x4_t righti;
11244	float32x4_t leftf;
11245	float32x4_t rightf;
11246
11247	lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*`4`), shift0_4));
11248	righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*`4`), shift1_4));
11249
11250	leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11251	rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11252
11253	drflac__vst2q_f32(pOutputSamples + i*`8`, vzipq_f32(leftf, rightf));
11254	}
11255
11256	for (i = (frameCount4 << `2`); i < frameCount; ++i) {
11257	pOutputSamples[i`2`+`0`] = (drflac_int32)(pInputSamples0U32[i] << shift0) factor;
11258	pOutputSamples[i`2`+`1`] = (drflac_int32)(pInputSamples1U32[i] << shift1) factor;
11259	}
11260	}
11261	#endif
11262
11263	static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11264	{
11265	#if defined(DRFLAC_SUPPORT_SSE2)
11266	if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= `24`) {
11267	drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11268	} else
11269	#elif defined(DRFLAC_SUPPORT_NEON)
11270	if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= `24`) {
11271	drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11272	} else
11273	#endif
11274	{
11275	/ Scalar fallback. /
11276	#if 0
11277	drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11278	#else
11279	drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11280	#endif
11281	}
11282	}
11283
11284	DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11285	{
11286	drflac_uint64 framesRead;
11287	drflac_uint32 unusedBitsPerSample;
11288
11289	if (pFlac == NULL \|\| framesToRead == `0`) {
11290	return `0`;
11291	}
11292
11293	if (pBufferOut == NULL) {
11294	return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11295	}
11296
11297	DRFLAC_ASSERT(pFlac->bitsPerSample <= `32`);
11298	unusedBitsPerSample = `32` - pFlac->bitsPerSample;
11299
11300	framesRead = `0`;
11301	while (framesToRead > `0`) {
11302	/ If we've run out of samples in this frame, go to the next. /
11303	if (pFlac->currentFLACFrame.pcmFramesRemaining == `0`) {
11304	if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11305	break; / Couldn't read the next frame, so just break from the loop and return. /
11306	}
11307	} else {
11308	unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11309	drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11310	drflac_uint64 frameCountThisIteration = framesToRead;
11311
11312	if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11313	frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11314	}
11315
11316	if (channelCount == `2`) {
11317	const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[`0`].pSamplesS32 + iFirstPCMFrame;
11318	const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[`1`].pSamplesS32 + iFirstPCMFrame;
11319
11320	switch (pFlac->currentFLACFrame.header.channelAssignment)
11321	{
11322	case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11323	{
11324	drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11325	} break;
11326
11327	case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11328	{
11329	drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11330	} break;
11331
11332	case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11333	{
11334	drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11335	} break;
11336
11337	case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11338	default:
11339	{
11340	drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11341	} break;
11342	}
11343	} else {
11344	/ Generic interleaving. /
11345	drflac_uint64 i;
11346	for (i = `0`; i < frameCountThisIteration; ++i) {
11347	unsigned int j;
11348	for (j = `0`; j < channelCount; ++j) {
11349	drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11350	pBufferOut[(ichannelCount)+j] = (float*)(sampleS32 / `2147483648.0`);
11351	}
11352	}
11353	}
11354
11355	framesRead += frameCountThisIteration;
11356	pBufferOut += frameCountThisIteration * channelCount;
11357	framesToRead -= frameCountThisIteration;
11358	pFlac->currentPCMFrame += frameCountThisIteration;
11359	pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11360	}
11361	}
11362
11363	return framesRead;
11364	}
11365
11366
11367	DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11368	{
11369	if (pFlac == NULL) {
11370	return DRFLAC_FALSE;
11371	}
11372
11373	/ Don't do anything if we're already on the seek point. /
11374	if (pFlac->currentPCMFrame == pcmFrameIndex) {
11375	return DRFLAC_TRUE;
11376	}
11377
11378	/*
11379	If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11380	when the decoder was opened.
11381	*/
11382	if (pFlac->firstFLACFramePosInBytes == `0`) {
11383	return DRFLAC_FALSE;
11384	}
11385
11386	if (pcmFrameIndex == `0`) {
11387	pFlac->currentPCMFrame = `0`;
11388	return drflac__seek_to_first_frame(pFlac);
11389	} else {
11390	drflac_bool32 wasSuccessful = DRFLAC_FALSE;
11391	drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
11392
11393	/ Clamp the sample to the end. /
11394	if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11395	pcmFrameIndex = pFlac->totalPCMFrameCount;
11396	}
11397
11398	/ If the target sample and the current sample are in the same frame we just move the position forward. /
11399	if (pcmFrameIndex > pFlac->currentPCMFrame) {
11400	/ Forward. /
11401	drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
11402	if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
11403	pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
11404	pFlac->currentPCMFrame = pcmFrameIndex;
11405	return DRFLAC_TRUE;
11406	}
11407	} else {
11408	/ Backward. /
11409	drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
11410	drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11411	drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11412	if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11413	pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
11414	pFlac->currentPCMFrame = pcmFrameIndex;
11415	return DRFLAC_TRUE;
11416	}
11417	}
11418
11419	/*
11420	Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11421	we'll instead use Ogg's natural seeking facility.
11422	*/
11423	#ifndef DR_FLAC_NO_OGG
11424	if (pFlac->container == drflac_container_ogg)
11425	{
11426	wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11427	}
11428	else
11429	#endif
11430	{
11431	/ First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. /
11432	if (/!wasSuccessful && /!pFlac->_noSeekTableSeek) {
11433	wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11434	}
11435
11436	#if !defined(DR_FLAC_NO_CRC)
11437	/ Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. /
11438	if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > `0`) {
11439	wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11440	}
11441	#endif
11442
11443	/ Fall back to brute force if all else fails. /
11444	if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11445	wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11446	}
11447	}
11448
11449	if (wasSuccessful) {
11450	pFlac->currentPCMFrame = pcmFrameIndex;
11451	} else {
11452	/ Seek failed. Try putting the decoder back to it's original state. /
11453	if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
11454	/ Failed to seek back to the original PCM frame. Fall back to 0. /
11455	drflac_seek_to_pcm_frame(pFlac, `0`);
11456	}
11457	}
11458
11459	return wasSuccessful;
11460	}
11461	}
11462
11463
11464
11465	/ High Level APIs /
11466
11467	#if defined(SIZE_MAX)
11468	#define DRFLAC_SIZE_MAX SIZE_MAX
11469	#else
11470	#if defined(DRFLAC_64BIT)
11471	#define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11472	#else
11473	#define DRFLAC_SIZE_MAX 0xFFFFFFFF
11474	#endif
11475	#endif
11476
11477
11478	/ Using a macro as the definition of the drflac__full_decode_and_close_() API family. Sue me. /*
11479	#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11480	static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11481	{ \
11482	type* pSampleData = NULL; \
11483	drflac_uint64 totalPCMFrameCount; \
11484	\
11485	DRFLAC_ASSERT(pFlac != NULL); \
11486	\
11487	totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11488	\
11489	if (totalPCMFrameCount == 0) { \
11490	type buffer[4096]; \
11491	drflac_uint64 pcmFramesRead; \
11492	size_t sampleDataBufferSize = sizeof(buffer); \
11493	\
11494	pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11495	if (pSampleData == NULL) { \
11496	goto on_error; \
11497	} \
11498	\
11499	while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11500	if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11501	type* pNewSampleData; \
11502	size_t newSampleDataBufferSize; \
11503	\
11504	newSampleDataBufferSize = sampleDataBufferSize * 2; \
11505	pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11506	if (pNewSampleData == NULL) { \
11507	drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11508	goto on_error; \
11509	} \
11510	\
11511	sampleDataBufferSize = newSampleDataBufferSize; \
11512	pSampleData = pNewSampleData; \
11513	} \
11514	\
11515	DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCountpFlac->channels), buffer, (size_t)(pcmFramesReadpFlac->channels*sizeof(type))); \
11516	totalPCMFrameCount += pcmFramesRead; \
11517	} \
11518	\
11519	/* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to \
11520	protect those ears from random noise! */ \
11521	DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCountpFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCountpFlac->channels*sizeof(type))); \
11522	} else { \
11523	drflac_uint64 dataSize = totalPCMFrameCountpFlac->channelssizeof(type); \
11524	if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
11525	goto on_error; /* The decoded data is too big. */ \
11526	} \
11527	\
11528	pSampleData = (type)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); / <-- Safe cast as per the check above. */ \
11529	if (pSampleData == NULL) { \
11530	goto on_error; \
11531	} \
11532	\
11533	totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11534	} \
11535	\
11536	if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11537	if (channelsOut) *channelsOut = pFlac->channels; \
11538	if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11539	\
11540	drflac_close(pFlac); \
11541	return pSampleData; \
11542	\
11543	on_error: \
11544	drflac_close(pFlac); \
11545	return NULL; \
11546	}
11547
11548	DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11549	DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11550	DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11551
11552	DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11553	{
11554	drflac* pFlac;
11555
11556	if (channelsOut) {
11557	*channelsOut = `0`;
11558	}
11559	if (sampleRateOut) {
11560	*sampleRateOut = `0`;
11561	}
11562	if (totalPCMFrameCountOut) {
11563	*totalPCMFrameCountOut = `0`;
11564	}
11565
11566	pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11567	if (pFlac == NULL) {
11568	return NULL;
11569	}
11570
11571	return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11572	}
11573
11574	DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11575	{
11576	drflac* pFlac;
11577
11578	if (channelsOut) {
11579	*channelsOut = `0`;
11580	}
11581	if (sampleRateOut) {
11582	*sampleRateOut = `0`;
11583	}
11584	if (totalPCMFrameCountOut) {
11585	*totalPCMFrameCountOut = `0`;
11586	}
11587
11588	pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11589	if (pFlac == NULL) {
11590	return NULL;
11591	}
11592
11593	return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11594	}
11595
11596	DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11597	{
11598	drflac* pFlac;
11599
11600	if (channelsOut) {
11601	*channelsOut = `0`;
11602	}
11603	if (sampleRateOut) {
11604	*sampleRateOut = `0`;
11605	}
11606	if (totalPCMFrameCountOut) {
11607	*totalPCMFrameCountOut = `0`;
11608	}
11609
11610	pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11611	if (pFlac == NULL) {
11612	return NULL;
11613	}
11614
11615	return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11616	}
11617
11618	#ifndef DR_FLAC_NO_STDIO
11619	DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11620	{
11621	drflac* pFlac;
11622
11623	if (sampleRate) {
11624	*sampleRate = `0`;
11625	}
11626	if (channels) {
11627	*channels = `0`;
11628	}
11629	if (totalPCMFrameCount) {
11630	*totalPCMFrameCount = `0`;
11631	}
11632
11633	pFlac = drflac_open_file(filename, pAllocationCallbacks);
11634	if (pFlac == NULL) {
11635	return NULL;
11636	}
11637
11638	return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11639	}
11640
11641	DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11642	{
11643	drflac* pFlac;
11644
11645	if (sampleRate) {
11646	*sampleRate = `0`;
11647	}
11648	if (channels) {
11649	*channels = `0`;
11650	}
11651	if (totalPCMFrameCount) {
11652	*totalPCMFrameCount = `0`;
11653	}
11654
11655	pFlac = drflac_open_file(filename, pAllocationCallbacks);
11656	if (pFlac == NULL) {
11657	return NULL;
11658	}
11659
11660	return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11661	}
11662
11663	DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11664	{
11665	drflac* pFlac;
11666
11667	if (sampleRate) {
11668	*sampleRate = `0`;
11669	}
11670	if (channels) {
11671	*channels = `0`;
11672	}
11673	if (totalPCMFrameCount) {
11674	*totalPCMFrameCount = `0`;
11675	}
11676
11677	pFlac = drflac_open_file(filename, pAllocationCallbacks);
11678	if (pFlac == NULL) {
11679	return NULL;
11680	}
11681
11682	return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11683	}
11684	#endif
11685
11686	DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11687	{
11688	drflac* pFlac;
11689
11690	if (sampleRate) {
11691	*sampleRate = `0`;
11692	}
11693	if (channels) {
11694	*channels = `0`;
11695	}
11696	if (totalPCMFrameCount) {
11697	*totalPCMFrameCount = `0`;
11698	}
11699
11700	pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11701	if (pFlac == NULL) {
11702	return NULL;
11703	}
11704
11705	return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11706	}
11707
11708	DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11709	{
11710	drflac* pFlac;
11711
11712	if (sampleRate) {
11713	*sampleRate = `0`;
11714	}
11715	if (channels) {
11716	*channels = `0`;
11717	}
11718	if (totalPCMFrameCount) {
11719	*totalPCMFrameCount = `0`;
11720	}
11721
11722	pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11723	if (pFlac == NULL) {
11724	return NULL;
11725	}
11726
11727	return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11728	}
11729
11730	DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11731	{
11732	drflac* pFlac;
11733
11734	if (sampleRate) {
11735	*sampleRate = `0`;
11736	}
11737	if (channels) {
11738	*channels = `0`;
11739	}
11740	if (totalPCMFrameCount) {
11741	*totalPCMFrameCount = `0`;
11742	}
11743
11744	pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11745	if (pFlac == NULL) {
11746	return NULL;
11747	}
11748
11749	return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11750	}
11751
11752
11753	DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11754	{
11755	if (pAllocationCallbacks != NULL) {
11756	drflac__free_from_callbacks(p, pAllocationCallbacks);
11757	} else {
11758	drflac__free_default(p, NULL);
11759	}
11760	}
11761
11762
11763
11764
11765	DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11766	{
11767	if (pIter == NULL) {
11768	return;
11769	}
11770
11771	pIter->countRemaining = commentCount;
11772	pIter->pRunningData = (const char*)pComments;
11773	}
11774
11775	DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11776	{
11777	drflac_int32 length;
11778	const char* pComment;
11779
11780	/ Safety. /
11781	if (pCommentLengthOut) {
11782	*pCommentLengthOut = `0`;
11783	}
11784
11785	if (pIter == NULL \|\| pIter->countRemaining == `0` \|\| pIter->pRunningData == NULL) {
11786	return NULL;
11787	}
11788
11789	length = drflac__le2host_32((const* drflac_uint32*)pIter->pRunningData);
11790	pIter->pRunningData += `4`;
11791
11792	pComment = pIter->pRunningData;
11793	pIter->pRunningData += length;
11794	pIter->countRemaining -= `1`;
11795
11796	if (pCommentLengthOut) {
11797	*pCommentLengthOut = length;
11798	}
11799
11800	return pComment;
11801	}
11802
11803
11804
11805
11806	DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
11807	{
11808	if (pIter == NULL) {
11809	return;
11810	}
11811
11812	pIter->countRemaining = trackCount;
11813	pIter->pRunningData = (const char*)pTrackData;
11814	}
11815
11816	DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
11817	{
11818	drflac_cuesheet_track cuesheetTrack;
11819	const char* pRunningData;
11820	drflac_uint64 offsetHi;
11821	drflac_uint64 offsetLo;
11822
11823	if (pIter == NULL \|\| pIter->countRemaining == `0` \|\| pIter->pRunningData == NULL) {
11824	return DRFLAC_FALSE;
11825	}
11826
11827	pRunningData = pIter->pRunningData;
11828
11829	offsetHi = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
11830	offsetLo = drflac__be2host_32((const* drflac_uint32*)pRunningData); pRunningData += `4`;
11831	cuesheetTrack.offset = offsetLo \| (offsetHi << `32`);
11832	cuesheetTrack.trackNumber = pRunningData[`0`]; pRunningData += `1`;
11833	DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += `12`;
11834	cuesheetTrack.isAudio = (pRunningData[`0`] & `0x80`) != `0`;
11835	cuesheetTrack.preEmphasis = (pRunningData[`0`] & `0x40`) != `0`; pRunningData += `14`;
11836	cuesheetTrack.indexCount = pRunningData[`0`]; pRunningData += `1`;
11837	cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index)pRunningData; pRunningData += cuesheetTrack.indexCount sizeof(drflac_cuesheet_track_index);
11838
11839	pIter->pRunningData = pRunningData;
11840	pIter->countRemaining -= `1`;
11841
11842	if (pCuesheetTrack) {
11843	*pCuesheetTrack = cuesheetTrack;
11844	}
11845
11846	return DRFLAC_TRUE;
11847	}
11848
11849	#if defined(__clang__) \|\| (defined(__GNUC__) && (__GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
11850	#pragma GCC diagnostic pop
11851	#endif
11852	#endif /* dr_flac_c */
11853	#endif /* DR_FLAC_IMPLEMENTATION */
11854
11855
11856	/*
11857	REVISION HISTORY
11858	================
11859	v0.12.33 - 2021-12-22
11860	- Fix a bug with seeking when the seek table does not start at PCM frame 0.
11861
11862	v0.12.32 - 2021-12-11
11863	- Fix a warning with Clang.
11864
11865	v0.12.31 - 2021-08-16
11866	- Silence some warnings.
11867
11868	v0.12.30 - 2021-07-31
11869	- Fix platform detection for ARM64.
11870
11871	v0.12.29 - 2021-04-02
11872	- Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
11873	- Fix a decoding error due to an incorrect validation check.
11874
11875	v0.12.28 - 2021-02-21
11876	- Fix a warning due to referencing _MSC_VER when it is undefined.
11877
11878	v0.12.27 - 2021-01-31
11879	- Fix a static analysis warning.
11880
11881	v0.12.26 - 2021-01-17
11882	- Fix a compilation warning due to _BSD_SOURCE being deprecated.
11883
11884	v0.12.25 - 2020-12-26
11885	- Update documentation.
11886
11887	v0.12.24 - 2020-11-29
11888	- Fix ARM64/NEON detection when compiling with MSVC.
11889
11890	v0.12.23 - 2020-11-21
11891	- Fix compilation with OpenWatcom.
11892
11893	v0.12.22 - 2020-11-01
11894	- Fix an error with the previous release.
11895
11896	v0.12.21 - 2020-11-01
11897	- Fix a possible deadlock when seeking.
11898	- Improve compiler support for older versions of GCC.
11899
11900	v0.12.20 - 2020-09-08
11901	- Fix a compilation error on older compilers.
11902
11903	v0.12.19 - 2020-08-30
11904	- Fix a bug due to an undefined 32-bit shift.
11905
11906	v0.12.18 - 2020-08-14
11907	- Fix a crash when compiling with clang-cl.
11908
11909	v0.12.17 - 2020-08-02
11910	- Simplify sized types.
11911
11912	v0.12.16 - 2020-07-25
11913	- Fix a compilation warning.
11914
11915	v0.12.15 - 2020-07-06
11916	- Check for negative LPC shifts and return an error.
11917
11918	v0.12.14 - 2020-06-23
11919	- Add include guard for the implementation section.
11920
11921	v0.12.13 - 2020-05-16
11922	- Add compile-time and run-time version querying.
11923	- DRFLAC_VERSION_MINOR
11924	- DRFLAC_VERSION_MAJOR
11925	- DRFLAC_VERSION_REVISION
11926	- DRFLAC_VERSION_STRING
11927	- drflac_version()
11928	- drflac_version_string()
11929
11930	v0.12.12 - 2020-04-30
11931	- Fix compilation errors with VC6.
11932
11933	v0.12.11 - 2020-04-19
11934	- Fix some pedantic warnings.
11935	- Fix some undefined behaviour warnings.
11936
11937	v0.12.10 - 2020-04-10
11938	- Fix some bugs when trying to seek with an invalid seek table.
11939
11940	v0.12.9 - 2020-04-05
11941	- Fix warnings.
11942
11943	v0.12.8 - 2020-04-04
11944	- Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
11945	- Fix some static analysis warnings.
11946	- Minor documentation updates.
11947
11948	v0.12.7 - 2020-03-14
11949	- Fix compilation errors with VC6.
11950
11951	v0.12.6 - 2020-03-07
11952	- Fix compilation error with Visual Studio .NET 2003.
11953
11954	v0.12.5 - 2020-01-30
11955	- Silence some static analysis warnings.
11956
11957	v0.12.4 - 2020-01-29
11958	- Silence some static analysis warnings.
11959
11960	v0.12.3 - 2019-12-02
11961	- Fix some warnings when compiling with GCC and the -Og flag.
11962	- Fix a crash in out-of-memory situations.
11963	- Fix potential integer overflow bug.
11964	- Fix some static analysis warnings.
11965	- Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
11966	- Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
11967
11968	v0.12.2 - 2019-10-07
11969	- Internal code clean up.
11970
11971	v0.12.1 - 2019-09-29
11972	- Fix some Clang Static Analyzer warnings.
11973	- Fix an unused variable warning.
11974
11975	v0.12.0 - 2019-09-23
11976	- API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
11977	routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
11978	- drflac_open()
11979	- drflac_open_relaxed()
11980	- drflac_open_with_metadata()
11981	- drflac_open_with_metadata_relaxed()
11982	- drflac_open_file()
11983	- drflac_open_file_with_metadata()
11984	- drflac_open_memory()
11985	- drflac_open_memory_with_metadata()
11986	- drflac_open_and_read_pcm_frames_s32()
11987	- drflac_open_and_read_pcm_frames_s16()
11988	- drflac_open_and_read_pcm_frames_f32()
11989	- drflac_open_file_and_read_pcm_frames_s32()
11990	- drflac_open_file_and_read_pcm_frames_s16()
11991	- drflac_open_file_and_read_pcm_frames_f32()
11992	- drflac_open_memory_and_read_pcm_frames_s32()
11993	- drflac_open_memory_and_read_pcm_frames_s16()
11994	- drflac_open_memory_and_read_pcm_frames_f32()
11995	Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
11996	DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
11997	- Remove deprecated APIs:
11998	- drflac_read_s32()
11999	- drflac_read_s16()
12000	- drflac_read_f32()
12001	- drflac_seek_to_sample()
12002	- drflac_open_and_decode_s32()
12003	- drflac_open_and_decode_s16()
12004	- drflac_open_and_decode_f32()
12005	- drflac_open_and_decode_file_s32()
12006	- drflac_open_and_decode_file_s16()
12007	- drflac_open_and_decode_file_f32()
12008	- drflac_open_and_decode_memory_s32()
12009	- drflac_open_and_decode_memory_s16()
12010	- drflac_open_and_decode_memory_f32()
12011	- Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
12012	by doing pFlac->totalPCMFrameCountpFlac->channels.*
12013	- Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
12014	- Fix errors when seeking to the end of a stream.
12015	- Optimizations to seeking.
12016	- SSE improvements and optimizations.
12017	- ARM NEON optimizations.
12018	- Optimizations to drflac_read_pcm_frames_s16().
12019	- Optimizations to drflac_read_pcm_frames_s32().
12020
12021	v0.11.10 - 2019-06-26
12022	- Fix a compiler error.
12023
12024	v0.11.9 - 2019-06-16
12025	- Silence some ThreadSanitizer warnings.
12026
12027	v0.11.8 - 2019-05-21
12028	- Fix warnings.
12029
12030	v0.11.7 - 2019-05-06
12031	- C89 fixes.
12032
12033	v0.11.6 - 2019-05-05
12034	- Add support for C89.
12035	- Fix a compiler warning when CRC is disabled.
12036	- Change license to choice of public domain or MIT-0.
12037
12038	v0.11.5 - 2019-04-19
12039	- Fix a compiler error with GCC.
12040
12041	v0.11.4 - 2019-04-17
12042	- Fix some warnings with GCC when compiling with -std=c99.
12043
12044	v0.11.3 - 2019-04-07
12045	- Silence warnings with GCC.
12046
12047	v0.11.2 - 2019-03-10
12048	- Fix a warning.
12049
12050	v0.11.1 - 2019-02-17
12051	- Fix a potential bug with seeking.
12052
12053	v0.11.0 - 2018-12-16
12054	- API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
12055	drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
12056	and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
12057	dividing it by the channel count, and then do the same with the return value.
12058	- API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
12059	the changes to drflac_read_() apply.*
12060	- API CHANGE: Deprecated drflac_open_and_decode_() and replaced with drflac_open__and_read_(). Same rules as*
12061	the changes to drflac_read_() apply.*
12062	- Optimizations.
12063
12064	v0.10.0 - 2018-09-11
12065	- Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
12066	need to do it yourself via the callback API.
12067	- Fix the clang build.
12068	- Fix undefined behavior.
12069	- Fix errors with CUESHEET metdata blocks.
12070	- Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
12071	Vorbis comment API.
12072	- Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
12073	- Minor optimizations.
12074
12075	v0.9.11 - 2018-08-29
12076	- Fix a bug with sample reconstruction.
12077
12078	v0.9.10 - 2018-08-07
12079	- Improve 64-bit detection.
12080
12081	v0.9.9 - 2018-08-05
12082	- Fix C++ build on older versions of GCC.
12083
12084	v0.9.8 - 2018-07-24
12085	- Fix compilation errors.
12086
12087	v0.9.7 - 2018-07-05
12088	- Fix a warning.
12089
12090	v0.9.6 - 2018-06-29
12091	- Fix some typos.
12092
12093	v0.9.5 - 2018-06-23
12094	- Fix some warnings.
12095
12096	v0.9.4 - 2018-06-14
12097	- Optimizations to seeking.
12098	- Clean up.
12099
12100	v0.9.3 - 2018-05-22
12101	- Bug fix.
12102
12103	v0.9.2 - 2018-05-12
12104	- Fix a compilation error due to a missing break statement.
12105
12106	v0.9.1 - 2018-04-29
12107	- Fix compilation error with Clang.
12108
12109	v0.9 - 2018-04-24
12110	- Fix Clang build.
12111	- Start using major.minor.revision versioning.
12112
12113	v0.8g - 2018-04-19
12114	- Fix build on non-x86/x64 architectures.
12115
12116	v0.8f - 2018-02-02
12117	- Stop pretending to support changing rate/channels mid stream.
12118
12119	v0.8e - 2018-02-01
12120	- Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
12121	- Fix a crash the the Rice partition order is invalid.
12122
12123	v0.8d - 2017-09-22
12124	- Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
12125
12126	v0.8c - 2017-09-07
12127	- Fix warning on non-x86/x64 architectures.
12128
12129	v0.8b - 2017-08-19
12130	- Fix build on non-x86/x64 architectures.
12131
12132	v0.8a - 2017-08-13
12133	- A small optimization for the Clang build.
12134
12135	v0.8 - 2017-08-12
12136	- API CHANGE: Rename dr_ types to drflac_.
12137	- Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
12138	- Add support for custom implementations of malloc(), realloc(), etc.
12139	- Add CRC checking to Ogg encapsulated streams.
12140	- Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
12141	- Bug fixes.
12142
12143	v0.7 - 2017-07-23
12144	- Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
12145
12146	v0.6 - 2017-07-22
12147	- Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
12148	never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
12149
12150	v0.5 - 2017-07-16
12151	- Fix typos.
12152	- Change drflac_bool types to unsigned.*
12153	- Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
12154
12155	v0.4f - 2017-03-10
12156	- Fix a couple of bugs with the bitstreaming code.
12157
12158	v0.4e - 2017-02-17
12159	- Fix some warnings.
12160
12161	v0.4d - 2016-12-26
12162	- Add support for 32-bit floating-point PCM decoding.
12163	- Use drflac_int and drflac_uint* sized types to improve compiler support.*
12164	- Minor improvements to documentation.
12165
12166	v0.4c - 2016-12-26
12167	- Add support for signed 16-bit integer PCM decoding.
12168
12169	v0.4b - 2016-10-23
12170	- A minor change to drflac_bool8 and drflac_bool32 types.
12171
12172	v0.4a - 2016-10-11
12173	- Rename drBool32 to drflac_bool32 for styling consistency.
12174
12175	v0.4 - 2016-09-29
12176	- API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
12177	- API CHANGE: Rename drflac_open_and_decode() to drflac_open_and_decode_s32().
12178	- API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode(). Rationale for this is to*
12179	keep it consistent with drflac_audio.
12180
12181	v0.3f - 2016-09-21
12182	- Fix a warning with GCC.
12183
12184	v0.3e - 2016-09-18
12185	- Fixed a bug where GCC 4.3+ was not getting properly identified.
12186	- Fixed a few typos.
12187	- Changed date formats to ISO 8601 (YYYY-MM-DD).
12188
12189	v0.3d - 2016-06-11
12190	- Minor clean up.
12191
12192	v0.3c - 2016-05-28
12193	- Fixed compilation error.
12194
12195	v0.3b - 2016-05-16
12196	- Fixed Linux/GCC build.
12197	- Updated documentation.
12198
12199	v0.3a - 2016-05-15
12200	- Minor fixes to documentation.
12201
12202	v0.3 - 2016-05-11
12203	- Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12204	- Lots of clean up.
12205
12206	v0.2b - 2016-05-10
12207	- Bug fixes.
12208
12209	v0.2a - 2016-05-10
12210	- Made drflac_open_and_decode() more robust.
12211	- Removed an unused debugging variable
12212
12213	v0.2 - 2016-05-09
12214	- Added support for Ogg encapsulation.
12215	- API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
12216	should be relative to the start or the current position. Also changes the seeking rules such that
12217	seeking offsets will never be negative.
12218	- Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
12219
12220	v0.1b - 2016-05-07
12221	- Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12222	- Removed a stale comment.
12223
12224	v0.1a - 2016-05-05
12225	- Minor formatting changes.
12226	- Fixed a warning on the GCC build.
12227
12228	v0.1 - 2016-05-03
12229	- Initial versioned release.
12230	*/
12231
12232	/*
12233	This software is available as a choice of the following licenses. Choose
12234	whichever you prefer.
12235
12236	===============================================================================
12237	ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12238	===============================================================================
12239	This is free and unencumbered software released into the public domain.
12240
12241	Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12242	software, either in source code form or as a compiled binary, for any purpose,
12243	commercial or non-commercial, and by any means.
12244
12245	In jurisdictions that recognize copyright laws, the author or authors of this
12246	software dedicate any and all copyright interest in the software to the public
12247	domain. We make this dedication for the benefit of the public at large and to
12248	the detriment of our heirs and successors. We intend this dedication to be an
12249	overt act of relinquishment in perpetuity of all present and future rights to
12250	this software under copyright law.
12251
12252	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12253	IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12254	FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12255	AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12256	ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12257	WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12258
12259	For more information, please refer to <http://unlicense.org/>
12260
12261	===============================================================================
12262	ALTERNATIVE 2 - MIT No Attribution
12263	===============================================================================
12264	Copyright 2020 David Reid
12265
12266	Permission is hereby granted, free of charge, to any person obtaining a copy of
12267	this software and associated documentation files (the "Software"), to deal in
12268	the Software without restriction, including without limitation the rights to
12269	use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
12270	of the Software, and to permit persons to whom the Software is furnished to do
12271	so.
12272
12273	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12274	IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12275	FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12276	AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12277	LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
12278	OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
12279	SOFTWARE.
12280	*/
12281

Browse the source code of LOVE/libraries/dr_flac/dr_flac.h