1/* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
2
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License as published by
5 the Free Software Foundation; version 2 of the License.
6
7 This program is distributed in the hope that it will be useful,
8 but WITHOUT ANY WARRANTY; without even the implied warranty of
9 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
10 GNU General Public License for more details.
11
12 You should have received a copy of the GNU General Public License
13 along with this program; if not, write to the Free Software
14 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
15
/*
  rdtsc3 -- multi-platform timer code
  pgulutzan@mysql.com, 2005-08-29
  modified 2008-11-02

  Functions:

  my_timer_cycles           ulonglong cycles
  my_timer_nanoseconds      ulonglong nanoseconds
  my_timer_microseconds     ulonglong "microseconds"
  my_timer_milliseconds     ulonglong milliseconds
  my_timer_ticks            ulonglong ticks
  my_timer_init             initialization / test

  We'll call the first 5 functions (the ones that return
  a ulonglong) "my_timer_xxx" functions.
  Each my_timer_xxx function returns a 64-bit timing value
  since an arbitrary 'epoch' start. Since the only purpose
  is to determine elapsed times, wall-clock time-of-day
  is not known and not relevant.

  The my_timer_init function is necessary for initializing.
  It returns information (underlying routine name,
  frequency, resolution, overhead) about all my_timer_xxx
  functions. A program should call my_timer_init once,
  use the information to decide what my_timer_xxx function
  to use, and subsequently call that function by function
  pointer.

  A typical use would be:
  my_timer_init()        ... once, at program start
  ...
  time1= my_timer_xxx()  ... time before start
  [code that's timed]
  time2= my_timer_xxx()  ... time after end
  elapsed_time= (time2 - time1) - overhead
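
  A minimal sketch of the pattern above, not the server's own code.
  It assumes a hypothetical stand-in timer, demo_timer(), in place of
  the chosen my_timer_xxx function, a hypothetical demo_work() as the
  timed code, and an overhead figure like the one my_timer_init()
  reports. All helper names here are illustrative only.

```c
#include <stdint.h>

// Hypothetical stand-in for a my_timer_xxx function: advances by 10 per call.
static uint64_t demo_timer(void)
{
  static uint64_t now = 0;
  now += 10;
  return now;
}

// Hypothetical piece of code being timed; it does nothing.
static void demo_work(void)
{
}

// Elapsed time between two readings, less the timer's own overhead.
// Clamps at zero so the unsigned subtraction cannot wrap around.
static uint64_t elapsed_less_overhead(uint64_t time1, uint64_t time2,
                                      uint64_t overhead)
{
  uint64_t raw = time2 - time1;
  return raw > overhead ? raw - overhead : 0;
}

// Time one call to work(), reading the timer through a function pointer.
static uint64_t time_one_call(uint64_t (*timer)(void), void (*work)(void),
                              uint64_t overhead)
{
  uint64_t time1 = timer();
  work();
  uint64_t time2 = timer();
  return elapsed_less_overhead(time1, time2, overhead);
}
```

  Calling time_one_call(demo_timer, demo_work, 3) yields 7 here, because
  the stand-in timer advances by exactly 10 between the two readings.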
52*/
53
54#include "my_global.h"
55#include "my_rdtsc.h"
56
#include <stdio.h>
#if defined(_WIN32)
#include <windows.h>
#endif
63
64#if !defined(_WIN32)
65#if TIME_WITH_SYS_TIME
66#include <sys/time.h>
67#include <time.h> /* for clock_gettime */
68#else
69#if HAVE_SYS_TIME_H
70#include <sys/time.h>
71#elif defined(HAVE_TIME_H)
72#include <time.h>
73#endif
74#endif
75#endif
76
77#if defined(HAVE_ASM_MSR_H) && defined(HAVE_RDTSCLL)
78#include <asm/msr.h> /* for rdtscll */
79#endif
80
81#if defined(HAVE_SYS_TIMEB_H) && defined(HAVE_FTIME)
82#include <sys/timeb.h> /* for ftime */
83#endif
84
85#if defined(HAVE_SYS_TIMES_H) && defined(HAVE_TIMES)
86#include <sys/times.h> /* for times */
87#endif
88
89#if defined(__INTEL_COMPILER) && defined(__ia64__) && defined(HAVE_IA64INTRIN_H)
90#include <ia64intrin.h> /* for __GetReg */
91#endif
92
93#if defined(__APPLE__) && defined(__MACH__)
94#include <mach/mach_time.h>
95#endif
96
97#if defined(__SUNPRO_CC) && defined(__sparcv9) && defined(_LP64) && !defined(__SunOS_5_7)
98extern "C" ulonglong my_timer_cycles_il_sparc64();
99#elif defined(__SUNPRO_CC) && defined(_ILP32) && !defined(__SunOS_5_7)
100extern "C" ulonglong my_timer_cycles_il_sparc32();
101#elif defined(__SUNPRO_CC) && defined(__i386) && defined(_ILP32)
102extern "C" ulonglong my_timer_cycles_il_i386();
103#elif defined(__SUNPRO_CC) && defined(__x86_64) && defined(_LP64)
104extern "C" ulonglong my_timer_cycles_il_x86_64();
105#elif defined(__SUNPRO_C) && defined(__sparcv9) && defined(_LP64) && !defined(__SunOS_5_7)
106ulonglong my_timer_cycles_il_sparc64();
107#elif defined(__SUNPRO_C) && defined(_ILP32) && !defined(__SunOS_5_7)
108ulonglong my_timer_cycles_il_sparc32();
109#elif defined(__SUNPRO_C) && defined(__i386) && defined(_ILP32)
110ulonglong my_timer_cycles_il_i386();
111#elif defined(__SUNPRO_C) && defined(__x86_64) && defined(_LP64)
112ulonglong my_timer_cycles_il_x86_64();
113#endif
114
115#if defined(__INTEL_COMPILER)
116/*
117 icc warning #1011 is:
118 missing return statement at end of non-void function
119*/
120#pragma warning (disable:1011)
121#endif
122
/*
  For cycles, we depend on RDTSC for x86 platforms,
  or on the time base register (which is not really a cycle
  count but a separate counter with less than nanosecond
  resolution) for most PowerPC platforms, or on
  gethrtime which is okay for hpux and solaris, or on
  clock_gettime(CLOCK_SGI_CYCLE) for Irix platforms,
  or on read_real_time for aix platforms. There is
  nothing for Alpha platforms; they would be tricky.

  On the platforms that do not have a CYCLE timer,
  "wait" events are initialized to use NANOSECOND instead of CYCLE
  during performance_schema initialization (at the server startup).

  The Linux performance monitor (see "man perf_event_open") can
  provide a cycle counter on the platforms that do not have
  other kinds of cycle counters. But we don't use it so far.

  ARM notes
  ---------
  During tests on ARMv7 Debian, a perf_event_open() based cycle counter
  had a frequency that was too low and an overhead that was too high:

  MariaDB [performance_schema]> SELECT * FROM performance_timers;
  +-------------+-----------------+------------------+----------------+
  | TIMER_NAME  | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
  +-------------+-----------------+------------------+----------------+
  | CYCLE       |       689368159 |                1 |            970 |
  | NANOSECOND  |      1000000000 |                1 |            308 |
  | MICROSECOND |         1000000 |                1 |            417 |
  | MILLISECOND |            1000 |             1000 |            407 |
  | TICK        |             127 |                1 |            612 |
  +-------------+-----------------+------------------+----------------+

  Therefore, it was decided not to use perf_event_open() on ARM
  (i.e. go without CYCLE and have "wait" events use NANOSECOND by default).
*/
158
159ulonglong my_timer_cycles(void)
160{
161#if defined(__GNUC__) && defined(__i386__)
162 /* This works much better if compiled with "gcc -O3". */
163 ulonglong result;
164 __asm__ __volatile__ ("rdtsc" : "=A" (result));
165 return result;
166#elif defined(__SUNPRO_C) && defined(__i386)
167 __asm("rdtsc");
168#elif defined(__GNUC__) && defined(__x86_64__)
169 ulonglong result;
170 __asm__ __volatile__ ("rdtsc\n\t" \
171 "shlq $32,%%rdx\n\t" \
172 "orq %%rdx,%%rax"
173 : "=a" (result) :: "%edx");
174 return result;
175#elif defined(HAVE_ASM_MSR_H) && defined(HAVE_RDTSCLL)
176 {
177 ulonglong result;
178 rdtscll(result);
179 return result;
180 }
181#elif defined(_WIN32) && defined(_M_IX86)
182 __asm {rdtsc};
183#elif defined(_WIN64) && defined(_M_X64)
184 /* For 64-bit Windows: unsigned __int64 __rdtsc(); */
185 return __rdtsc();
186#elif defined(__INTEL_COMPILER) && defined(__ia64__) && defined(HAVE_IA64INTRIN_H)
187 return (ulonglong) __getReg(_IA64_REG_AR_ITC); /* (3116) */
188#elif defined(__GNUC__) && defined(__ia64__)
189 {
190 ulonglong result;
191 __asm __volatile__ ("mov %0=ar.itc" : "=r" (result));
192 return result;
193 }
194#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__POWERPC__) || (defined(_POWER) && defined(_AIX52))) && (defined(__64BIT__) || defined(_ARCH_PPC64))
195 {
196 ulonglong result;
197 __asm __volatile__ ("mftb %0" : "=r" (result));
198 return result;
199 }
200#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__POWERPC__) || (defined(_POWER) && defined(_AIX52))) && (!defined(__64BIT__) && !defined(_ARCH_PPC64))
201 {
    /*
      mftbu means "move from time-base-upper to result".
      The loop is saying: x1=upper, x2=lower, x3=upper,
      if x1!=x3 the lower half carried into the upper half
      between the reads, so repeat.
    */
207 unsigned int x1, x2, x3;
208 ulonglong result;
209 for (;;)
210 {
211 __asm __volatile__ ( "mftbu %0" : "=r"(x1) );
212 __asm __volatile__ ( "mftb %0" : "=r"(x2) );
213 __asm __volatile__ ( "mftbu %0" : "=r"(x3) );
214 if (x1 == x3) break;
215 }
216 result = x1;
217 return ( result << 32 ) | x2;
218 }
219#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__sparcv9) && defined(_LP64) && !defined(__SunOS_5_7)
220 return (my_timer_cycles_il_sparc64());
221#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(_ILP32) && !defined(__SunOS_5_7)
222 return (my_timer_cycles_il_sparc32());
223#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__i386) && defined(_ILP32)
224 /* This is probably redundant for __SUNPRO_C. */
225 return (my_timer_cycles_il_i386());
226#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__x86_64) && defined(_LP64)
227 return (my_timer_cycles_il_x86_64());
228#elif defined(__GNUC__) && defined(__sparcv9) && defined(_LP64) && (__GNUC__>2)
229 {
230 ulonglong result;
231 __asm __volatile__ ("rd %%tick,%0" : "=r" (result));
232 return result;
233 }
234#elif defined(__GNUC__) && defined(__sparc__) && !defined(_LP64) && (__GNUC__>2)
235 {
236 union {
237 ulonglong wholeresult;
238 struct {
239 ulong high;
240 ulong low;
241 } splitresult;
242 } result;
243 __asm __volatile__ ("rd %%tick,%1; srlx %1,32,%0" : "=r" (result.splitresult.high), "=r" (result.splitresult.low));
244 return result.wholeresult;
245 }
246#elif defined(__sgi) && defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_SGI_CYCLE)
247 {
248 struct timespec tp;
249 clock_gettime(CLOCK_SGI_CYCLE, &tp);
250 return (ulonglong) tp.tv_sec * 1000000000 + (ulonglong) tp.tv_nsec;
251 }
252#elif defined(__GNUC__) && defined(__s390__)
253 /* covers both s390 and s390x */
254 {
255 ulonglong result;
256 __asm__ __volatile__ ("stck %0" : "=Q" (result) : : "cc");
257 return result;
258 }
259#elif defined(HAVE_SYS_TIMES_H) && defined(HAVE_GETHRTIME)
260 /* gethrtime may appear as either cycle or nanosecond counter */
261 return (ulonglong) gethrtime();
262#else
263 return 0;
264#endif
265}
266
267#if defined(__INTEL_COMPILER)
268/* re-enable warning#1011 which was only for my_timer_cycles() */
269/* There may be an icc bug which means we must leave disabled. */
270#pragma warning (default:1011)
271#endif
272
273/*
274 For nanoseconds, most platforms have nothing available that
275 (a) doesn't require bringing in a 40-kb librt.so library
276 (b) really has nanosecond resolution.
277*/
278
279ulonglong my_timer_nanoseconds(void)
280{
281#if defined(HAVE_READ_REAL_TIME)
282 {
283 timebasestruct_t tr;
284 read_real_time(&tr, TIMEBASE_SZ);
285 return (ulonglong) tr.tb_high * 1000000000 + (ulonglong) tr.tb_low;
286 }
287#elif defined(HAVE_SYS_TIMES_H) && defined(HAVE_GETHRTIME)
288 /* SunOS 5.10+, Solaris, HP-UX: hrtime_t gethrtime(void) */
289 return (ulonglong) gethrtime();
290#elif defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_REALTIME)
291 {
292 struct timespec tp;
293 clock_gettime(CLOCK_REALTIME, &tp);
294 return (ulonglong) tp.tv_sec * 1000000000 + (ulonglong) tp.tv_nsec;
295 }
296#elif defined(__APPLE__) && defined(__MACH__)
297 {
298 ulonglong tm;
299 static mach_timebase_info_data_t timebase_info= {0,0};
300 if (timebase_info.denom == 0)
301 (void) mach_timebase_info(&timebase_info);
302 tm= mach_absolute_time();
303 return (tm * timebase_info.numer) / timebase_info.denom;
304 }
305#else
306 return 0;
307#endif
308}
309
/*
  For microseconds, gettimeofday() is available on
  almost all platforms. On Windows we use
  QueryPerformanceCounter which will usually tick over
  3.5 million times per second, and we don't throw
  away the extra precision. (On Windows Server 2003
  the frequency is the same as the cycle frequency.)
*/
318
319ulonglong my_timer_microseconds(void)
320{
321#if defined(HAVE_GETTIMEOFDAY)
322 {
323 static ulonglong last_value= 0;
324 struct timeval tv;
325 if (gettimeofday(&tv, NULL) == 0)
326 last_value= (ulonglong) tv.tv_sec * 1000000 + (ulonglong) tv.tv_usec;
327 else
328 {
      /*
        There are reports that gettimeofday(2) can have intermittent failures
        on some platforms, see for example Bug#36819.
        We are not trying again or looping, just returning the best value possible
        under the circumstances ...
      */
335 last_value++;
336 }
337 return last_value;
338 }
339#elif defined(_WIN32)
340 {
341 /* QueryPerformanceCounter usually works with about 1/3 microsecond. */
342 LARGE_INTEGER t_cnt;
343
344 QueryPerformanceCounter(&t_cnt);
345 return (ulonglong) t_cnt.QuadPart;
346 }
347#else
348 return 0;
349#endif
350}
351
352/*
353 For milliseconds, we use ftime() if it's supported
354 or time()*1000 if it's not. With modern versions of
355 Windows and with HP Itanium, resolution is 10-15
356 milliseconds.
357*/
358
ulonglong my_timer_milliseconds(void)
{
#if defined(HAVE_SYS_TIMEB_H) && defined(HAVE_FTIME)
  /* ftime() is obsolete but maybe the platform is old */
  struct timeb ft;
  ftime(&ft);
  return (ulonglong)ft.time * 1000 + (ulonglong)ft.millitm;
#elif defined(_WIN32)
  /* Checked before HAVE_TIME so the choice matches my_timer_init(). */
  FILETIME ft;
  GetSystemTimeAsFileTime( &ft );
  /* FILETIME counts 100-nanosecond units; 10000 of them make a millisecond. */
  return ((ulonglong)ft.dwLowDateTime +
         (((ulonglong)ft.dwHighDateTime) << 32))/10000;
#elif defined(HAVE_TIME)
  return (ulonglong) time(NULL) * 1000;
#else
  return 0;
#endif
}
377
378/*
379 For ticks, which we handle with times(), the frequency
380 is usually 100/second and the overhead is surprisingly
381 bad, sometimes even worse than gettimeofday's overhead.
382*/
383
384ulonglong my_timer_ticks(void)
385{
386#if defined(HAVE_SYS_TIMES_H) && defined(HAVE_TIMES)
387 {
388 struct tms times_buf;
389 return (ulonglong) times(&times_buf);
390 }
391#elif defined(_WIN32)
392 return (ulonglong) GetTickCount();
393#else
394 return 0;
395#endif
396}
397
398/*
399 The my_timer_init() function and its sub-functions
400 have several loops which call timers. If there's
401 something wrong with a timer -- which has never
402 happened in tests -- we want the loop to end after
403 an arbitrary number of iterations, and my_timer_info
404 will show a discouraging result. The arbitrary
405 number is 1,000,000.
406*/
407#define MY_TIMER_ITERATIONS 1000000
408
/*
  Calculate overhead. Called from my_timer_init().
  Usually best_timer_overhead = cycles.overhead or
  nanoseconds.overhead, so returned amount is in
  cycles or nanoseconds. We repeat the calculation
  twenty times and keep the least result, so that we can
  disregard effects of caching or interrupts. Result is
  quite consistent for cycles, at least. But remember
  it's a minimum.
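
  The "least of N readings" idea can be sketched as follows. This is
  an illustration under stated assumptions, not the routine below:
  min_back_to_back() and noisy_timer() are hypothetical names, and the
  sketch measures a timer against itself rather than against a
  separate cycle timer.

```c
#include <stdint.h>

// Smallest difference between two back-to-back readings of `timer`
// over `rounds` attempts. Taking the minimum discards rounds that
// were inflated by a cache miss or an interrupt.
static uint64_t min_back_to_back(uint64_t (*timer)(void), int rounds)
{
  uint64_t best = UINT64_MAX;
  for (int i = 0; i < rounds; ++i)
  {
    uint64_t t1 = timer();
    uint64_t t2 = timer();
    if (t2 - t1 < best)
      best = t2 - t1;
  }
  return best;
}

// Hypothetical timer for demonstration: normally advances by 5 per
// call, but every fourth call simulates an interrupt by advancing 500.
static uint64_t noisy_timer(void)
{
  static uint64_t now = 0;
  static unsigned calls = 0;
  ++calls;
  now += (calls % 4 == 0) ? 500 : 5;
  return now;
}
```

  With the simulated interrupts, every second round measures 500, yet
  min_back_to_back(noisy_timer, 20) still reports 5.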
417*/
418
419static void my_timer_init_overhead(ulonglong *overhead,
420 ulonglong (*cycle_timer)(void),
421 ulonglong (*this_timer)(void),
422 ulonglong best_timer_overhead)
423{
424 ulonglong time1, time2;
425 int i;
426
427 /* *overhead, least of 20 calculations - cycles.overhead */
428 for (i= 0, *overhead= 1000000000; i < 20; ++i)
429 {
430 time1= cycle_timer();
431 this_timer(); /* rather than 'time_tmp= timer();' */
432 time2= cycle_timer() - time1;
433 if (*overhead > time2)
434 *overhead= time2;
435 }
436 *overhead-= best_timer_overhead;
437}
438
/*
  Calculate resolution. Called from my_timer_init().
  If a timer goes up by jumps, e.g. 1050, 1075, 1100, ...
  then the best resolution is the minimum jump, e.g. 25.
  If it's always divisible by 1000 then it's just a
  result of multiplication of a lower-precision timer
  result, e.g. nanoseconds are often microseconds * 1000.
  If the minimum jump is less than an arbitrary passed
  figure (a guess based on maximum overhead * 2), ignore.
  Usually we end up with nanoseconds = 1 because it's too
  hard to detect anything <= 100 nanoseconds.
  Often GetTickCount() has resolution = 15.
  We don't check with ticks because they take too long.
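
  The decision rule described above can be sketched on its own, given
  a list of jumps that were already observed. This is an illustration
  only: guess_resolution() and its parameter names are hypothetical,
  and unlike the routine below it does not read any timer itself.

```c
#include <stdint.h>

// Guess a timer's resolution from `n` observed non-zero jumps.
// If every jump is a multiple of 1000000 or of 1000, report that
// multiplier. Otherwise report the smallest jump when it exceeds
// `noise_floor`, and 1 when it does not.
static uint64_t guess_resolution(const uint64_t *jumps, int n,
                                 uint64_t noise_floor)
{
  int by_1000 = 0, by_1000000 = 0;
  uint64_t best_jump = UINT64_MAX;
  if (n <= 0)
    return 1;
  for (int i = 0; i < n; ++i)
  {
    if (jumps[i] % 1000 == 0)
      ++by_1000;
    if (jumps[i] % 1000000 == 0)
      ++by_1000000;
    if (jumps[i] < best_jump)
      best_jump = jumps[i];
  }
  if (by_1000000 == n)
    return 1000000;
  if (by_1000 == n)
    return 1000;
  if (best_jump > noise_floor)
    return best_jump;
  return 1;
}
```

  For jumps of 25, 50, 25 and a noise floor of 20 this reports 25. For
  jumps of 2000, 1000, 3000 it reports 1000, the multiplier case.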
452*/
453static ulonglong my_timer_init_resolution(ulonglong (*this_timer)(void),
454 ulonglong overhead_times_2)
455{
456 ulonglong time1, time2;
457 ulonglong best_jump;
458 int i, jumps, divisible_by_1000, divisible_by_1000000;
459
460 divisible_by_1000= divisible_by_1000000= 0;
461 best_jump= 1000000;
462 for (i= jumps= 0; jumps < 3 && i < MY_TIMER_ITERATIONS * 10; ++i)
463 {
464 time1= this_timer();
465 time2= this_timer();
466 time2-= time1;
467 if (time2)
468 {
469 ++jumps;
470 if (!(time2 % 1000))
471 {
472 ++divisible_by_1000;
473 if (!(time2 % 1000000))
474 ++divisible_by_1000000;
475 }
476 if (best_jump > time2)
477 best_jump= time2;
478 /* For milliseconds, one jump is enough. */
479 if (overhead_times_2 == 0)
480 break;
481 }
482 }
483 if (jumps == 3)
484 {
485 if (jumps == divisible_by_1000000)
486 return 1000000;
487 if (jumps == divisible_by_1000)
488 return 1000;
489 }
490 if (best_jump > overhead_times_2)
491 return best_jump;
492 return 1;
493}
494
/*
  Calculate cycle frequency by seeing how many cycles pass
  in a 200-microsecond period. I tried with 10-microsecond
  periods originally, and the result was often very wrong.
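
  The arithmetic behind this estimate, as a minimal sketch. The name
  estimate_frequency() and its parameters are hypothetical, and the
  zero check is an addition of this sketch, not something the routine
  below performs.

```c
#include <stdint.h>

// Estimate a counter's frequency in ticks per second from the number
// of counter ticks (`cycle_delta`) seen while a reference timer
// advanced by `ref_delta`. `ref_frequency` is the reference timer's
// own ticks per second (1000000 for a true microsecond timer).
// Returns 0 when the reference timer did not advance.
static uint64_t estimate_frequency(uint64_t cycle_delta, uint64_t ref_delta,
                                   uint64_t ref_frequency)
{
  if (ref_delta == 0)
    return 0;
  return (ref_frequency * cycle_delta) / ref_delta;
}
```

  For example, 600000 cycles counted across 200 microseconds gives
  3000000000 cycles per second.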
499*/
500
501static ulonglong my_timer_init_frequency(MY_TIMER_INFO *mti)
502{
503 int i;
504 ulonglong time1, time2, time3, time4;
505 time1= my_timer_cycles();
506 time2= my_timer_microseconds();
507 time3= time2; /* Avoids a Microsoft/IBM compiler warning */
508 for (i= 0; i < MY_TIMER_ITERATIONS; ++i)
509 {
510 time3= my_timer_microseconds();
511 if (time3 - time2 > 200) break;
512 }
513 time4= my_timer_cycles() - mti->cycles.overhead;
514 time4-= mti->microseconds.overhead;
515 return (mti->microseconds.frequency * (time4 - time1)) / (time3 - time2);
516}
517
518/*
519 Call my_timer_init before the first call to my_timer_xxx().
520 If something must be initialized, it happens here.
521 Set: what routine is being used e.g. "asm_x86"
522 Set: function, overhead, actual frequency, resolution.
523*/
524
525void my_timer_init(MY_TIMER_INFO *mti)
526{
527 ulonglong (*best_timer)(void);
528 ulonglong best_timer_overhead;
529 ulonglong time1, time2;
530 int i;
531
532 /* cycles */
533 mti->cycles.frequency= 1000000000;
534#if defined(__GNUC__) && defined(__i386__)
535 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_X86;
536#elif defined(__SUNPRO_C) && defined(__i386)
537 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_X86;
538#elif defined(__GNUC__) && defined(__x86_64__)
539 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_X86_64;
540#elif defined(HAVE_ASM_MSR_H) && defined(HAVE_RDTSCLL)
541 mti->cycles.routine= MY_TIMER_ROUTINE_RDTSCLL;
542#elif defined(_WIN32) && defined(_M_IX86)
543 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_X86_WIN;
544#elif defined(_WIN64) && defined(_M_X64)
545 mti->cycles.routine= MY_TIMER_ROUTINE_RDTSC;
546#elif defined(__INTEL_COMPILER) && defined(__ia64__) && defined(HAVE_IA64INTRIN_H)
547 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_IA64;
548#elif defined(__GNUC__) && defined(__ia64__)
549 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_IA64;
550#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__POWERPC__) || (defined(_POWER) && defined(_AIX52))) && (defined(__64BIT__) || defined(_ARCH_PPC64))
551 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_PPC64;
552#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__POWERPC__) || (defined(_POWER) && defined(_AIX52))) && (!defined(__64BIT__) && !defined(_ARCH_PPC64))
553 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_PPC;
554#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__sparcv9) && defined(_LP64) && !defined(__SunOS_5_7)
555 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_SUNPRO_SPARC64;
556#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(_ILP32) && !defined(__SunOS_5_7)
557 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_SUNPRO_SPARC32;
558#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__i386) && defined(_ILP32)
559 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_SUNPRO_I386;
560#elif (defined(__SUNPRO_CC) || defined(__SUNPRO_C)) && defined(__x86_64) && defined(_LP64)
561 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_SUNPRO_X86_64;
562#elif defined(__GNUC__) && defined(__sparcv9) && defined(_LP64) && (__GNUC__>2)
563 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_GCC_SPARC64;
564#elif defined(__GNUC__) && defined(__sparc__) && !defined(_LP64) && (__GNUC__>2)
565 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_GCC_SPARC32;
566#elif defined(__sgi) && defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_SGI_CYCLE)
567 mti->cycles.routine= MY_TIMER_ROUTINE_SGI_CYCLE;
568#elif defined(__GNUC__) && defined(__s390__)
569 mti->cycles.routine= MY_TIMER_ROUTINE_ASM_S390;
570#elif defined(HAVE_SYS_TIMES_H) && defined(HAVE_GETHRTIME)
571 mti->cycles.routine= MY_TIMER_ROUTINE_GETHRTIME;
572#else
573 mti->cycles.routine= 0;
574#endif
575
576 if (!mti->cycles.routine || !my_timer_cycles())
577 {
578 mti->cycles.routine= 0;
579 mti->cycles.resolution= 0;
580 mti->cycles.frequency= 0;
581 mti->cycles.overhead= 0;
582 }
583
584 /* nanoseconds */
585 mti->nanoseconds.frequency= 1000000000; /* initial assumption */
586#if defined(HAVE_READ_REAL_TIME)
587 mti->nanoseconds.routine= MY_TIMER_ROUTINE_READ_REAL_TIME;
588#elif defined(HAVE_SYS_TIMES_H) && defined(HAVE_GETHRTIME)
589 mti->nanoseconds.routine= MY_TIMER_ROUTINE_GETHRTIME;
590#elif defined(HAVE_CLOCK_GETTIME)
591 mti->nanoseconds.routine= MY_TIMER_ROUTINE_CLOCK_GETTIME;
592#elif defined(__APPLE__) && defined(__MACH__)
593 mti->nanoseconds.routine= MY_TIMER_ROUTINE_MACH_ABSOLUTE_TIME;
594#else
595 mti->nanoseconds.routine= 0;
596#endif
597 if (!mti->nanoseconds.routine || !my_timer_nanoseconds())
598 {
599 mti->nanoseconds.routine= 0;
600 mti->nanoseconds.resolution= 0;
601 mti->nanoseconds.frequency= 0;
602 mti->nanoseconds.overhead= 0;
603 }
604
605 /* microseconds */
606 mti->microseconds.frequency= 1000000; /* initial assumption */
607#if defined(HAVE_GETTIMEOFDAY)
608 mti->microseconds.routine= MY_TIMER_ROUTINE_GETTIMEOFDAY;
609#elif defined(_WIN32)
610 {
611 LARGE_INTEGER li;
612 /* Windows: typical frequency = 3579545, actually 1/3 microsecond. */
613 if (!QueryPerformanceFrequency(&li))
614 mti->microseconds.routine= 0;
615 else
616 {
617 mti->microseconds.frequency= li.QuadPart;
618 mti->microseconds.routine= MY_TIMER_ROUTINE_QUERYPERFORMANCECOUNTER;
619 }
620 }
621#else
622 mti->microseconds.routine= 0;
623#endif
624 if (!mti->microseconds.routine || !my_timer_microseconds())
625 {
626 mti->microseconds.routine= 0;
627 mti->microseconds.resolution= 0;
628 mti->microseconds.frequency= 0;
629 mti->microseconds.overhead= 0;
630 }
631
632 /* milliseconds */
633 mti->milliseconds.frequency= 1000; /* initial assumption */
634#if defined(HAVE_SYS_TIMEB_H) && defined(HAVE_FTIME)
635 mti->milliseconds.routine= MY_TIMER_ROUTINE_FTIME;
636#elif defined(_WIN32)
637 mti->milliseconds.routine= MY_TIMER_ROUTINE_GETSYSTEMTIMEASFILETIME;
638#elif defined(HAVE_TIME)
639 mti->milliseconds.routine= MY_TIMER_ROUTINE_TIME;
640#else
641 mti->milliseconds.routine= 0;
642#endif
643 if (!mti->milliseconds.routine || !my_timer_milliseconds())
644 {
645 mti->milliseconds.routine= 0;
646 mti->milliseconds.resolution= 0;
647 mti->milliseconds.frequency= 0;
648 mti->milliseconds.overhead= 0;
649 }
650
651 /* ticks */
652 mti->ticks.frequency= 100; /* permanent assumption */
653#if defined(HAVE_SYS_TIMES_H) && defined(HAVE_TIMES)
654 mti->ticks.routine= MY_TIMER_ROUTINE_TIMES;
655#elif defined(_WIN32)
656 mti->ticks.routine= MY_TIMER_ROUTINE_GETTICKCOUNT;
657#else
658 mti->ticks.routine= 0;
659#endif
660 if (!mti->ticks.routine || !my_timer_ticks())
661 {
662 mti->ticks.routine= 0;
663 mti->ticks.resolution= 0;
664 mti->ticks.frequency= 0;
665 mti->ticks.overhead= 0;
666 }
667
668 /*
669 Calculate overhead in terms of the timer that
670 gives the best resolution: cycles or nanoseconds.
671 I doubt it ever will be as bad as microseconds.
672 */
673 if (mti->cycles.routine)
674 best_timer= &my_timer_cycles;
675 else
676 {
677 if (mti->nanoseconds.routine)
678 {
679 best_timer= &my_timer_nanoseconds;
680 }
681 else
682 best_timer= &my_timer_microseconds;
683 }
684
685 /* best_timer_overhead = least of 20 calculations */
686 for (i= 0, best_timer_overhead= 1000000000; i < 20; ++i)
687 {
688 time1= best_timer();
689 time2= best_timer() - time1;
690 if (best_timer_overhead > time2)
691 best_timer_overhead= time2;
692 }
693 if (mti->cycles.routine)
694 my_timer_init_overhead(&mti->cycles.overhead,
695 best_timer,
696 &my_timer_cycles,
697 best_timer_overhead);
698 if (mti->nanoseconds.routine)
699 my_timer_init_overhead(&mti->nanoseconds.overhead,
700 best_timer,
701 &my_timer_nanoseconds,
702 best_timer_overhead);
703 if (mti->microseconds.routine)
704 my_timer_init_overhead(&mti->microseconds.overhead,
705 best_timer,
706 &my_timer_microseconds,
707 best_timer_overhead);
708 if (mti->milliseconds.routine)
709 my_timer_init_overhead(&mti->milliseconds.overhead,
710 best_timer,
711 &my_timer_milliseconds,
712 best_timer_overhead);
713 if (mti->ticks.routine)
714 my_timer_init_overhead(&mti->ticks.overhead,
715 best_timer,
716 &my_timer_ticks,
717 best_timer_overhead);
718
719/*
720 Calculate resolution for nanoseconds or microseconds
721 or milliseconds, by seeing if it's always divisible
722 by 1000, and by noticing how much jumping occurs.
723 For ticks, just assume the resolution is 1.
724*/
725 if (mti->cycles.routine)
726 mti->cycles.resolution= 1;
727 if (mti->nanoseconds.routine)
728 mti->nanoseconds.resolution=
729 my_timer_init_resolution(&my_timer_nanoseconds, 20000);
730 if (mti->microseconds.routine)
731 mti->microseconds.resolution=
732 my_timer_init_resolution(&my_timer_microseconds, 20);
733 if (mti->milliseconds.routine)
734 {
735 if (mti->milliseconds.routine == MY_TIMER_ROUTINE_TIME)
736 mti->milliseconds.resolution= 1000;
737 else
738 mti->milliseconds.resolution=
739 my_timer_init_resolution(&my_timer_milliseconds, 0);
740 }
741 if (mti->ticks.routine)
742 mti->ticks.resolution= 1;
743
744/*
745 Calculate cycles frequency,
746 if we have both a cycles routine and a microseconds routine.
747 In tests, this usually results in a figure within 2% of
748 what "cat /proc/cpuinfo" says.
749 If the microseconds routine is QueryPerformanceCounter
750 (i.e. it's Windows), and the microseconds frequency is >
751 500,000,000 (i.e. it's Windows Server so it uses RDTSC)
752 and the microseconds resolution is > 100 (i.e. dreadful),
753 then calculate cycles frequency = microseconds frequency.
754*/
755 if (mti->cycles.routine
756 && mti->microseconds.routine)
757 {
758 if (mti->microseconds.routine ==
759 MY_TIMER_ROUTINE_QUERYPERFORMANCECOUNTER
760 && mti->microseconds.frequency > 500000000
761 && mti->microseconds.resolution > 100)
762 mti->cycles.frequency= mti->microseconds.frequency;
763 else
764 {
765 time1= my_timer_init_frequency(mti);
766 /* Repeat once in case there was an interruption. */
767 time2= my_timer_init_frequency(mti);
768 if (time1 < time2) mti->cycles.frequency= time1;
769 else mti->cycles.frequency= time2;
770 }
771 }
772
773/*
774 Calculate milliseconds frequency =
775 (cycles-frequency/#-of-cycles) * #-of-milliseconds,
776 if we have both a milliseconds routine and a cycles
777 routine.
778 This will be inaccurate if milliseconds resolution > 1.
779 This is probably only useful when testing new platforms.
780*/
781 if (mti->milliseconds.routine
782 && mti->milliseconds.resolution < 1000
783 && mti->microseconds.routine
784 && mti->cycles.routine)
785 {
786 ulonglong time3, time4;
787 time1= my_timer_cycles();
788 time2= my_timer_milliseconds();
789 time3= time2; /* Avoids a Microsoft/IBM compiler warning */
790 for (i= 0; i < MY_TIMER_ITERATIONS * 1000; ++i)
791 {
792 time3= my_timer_milliseconds();
793 if (time3 - time2 > 10) break;
794 }
795 time4= my_timer_cycles();
796 mti->milliseconds.frequency=
797 (mti->cycles.frequency * (time3 - time2)) / (time4 - time1);
798 }
799
/*
  Calculate ticks.frequency =
  (cycles-frequency/#-of-cycles) * #-of-ticks,
  if we have both a ticks routine and a cycles
  routine.
  This is probably only useful when testing new platforms.
*/
807 if (mti->ticks.routine
808 && mti->microseconds.routine
809 && mti->cycles.routine)
810 {
811 ulonglong time3, time4;
812 time1= my_timer_cycles();
813 time2= my_timer_ticks();
814 time3= time2; /* Avoids a Microsoft/IBM compiler warning */
815 for (i= 0; i < MY_TIMER_ITERATIONS * 1000; ++i)
816 {
817 time3= my_timer_ticks();
818 if (time3 - time2 > 10) break;
819 }
820 time4= my_timer_cycles();
821 mti->ticks.frequency=
822 (mti->cycles.frequency * (time3 - time2)) / (time4 - time1);
823 }
824}
825
/*
  Additional Comments
  -------------------

  This is for timing, i.e. finding out how long a piece of code
  takes. If you want time of day matching a wall clock, the
  my_timer_xxx functions won't help you.

  The best timer is the one with highest frequency, lowest
  overhead, and resolution=1. The my_timer_init() routine will tell
  you at runtime which timer that is. Usually it will be
  my_timer_cycles() but be aware that, although it's best,
  it has possible flaws and dangers. Depending on platform:
  - The frequency might change. We don't test for this. It
    happens on laptops for power saving, and on blade servers
    for avoiding overheating.
  - The overhead that my_timer_init() returns is the minimum.
    In fact it could be slightly greater because of caching or
    because you call the routine by address, as recommended.
    It could be hugely greater if there's an interrupt.
  - The x86 cycle counter, RDTSC, doesn't "serialize". That is,
    if there is out-of-order execution, rdtsc might be processed
    after an instruction that logically follows it.
    (We could force serialization, but that would be slower.)
  - It is possible to set a flag which renders RDTSC
    inoperative. Somebody responsible for the kernel
    of the operating system would have to make this
    decision. For the platforms we've tested with, there's
    no such problem.
  - With a multi-processor arrangement, it's possible
    to get the cycle count from one processor in
    thread X, and the cycle count from another processor
    in thread Y. They may not always be in sync.
  - You can't depend on a cycle counter being available for
    all platforms. On Alphas, the
    cycle counter is only 32-bit, so it would overflow quickly,
    so we don't bother with it. On platforms that we haven't
    tested, there might be some if/endif combination that we
    didn't expect, or some assembler routine that we didn't
    supply.
866
  The recommended way to use the timer routines is:
  1. Somewhere near the beginning of the program, call
     my_timer_init(). This should only be necessary once,
     although you can call it again if you think that the
     frequency has changed.
  2. Determine the best timer based on frequency, resolution,
     overhead -- all things that my_timer_init() returns.
     Preserve the address of the timer and the my_timer_info
     results in an easily-accessible place.
  3. Instrument the code section that you're monitoring, thus:
       time1= my_timer_xxx();
       Instrumented code;
       time2= my_timer_xxx();
       elapsed_time= (time2 - time1) - overhead;
     If the timer is always on, then overhead is always there,
     so don't subtract it.
  4. Save the elapsed time, or add it to a totaller.
  5. When all timing processes are complete, transfer the
     saved / totalled elapsed time to permanent storage.
     Optionally you can convert cycles to microseconds at
     this point. (Don't do so every time you calculate
     elapsed_time! That would waste time and lose precision!)
     For converting cycles to microseconds, use the frequency
     that my_timer_init() returns. You'll also need to convert
     if the my_timer_microseconds() function is the Windows
     function QueryPerformanceCounter(), since that's sometimes
     a counter with precision slightly better than microseconds.
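
  The conversion in step 5 might look like the sketch below. It is an
  illustration under stated assumptions: cycles_to_microseconds() is a
  hypothetical helper, and `frequency` stands for the cycles-per-second
  figure that my_timer_init() reports. The total is accumulated in
  cycles and converted once at the end.

```c
#include <stdint.h>

// Convert an accumulated cycle total to microseconds, given the
// counter frequency in cycles per second. Whole seconds and the
// remainder are converted separately so that the multiplication by
// 1000000 cannot overflow for realistic frequencies.
// Returns 0 when no frequency is known.
static uint64_t cycles_to_microseconds(uint64_t cycles, uint64_t frequency)
{
  if (frequency == 0)
    return 0;
  uint64_t seconds = cycles / frequency;
  uint64_t remainder = cycles % frequency;
  return seconds * 1000000 + (remainder * 1000000) / frequency;
}
```

  With a 3000000000 Hz counter, a total of 4500000000 cycles converts
  to 1500000 microseconds. Converting each small elapsed_time on its
  own would instead truncate: 1500 cycles at that frequency converts
  to 0 microseconds.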
894
895 Since we recommend calls by function pointer, we supply
896 no inline functions.
897
898 Some comments on the many candidate routines for timing ...
899
900 clock() -- We don't use because it would overflow frequently.
901
902 clock_gettime() -- In tests, clock_gettime often had
903 resolution = 1000.
904
905 ftime() -- A "man ftime" says: "This function is obsolete.
906 Don't use it." On every platform that we tested, if ftime()
907 was available, then so was gettimeofday(), and gettimeofday()
908 overhead was always at least as good as ftime() overhead.
909
910 gettimeofday() -- available on most platforms, though not
911 on Windows. There is a hardware timer (sometimes a Programmable
912 Interval Timer or "PIT", sometimes an "HPET") used for
913 interrupt generation. When it interrupts (a "tick" or "jiffy",
914 typically 1 centisecond) it sets xtime. For gettimeofday, a
915 Linux kernel routine usually gets xtime and then gets rdtsc
916 to get elapsed nanoseconds since the last tick. On Red Hat
917 Enterprise Linux 3, there was once a bug which caused the
918 resolution to be 1000, i.e. one centisecond. We never check
919 for time-zone change.
920
921 getnstimeofday() -- something to watch for in future Linux
922
923 do_gettimeofday() -- exists on Linux but not for "userland"
924
925 get_cycles() -- a multi-platform function, worth watching
926 in future Linux versions. But we found platform-specific
927 functions which were better documented in operating-system
928 manuals. And get_cycles() can fail or return a useless
929 32-bit number. It might be available on some platforms,
930 such as arm, which we didn't test. Using
931 "include <linux/timex.h>" or "include <asm/timex.h>"
932 can lead to autoconf or compile errors, depending on system.
933
934 rdtsc, __rdtsc, rdtscll: available for x86 with Linux, BSD,
935 Solaris, Windows. See "possible flaws and dangers" comments.
936
937 times(): what we use for ticks. Should just read the last
938 (xtime) tick count, therefore should be fast, but usually
939 isn't.
940
941 GetTickCount(): we use this for my_timer_ticks() on
942 Windows. Actually it really is a tick counter, so resolution
943 >= 10 milliseconds unless you have a very old Windows version.
944 With Windows 95 or 98 or ME, timeGetTime() has better resolution than
945 GetTickCount (1ms rather than 55ms). But with Windows NT or XP or 2000,
946 they're both getting from a variable in the Process Environment Block
947 (PEB), and the variable is set by the programmable interval timer, so
948 the resolution is the same (usually 10-15 milliseconds). Also timeGetTime
949 is slower on old machines:
950 http://www.doumo.jp/aon-java/jsp/postgretips/tips.jsp?tips=74.
951 Also timeGetTime requires linking winmm.lib.
952 Therefore we use GetTickCount.
953 It will overflow about every 49.7 days because the return is 32-bit.
954 There is also a GetTickCount64 but it requires Vista or Windows Server 2008.
955 (As for GetSystemTimeAsFileTime, its precision is spurious; it
956 just reads the tick variable like the other functions do.
957 However, we don't expect it to overflow every 49 days, so we
958 will prefer it for my_timer_milliseconds().)
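
The 49-day figure for GetTickCount follows from the size of a
32-bit millisecond counter: 2^32 ms is about 49.7 days. A minimal
sketch of that arithmetic, and of an elapsed-time subtraction that
survives a single wrap, is shown here. The function names are
hypothetical, and the uint32_t arguments merely stand in for
values that GetTickCount() would return; no Windows API is called.

```c
#include <stdint.h>

// Days until a 32-bit millisecond counter wraps around:
// 2^32 ms divided by the number of milliseconds in a day.
static double tick32_wrap_days(void)
{
  return 4294967296.0 / (1000.0 * 60.0 * 60.0 * 24.0);
}

// Elapsed milliseconds between two 32-bit tick readings.
// Unsigned subtraction is modulo 2^32, so the result is still
// right if the counter wrapped once between the two readings.
static uint32_t tick32_elapsed(uint32_t start, uint32_t end)
{
  return (uint32_t) (end - start);
}
```

This only helps for intervals shorter than one wrap period; a
longer interval cannot be recovered from two 32-bit readings.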
959
960 QueryPerformanceCounter() -- we use this for my_timer_microseconds()
961 on Windows. Resolution is 1 PIT tick (often 1/3 microsecond). Usually reads
962 the PIT so it's slow. On some Windows variants, uses RDTSC.
963
964 GetLocalTime() -- this is available on Windows but we don't use it.
965
966 getclock(): documented for Alpha, but not found during tests.
967
968 mach_absolute_time() and UpTime() are recommended for Apple.
969 Initially they weren't tried, because asm_ppc seems to do the job.
970 But now we use mach_absolute_time for nanoseconds.
971
972 Any clock-based timer can be affected by NTP (ntpd program),
973 which means:
974 - full-second correction can occur for leap second
975 - tiny corrections can occur approximately every 11 minutes
976 (but I think they only affect the RTC which isn't the PIT).
977
978 We define "precision" as "frequency" and "high precision" is
979 "frequency better than 1 microsecond". We define "resolution"
980 as a synonym for "granularity". We define "accuracy" as
981 "closeness to the truth" as established by some authoritative
982 clock, but we can't measure accuracy.
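
To make "resolution" (granularity) concrete, here is a minimal
sketch of one way to estimate it: call the timer repeatedly and
keep the smallest non-zero step between consecutive readings.
This is an illustration only and is not claimed to be the method
that my_timer_init() uses. The names estimate_resolution and
fake_coarse_timer are hypothetical; the fake timer exists only
to give the example a known granularity of 1000 units.

```c
#include <stdint.h>

typedef uint64_t (*timer_fn)(void);

// Smallest non-zero difference seen between consecutive
// readings, or 0 if the timer never advanced.
static uint64_t estimate_resolution(timer_fn timer, int samples)
{
  uint64_t best= UINT64_MAX;
  uint64_t prev= timer();
  for (int i= 0; i < samples; i++)
  {
    uint64_t now= timer();
    if (now > prev && now - prev < best)
      best= now - prev;
    prev= now;
  }
  return (best == UINT64_MAX) ? 0 : best;
}

// Hypothetical coarse timer: the value advances by 1000 units
// on every third call and is unchanged in between.
static uint64_t fake_coarse_timer(void)
{
  static uint64_t calls= 0;
  return (calls++ / 3) * 1000;
}
```

With a real timer the estimate is only an upper bound on the
granularity, since a step smaller than the time between two
calls cannot be observed.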
983
984 Do not expect any of our timers to be monotonic; we
985 won't guarantee that they return constantly-increasing
986 unique numbers.
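
Because a later reading can be smaller than an earlier one, code
that subtracts two unsigned readings may want to guard against a
huge wrapped result. The sketch below shows one possible guard;
elapsed_or_zero is a hypothetical helper, not part of this file,
and treating a backwards step as zero elapsed time is just one
policy a caller might choose.

```c
#include <stdint.h>

// Elapsed units between two readings of the same timer.
// If the timer stood still or went backwards, report 0 rather
// than letting the unsigned subtraction wrap to a huge value.
static uint64_t elapsed_or_zero(uint64_t time1, uint64_t time2)
{
  return (time2 > time1) ? time2 - time1 : 0;
}
```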
987
988 We tested with AIX, Solaris (x86 + Sparc), Linux (x86 +
989 Itanium), Windows, 64-bit Windows, QNX, FreeBSD, HPUX,
990 Irix, Mac. We didn't test with SCO.
991
992*/
993
994