Commit message | Author | Age | Files | Lines
* Improve performance of sinf/cosf/sincosf | Wilco Dijkstra | 2018-06-21 | 11 | -6/+667
  Here is the correct patch with both filenames and int cast fixed:

  This patch is a complete rewrite of sinf, cosf and sincosf. The new version
  is significantly faster, as well as simple and accurate. The worst-case ULP
  is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs.
  In non-nearest rounding modes the error is 1ULP.

  The algorithm uses 3 main cases: small inputs which don't need argument
  reduction, small inputs which need a simple range reduction and large inputs
  requiring complex range reduction. The code uses approximate integer
  comparisons to quickly decide between these cases - on some targets this may
  be slow, so this can be configured to use floating point comparisons.

  The small range reducer uses a single reduction step to handle values up to
  120.0. It is fastest on targets which support inlined round instructions.

  The large range reducer uses integer arithmetic for simplicity. It does a
  32x96 bit multiply to compute a 64-bit modulo result. This is more than
  accurate enough to handle the worst-case cancellation for values close to an
  integer multiple of PI/4. It could be further optimized, however it is
  already much faster than necessary.

  Simple benchmark showing speedup factor on AArch64 for various ranges:

  range        sinf   cosf   sincosf
  0.7853982     1.7    2.2       2.8
  1.570796      1.9    1.9       2.7
  3.141593      2.0    2.0       3.5
  6.283185      2.3    2.3       4.2
  125.6637      2.9    3.0       5.1
  1.1259e15    26.8   26.8      45.2

  ChangeLog:
  2018-05-18  Wilco Dijkstra  <wdijkstr@arm.com>

    * newlib/libm/common/Makefile.in: Regenerated.
    * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
      sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
    * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND,
      HAVE_FAST_LROUND, roundtoint, converttoint, force_eval_float,
      force_eval_double, eval_as_float, eval_as_double, likely, unlikely.
    * newlib/libm/common/cosf.c: New file.
    * newlib/libm/common/sinf.c: Likewise.
    * newlib/libm/common/sincosf.h: Likewise.
    * newlib/libm/common/sincosf.c: Likewise.
    * newlib/libm/common/sincosf_data.c: Likewise.
    * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
    * newlib/libm/math/sf_sin.c: Likewise.
    * newlib/libm/math/wf_sincos.c: Likewise.
  --
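The "single reduction step" mentioned for the small range reducer can be illustrated roughly as follows. This is only a sketch in double precision, not the code from the patch: the helper name `reduce_small`, its constants and the rounding by cast are assumptions made for illustration.

```c
/* Hypothetical helper, not taken from the patch: reduce x to a remainder r
   with |r| roughly <= pi/4 and report which quadrant x fell into. */
static double reduce_small(double x, int *quadrant)
{
    const double two_over_pi = 0.63661977236758134; /* 2/pi */
    const double pi_over_2   = 1.5707963267948966;  /* pi/2 */

    /* Round x * 2/pi to the nearest integer (ties away from zero).
       A target with a fast round instruction would use that instead. */
    double t = x * two_over_pi;
    int n = (int)(t + (t < 0.0 ? -0.5 : 0.5));

    *quadrant = n & 3;          /* only the low two bits select the case */
    return x - n * pi_over_2;   /* single subtraction of n * pi/2 */
}
```

A caller would then evaluate a polynomial for sin(r) when the quadrant is even and for cos(r) when it is odd, negating the result in quadrants 2 and 3.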
* Revert "Improve performance of sinf/cosf/sincosf" | Corinna Vinschen | 2018-06-21 | 11 | -667/+6
  This reverts commit fca80a9d1b3fa6620cdaccec6b726eef1a6530a1.

  Accidentally pushed a preliminary version
* libm/common/s_round.c (round): Add cast for 16-bit CPUs | Jon Beniston | 2018-06-21 | 1 | -1/+1
* Improve performance of sinf/cosf/sincosf | Wilco Dijkstra | 2018-06-19 | 11 | -6/+667
  This patch is a complete rewrite of sinf, cosf and sincosf. The new version
  is significantly faster, as well as simple and accurate. The worst-case ULP
  is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs.
  In non-nearest rounding modes the error is 1ULP.

  The algorithm uses 3 main cases: small inputs which don't need argument
  reduction, small inputs which need a simple range reduction and large inputs
  requiring complex range reduction. The code uses approximate integer
  comparisons to quickly decide between these cases - on some targets this may
  be slow, so this can be configured to use floating point comparisons.

  The small range reducer uses a single reduction step to handle values up to
  120.0. It is fastest on targets which support inlined round instructions.

  The large range reducer uses integer arithmetic for simplicity. It does a
  32x96 bit multiply to compute a 64-bit modulo result. This is more than
  accurate enough to handle the worst-case cancellation for values close to an
  integer multiple of PI/4. It could be further optimized, however it is
  already much faster than necessary.

  Simple benchmark showing speedup factor on AArch64 for various ranges:

  range        sinf   cosf   sincosf
  0.7853982     1.7    2.2       2.8
  1.570796      1.9    1.9       2.7
  3.141593      2.0    2.0       3.5
  6.283185      2.3    2.3       4.2
  125.6637      2.9    3.0       5.1
  1.1259e15    26.8   26.8      45.2

  ChangeLog:
  2018-06-18  Wilco Dijkstra  <wdijkstr@arm.com>

    * newlib/libm/common/Makefile.in: Regenerated.
    * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
      sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
    * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND,
      HAVE_FAST_LROUND, roundtoint, converttoint, force_eval_float,
      force_eval_double, eval_as_float, eval_as_double, likely, unlikely.
    * newlib/libm/common/cosf.c: New file.
    * newlib/libm/common/sinf.c: Likewise.
    * newlib/libm/common/sincosf.h: Likewise.
    * newlib/libm/common/sincosf.c: Likewise.
    * newlib/libm/common/sincosf_data.c: Likewise.
    * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
    * newlib/libm/math/sf_sin.c: Likewise.
    * newlib/libm/math/wf_sincos.c: Likewise.
  --
* newlib: getopt now permutes multi-flag options correctly | Thomas Kindler | 2018-06-18 | 2 | -13/+23
  Previously, "test 1 2 3 -a -b -c" was permuted to "test -a -b -c 1 2 3",
  but "test 1 2 3 -abc" was left as "test 1 2 3 -abc".

  Signed-off-by: Thomas Kindler <mail+newlib@t-kindler.de>
* Bump Cygwin DLL version to 2.11.0 | Ken Brown | 2018-06-07 | 1 | -2/+2
* Cygwin: Document clearenv and bump API minor | Ken Brown | 2018-06-07 | 4 | -1/+24
  Also add earlier "What changed" items to new-features.xml.
* Cygwin: Remove workaround in environ.cc | Ken Brown | 2018-06-07 | 1 | -6/+4
  Commit ebd645e on 2001-10-03 made environ.cc:_addenv() add unneeded space
  at the end of the environment block to "work around problems with some
  buggy applications." This clutters the code and is presumably no longer
  needed.
* Cygwin: Implement the GNU extension clearenv | Ken Brown | 2018-06-07 | 3 | -0/+24
* Cygwin: Allow the environment pointer to be NULL | Ken Brown | 2018-06-07 | 2 | -17/+30
  Following glibc, interpret this as meaning the environment is empty.
* Cygwin: Clarify some code in environ.cc | Ken Brown | 2018-06-07 | 1 | -1/+7
* Cygwin: Add pthread_rwlock_* fix to release notes | Ken Brown | 2018-06-01 | 1 | -0/+3
* Declare the pthread_rwlock_* functions if __cplusplus >= 201402L | Ken Brown | 2018-06-01 | 1 | -1/+1
  Some of these functions are used in the <shared_mutex> C++ header.
* Cygwin: Add stack alignment crash after fork fix to release notes | Corinna Vinschen | 2018-05-29 | 1 | -0/+4
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: Fixing the math behind rounding down ch.stacklimit to page size. | Sergejs Lukanihins | 2018-05-29 | 1 | -1/+1
* Cygwin: Add Sergejs Lukanihins to contributors | Corinna Vinschen | 2018-05-29 | 1 | -0/+1
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: Add buffer underrun fix to release notes | Corinna Vinschen | 2018-05-29 | 1 | -0/+3
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: normalize_win32_path: Avoid buffer underruns | Corinna Vinschen | 2018-05-29 | 1 | -8/+25
  Thanks to Ken Harris <Ken.Harris@mathworks.com> for the diagnosis.

  When backing up tail to handle a "..", the code only checked that it
  didn't underrun the destination buffer while removing path components.
  It did *not* take into account that the first backslash in the path had
  to be kept intact. Example path to trigger the problem: "C:\A..\..\..\B"

  Fix this by moving the dst pointer to the first backslash so subsequent
  tests cannot underrun this position. Also make sure that we always
  *have* a backslash.

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
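The idea of never backing up past the first backslash can be sketched in isolation. This is not the code from normalize_win32_path: the helper `pop_component` and its `root`/`tail` parameters are hypothetical names chosen to show the bound check.

```c
/* Hypothetical helper: drop the last component of the path held in
   [root, tail), where root points at the first backslash of the path.
   Returns the new tail, which never moves below root. */
static char *pop_component(char *root, char *tail)
{
    /* Step back until a backslash is found, but stop at root so the
       first backslash is always kept. */
    while (tail > root && *--tail != '\\')
        ;
    return tail;
}
```

For the buffer `C:\A\B` with `root` at the first backslash, repeated calls back up to `C:\A`, then to the root backslash, and then stay there no matter how many further ".." components follow.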
* Cygwin: TEST only: Add a buffer underrun assertion to symlink_info::check | Corinna Vinschen | 2018-05-29 | 1 | -1/+2
  Thanks to Ken Harris <Ken.Harris@mathworks.com> for the diagnosis which
  led to a buffer underrun in this loop.

  Revert before release.

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Fix issue with malloc_extend_top | Jeff Johnston | 2018-05-29 | 1 | -1/+6
  - when calculating a correction to align next brk to page boundary,
    ensure that the correction is less than a page size
  - if allocating the correction fails, ensure that the top size is
    set to brk + sbrk_size (minus any front alignment made)

  Signed-off-by: Jeff Johnston <jjohnstn@redhat.com>
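The first point, keeping the alignment correction below one page, can be shown with a little arithmetic. The function below is an illustrative sketch rather than the code in malloc_extend_top; `page_correction` is a hypothetical name and the page size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to add to brk so that it lands on
   a page boundary. page must be a power of two. */
static size_t page_correction(uintptr_t brk, size_t page)
{
    size_t offset = (size_t)(brk & (page - 1));

    /* A plain "page - offset" would yield a whole page when brk is already
       aligned; masking again keeps the result in [0, page). */
    return (page - offset) & (page - 1);
}
```

With a 4096-byte page, an already aligned break needs no correction, while a break one byte past a boundary needs 4095 bytes.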
* fix llrint and lrint for 52 <= exponent <= 62 | Matthias Kannwischer | 2018-05-29 | 2 | -4/+4
* Fix 32-bit overflow in mktime() when time_t is 64 bits long | Freddie Chopin | 2018-05-29 | 1 | -1/+1
  When converting the number of days since the epoch (32 bits) to seconds,
  calculations using 32-bit `long` overflow for years above 2038. Solve
  this by casting the number of days to `time_t` just before the final
  multiplication.

  Signed-off-by: Freddie Chopin <freddie.chopin@gmail.com>
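The effect of widening before the multiplication can be reproduced with fixed-width types. In this sketch `uint32_t` stands in for a 32-bit `long` (unsigned so the wrap-around is well defined) and `int64_t` stands in for a 64-bit `time_t`; the function names are illustrative and not taken from mktime().

```c
#include <stdint.h>

#define SECS_PER_DAY 86400

/* Multiplication carried out in 32 bits: the product wraps modulo 2^32. */
static uint32_t days_to_secs_32(uint32_t days)
{
    return days * (uint32_t)SECS_PER_DAY;
}

/* Widen the day count first, as the fix does with a cast to time_t. */
static int64_t days_to_secs_64(int32_t days)
{
    return (int64_t)days * SECS_PER_DAY;
}
```

For 60000 days after the epoch the 64-bit version yields 5184000000 seconds, while the 32-bit product wraps to 889032704.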
* Use _LDBL_EQ_DBL in nexttowardf.c | Jeff Johnston | 2018-05-07 | 1 | -2/+2
  2018-05-07  Tom de Vries  <tom@codesourcery.com>

    * libm/common/nexttowardf.c: Use _LDBL_EQ_DBL instead of
      _LDBL_EQ_DOUBLE.
* libgloss: microblaze: adjust handlers to be weak. | Ben Levinsky | 2018-05-03 | 2 | -2/+4
  Previously, the hw exception handler stub and interrupt handler stub for
  microblaze could not be overridden. Change them to weak to fix this.

  Signed-off-by: Ben Levinsky <ben.levinsky@xilinx.com>
* Cygwin: fix build with GCC 7 | Yaakov Selkowitz | 2018-04-16 | 1 | -9/+5
  GCC 7 is able to see straight through this trick, so use a more formal
  method to avoid the warning.

  Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
* Add nvptx port. | Jeff Johnston | 2018-04-13 | 132 | -113/+7504
  - From: Cesar Philippidis <cesar@codesourcery.com>
    Date: Tue, 10 Apr 2018 14:43:42 -0700
    Subject: [PATCH] nvptx port

  This port adds support for Nvidia GPUs, which are primarily used as
  offload accelerators in OpenACC and OpenMP.
* Cygwin: fix guard checking for current user's AuthZ context | Corinna Vinschen | 2018-04-12 | 1 | -3/+7
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: add cpuinfo changes to release text | Corinna Vinschen | 2018-04-11 | 1 | -0/+5
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: cpuinfo: Use active CPU count per group | Corinna Vinschen | 2018-04-11 | 1 | -20/+36
  There are systems with a MaximumProcessorCount not reflecting the
  actually available CPUs. The ActiveProcessorCount is correct though.
  So we use ActiveProcessorCount rather than MaximumProcessorCount per
  group to set group affinity correctly.

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: wincap: expose more SYSTEM_INFO members and use as appropriate | Corinna Vinschen | 2018-04-11 | 3 | -12/+10
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: cpuinfo: report L3 cache on Intel CPUs | Corinna Vinschen | 2018-04-11 | 1 | -2/+3
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: add strtod fix to release notes | Corinna Vinschen | 2018-04-09 | 1 | -0/+4
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* strtod: Convert 64 bit double to 64 bit int during computation | Corinna Vinschen | 2018-04-09 | 1 | -0/+9
  The gdtoa implementation uses the type long, defined as Long, in lots of
  code. For historical reasons newlib defines Long as int32_t instead.
  This works fine, as long as floating point exceptions are not enabled.
  The conversion to 32 bit int can lead to a FE_INVALID situation.

  Example:

    const char *str = "121645100408832000.0";
    char *ptr;
    feenableexcept (FE_INVALID);
    strtod (str, &ptr);

  This leads to the following situation in strtod

    double aadj;
    Long L;

    [...]
    L = (Long)aadj;

  For instance, on x86_64 the code here is

    cvttsd2si %xmm0,%eax

  At this point, aadj is 2529648000.0 in our example. The conversion to
  32 bit %eax results in a negative int value, thus the conversion is
  invalid. With feenableexcept (FE_INVALID), a SIGFPE is raised.

  Fix this by always using 64 bit ints here if double is not a 32 bit type
  to avoid this type of FP exceptions.

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
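The range problem in the example above can be checked without enabling any floating point traps. The two helpers below are illustrative only and do not appear in gdtoa or newlib: one tests whether a double is representable as a 32-bit int, the other performs the conversion through a 64-bit integer in the spirit of the fix.

```c
#include <stdint.h>

/* Hypothetical check: nonzero if d can be converted to int32_t without
   leaving the representable range. */
static int fits_int32(double d)
{
    return d >= -2147483648.0 && d <= 2147483647.0;
}

/* Convert through a 64-bit integer; values such as 2529648000.0 are
   well within range here. */
static int64_t to_wide_int(double d)
{
    return (int64_t)d;
}
```

The value 2529648000.0 from the commit message fails the 32-bit check but converts cleanly to a 64-bit integer.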
* newlib: fix iswupper_l in !_MB_CAPABLE case | Corinna Vinschen | 2018-03-27 | 1 | -1/+1
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: AF_LOCAL: fix identifying abstract sockets in FS-related functions | Corinna Vinschen | 2018-03-26 | 1 | -6/+6
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: fix typo in accept on inet and local sockets | Corinna Vinschen | 2018-03-26 | 2 | -2/+2
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* comments to document struct caseconv_entry | Thomas Wolff | 2018-03-26 | 1 | -3/+33
  explain design of compact (packed) struct caseconv_entry, in case it
  needs to be modified for future Unicode versions
* newlib: fix indentation in toulower | Thomas Wolff | 2018-03-26 | 1 | -10/+10
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: delete /dev/kmsg and thus fhandler_mailslot without substitution | Corinna Vinschen | 2018-03-25 | 13 | -1042/+623
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: AF_UNIX: Redesign various aspects | Corinna Vinschen | 2018-03-18 | 2 | -190/+442
  * Change set_socket_type/get_socket_type to virtual methods
  * Move various variables into af_unix_shmem_t
  * Change sun_name_t to match new usage pattern
  * Move shut_state definition and add a name for the 0 value
  * Allow marking packet as administrative packet. This allows
    filtering out info packets exchange between peers and tweak
    data accordingly.
  * Rename send_my_name to send_sock_info and send credentials
    if not called from bind (so the socket was already connected)
  * Handle SO_PASSCRED in setsockopt/getsockopt
  * Add input size checking to setsockopt/getsockopt
  * Use NT functions where appropriate

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: ntdll.h: Define FSCTL_PIPE_PEEK and NtWaitForSingleObject | Corinna Vinschen | 2018-03-18 | 1 | -1/+4
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: AF_UNIX: Add state_lock to guard manipulating shared state info | Corinna Vinschen | 2018-03-18 | 2 | -0/+9
  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Cygwin: AF_UNIX: Use spinlock rather than SRWLOCKs | Corinna Vinschen | 2018-03-18 | 1 | -17/+39
  We need to share socket info between threads *and* processes.
  SRWLOCKs are single-process only, unfortunately. Provide a
  sharable low-profile spinlock instead.

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
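A minimal spinlock of the kind described can be written with C11 atomics. This is a generic sketch and not the Cygwin implementation: the type `spin_t` and the `spin_*` functions are hypothetical names, and for cross-process use the lock object would have to live in memory mapped by every participating process.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: a single atomic flag that is set while held. */
typedef struct {
    atomic_flag held;
} spin_t;

static void spin_init(spin_t *s)
{
    atomic_flag_clear(&s->held);
}

/* Try once; returns true if the lock was acquired. */
static bool spin_trylock(spin_t *s)
{
    return !atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static void spin_lock(spin_t *s)
{
    while (!spin_trylock(s))
        ;
}

static void spin_unlock(spin_t *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

Because the lock state is only a flag in plain memory, it works wherever that memory is visible, which is the property the commit message asks for. A production lock would normally add a pause or yield inside the wait loop.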
* Cygwin: tags: drop _EXFUN regex | Corinna Vinschen | 2018-03-17 | 1 | -2/+0
  _EXFUN has been removed a while back

  Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* Reduce qsort stack consumption | Hakan Lindqvist | 2018-03-16 | 1 | -6/+54
  Classical function call recursion wastes a lot of stack space.
  Each recursion level requires a full stack frame comprising all
  local variables and additional space as dictated by the
  processor calling convention.

  This implementation instead stores the variables that are unique
  for each recursion level in a parameter stack array, and uses
  iteration to emulate recursion. Function call recursion is not
  used until the array is full.

  To ensure the stack consumption isn't worsened by this design, the
  size of the parameter stack array is chosen to be similar to the
  stack frame excluding the array. Each function call recursion level
  can handle 8 iterative recursion levels.

  Stack consumption will worsen when sorting tiny arrays that do not
  need recursion (of 6 elements or less). It will be about equal for
  up to 15 elements, and be an improvement for larger arrays. The best
  case improvement is a stack size reduction down to about one quarter
  of the stack consumption before the change.

  A design where the parameter stack array is large enough for the
  worst case recursion level was rejected because it would worsen the
  stack consumption when sorting arrays smaller than about 1500
  elements. The worst case is 31 levels on a 32-bit system.

  A design with a dynamic parameter array size was rejected because
  of limitations in some compilers.
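Emulating recursion with a parameter stack array can be sketched on a plain int quicksort. This is a simplified illustration, not newlib's qsort: it uses one fixed 64-entry array sized for the worst case (the design the commit explicitly rejected in favour of an 8-entry array with a recursive fallback), and the names `pending_range`, `split_around_last` and `sort_with_param_stack` are hypothetical.

```c
#include <stddef.h>

/* One deferred sub-array: what a recursive call would have received. */
struct pending_range {
    int *base;
    size_t len;
};

/* Partition around the last element; returns the pivot's final index. */
static size_t split_around_last(int *a, size_t n)
{
    int pivot = a[n - 1];
    size_t store = 0;
    for (size_t j = 0; j + 1 < n; j++) {
        if (a[j] < pivot) {
            int tmp = a[store]; a[store] = a[j]; a[j] = tmp;
            store++;
        }
    }
    int tmp = a[store]; a[store] = a[n - 1]; a[n - 1] = tmp;
    return store;
}

static void sort_with_param_stack(int *a, size_t n)
{
    struct pending_range stack[64];   /* enough for any 64-bit size_t */
    size_t top = 0;

    stack[top].base = a;
    stack[top].len = n;
    top++;

    while (top > 0) {
        top--;
        int *base = stack[top].base;
        size_t len = stack[top].len;

        while (len > 1) {
            size_t p = split_around_last(base, len);
            size_t left = p, right = len - p - 1;

            /* Defer the larger part on the array and keep iterating on
               the smaller one, so the array depth stays logarithmic. */
            if (left < right) {
                stack[top].base = base + p + 1;
                stack[top].len = right;
                top++;
                len = left;
            } else {
                stack[top].base = base;
                stack[top].len = left;
                top++;
                base = base + p + 1;
                len = right;
            }
        }
    }
}
```

Each array entry holds only a pointer and a length, which is far smaller than a full stack frame for a recursive call.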
* Ensure qsort recursion depth is bounded | Hakan Lindqvist | 2018-03-16 | 1 | -12/+56
  The qsort algorithm splits the input array in three parts. The
  left and right parts may need further sorting. One of them is
  sorted by recursion, the other by iteration. This update ensures
  that it is the smaller part that is chosen for recursion.

  By choosing the smaller part, each recursion level will handle
  less than half the array of the previous recursion level. Hence
  the recursion depth is bounded to be less than log2(n) i.e. 1
  level per significant bit in the array size n.

  The update also includes code comments explaining the algorithm.
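The rule "recurse into the smaller part, iterate over the larger" looks roughly like this on an int array. It is a simplified sketch, not newlib's qsort: it uses a two-way partition around the last element rather than the three-part split described above, and `sort_bounded_depth` is a hypothetical name.

```c
#include <stddef.h>

static void swap_ints(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Sort a[0..n). Recursion is used only for the smaller partition, so each
   recursive call receives fewer than half of the caller's elements. */
static void sort_bounded_depth(int *a, size_t n)
{
    while (n > 1) {
        /* Partition around the last element. */
        int pivot = a[n - 1];
        size_t store = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (a[j] < pivot)
                swap_ints(&a[store++], &a[j]);
        }
        swap_ints(&a[store], &a[n - 1]);

        size_t left = store;           /* elements before the pivot */
        size_t right = n - store - 1;  /* elements after the pivot */

        if (left < right) {
            sort_bounded_depth(a, left);              /* recurse: smaller */
            a += store + 1;                           /* iterate: larger */
            n = right;
        } else {
            sort_bounded_depth(a + store + 1, right); /* recurse: smaller */
            n = left;                                 /* iterate: larger */
        }
    }
}
```

Since the part handed to the recursive call is always the smaller one, the depth cannot exceed about log2(n) even when the pivot choice is poor. Only the running time degrades in that case.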
* Correct prototypes of pthread_mutex_getprioceiling() and pthread_setschedparam() | Joel Sherrill | 2018-03-15 | 1 | -2/+2
* [arm] Fix syscalls.c for newlib embedded syscalls builds | Richard Earnshaw | 2018-03-15 | 1 | -119/+126
  Newlib has a build configuration where syscalls can be directly embedded
  in the newlib library rather than relying on libgloss.

  This configuration was broken recently by an update to the libgloss
  support for Arm that was not propagated to the syscalls interface in
  newlib itself.

  This patch restores the build. It's essentially a copy of
  https://sourceware.org/ml/newlib/2018/msg00128.html but there are some
  other minor cleanups and changes that I've made at the same time. None
  of those cleanups affect functionality.

  The prototypes of the following functions have been updated: _link,
  _sbrk, _getpid, _write, _swiwrite, _lseek, _swilseek, _read and
  _swiread.

  Signed-off-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
* ssp: fix wchar.h with -std=c99 | Yaakov Selkowitz | 2018-03-14 | 1 | -2/+2
  https://sourceware.org/ml/newlib/2018/msg00261.html

  Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
* Fix alloc_align and alloc_size macros for multiple arguments | Yaakov Selkowitz | 2018-03-14 | 2 | -4/+4
  https://sourceware.org/ml/newlib/2018/msg00263.html

  This is a follow-up to commit 4564b30f331a067e71b25308ac7c8a85ceb4b122.

  Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>