Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 181
1 files changed, 181 insertions, 0 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index def2a019..b4b014e7 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -13531,6 +13531,178 @@ numeric value, regardless of what the subarray itself contains,
and all subarrays are treated as being equal to each other. Their
order relative to each other is determined by their index strings.
+@subsubsection Controlling Array Scanning Order With a User-defined Function
+
+The value of @code{PROCINFO["sorted_in"]} can also be a function name
+that will let you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function. This comparison function should be defined with at least
+four arguments:
+
+@example
+function comp_func(i1, v1, i2, v2)
+@{
+ @var{compare elements 1 and 2 in some fashion}
+ @var{return < 0; 0; or > 0}
+@}
+@end example
+
+Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+are the corresponding values of the two elements being compared.
+Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values. The three possible return values
+are interpreted this way:
+
+@quotation
+* If the return value of @code{comp_func(i1, v1, i2, v2)} is less than 0,
+index @var{i1} comes before index @var{i2} during loop traversal.
+
+* If @code{comp_func(i1, v1, i2, v2)} returns 0, @var{i1} and @var{i2}
+come together, but their order relative to each other is undefined.
+
+* If the return value of @code{comp_func(i1, v1, i2, v2)} is greater than 0,
+@var{i1} comes after @var{i2}.
+@end quotation
+
+The following comparison function can be used to scan an array in
+numerical order of the indices:
+
+@example
+function cmp_num_idx(i1, v1, i2, v2)
+@{
+ # numerical index comparison, ascending order
+ return (i1 - i2)
+@}
+@end example
+
+The following function can be used to traverse an array in order of
+element values rather than indices:
+
+@example
+function cmp_str_val(i1, v1, i2, v2)
+@{
+ # string value comparison, ascending order
+ v1 = v1 ""
+ v2 = v2 ""
+ if (v1 < v2) return -1
+ return (v1 != v2)
+@}
+@end example
+
+The following comparison function makes all numbers, and numeric strings
+without any leading or trailing spaces, come out first during loop traversal:
+
+@example
+function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
+@{
+ # numbers before string value comparison, ascending order
+ n1 = v1 + 0
+ n2 = v2 + 0
+ if (n1 == v1)
+ return (n2 == v2) ? (n1 - n2) : -1
+ else if (n2 == v2)
+ return 1
+ return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+Consider sorting the entries of a GNU/Linux system password file
+according to login names. The following program, which sorts records
+by a specific field position, can be used for this purpose:
+
+@example
+# sort.awk --- simple program to sort by field position
+# field position is specified by POS
+
+function cmp_field(i1, v1, i2, v2)
+@{
+ # comparison by value, as string, and ascending order
+ return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
+@}
+
+@{
+ for (i = 1; i <= NF; i++)
+ a[NR][i] = $i
+@}
+
+END @{
+ PROCINFO["sorted_in"] = "cmp_field"
+ if (POS < 1 || POS > NF)
+ POS = 1
+ for (i in a) @{
+ for (j = 1; j <= NF; j++)
+ printf("%s%c", a[i][j], j < NF ? ":" : "")
+ print ""
+ @}
+@}
+@end example
+
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons. Running the program produces the
+following output:
+
+@example
+@kbd{$ gawk -v POS=1 -F: -f sort.awk /etc/passwd}
+@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
+@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
+@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+@dots{}
+@end example
+
+The comparison function should normally return the same value every time
+it is given a specific pair of array elements as its arguments. If it
+returns inconsistent results, the order is undefined. This behavior is
+sometimes exploited to introduce random order in otherwise seemingly
+ordered data:
+
+@example
+function cmp_randomize(i1, v1, i2, v2)
+@{
+ # random order
+ return (2 - 4 * rand())
+@}
+@end example
+
+As mentioned above, the order of the indices is arbitrary if two
+elements compare equal. This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values. The relative order of the equal elements
+may change during the next loop traversal if other elements are added to or
+removed from the array. One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules. Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary. The following comparison functions
+force a deterministic order, and are based on the fact that the
+indices of two elements are never equal:
+
+@example
+function cmp_numeric(i1, v1, i2, v2)
+@{
+ # numerical value (and index) comparison, descending order
+ return (v1 != v2) ? (v2 - v1) : (i2 - i1)
+@}
+
+function cmp_string(i1, v1, i2, v2)
+@{
+ # string value (and index) comparison, descending order
+ v1 = v1 i1
+ v2 = v2 i2
+ return (v1 > v2) ? -1 : (v1 != v2)
+@}
+@end example
+
+@ignore
+Avoid using the term stable when describing the unpredictable behavior
+if two items compare equal. Usually, the goal of a "stable algorithm"
+is to maintain the original order of the items, which is a meaningless
+concept for a list constructed from a hash.
+@end ignore
+
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
+
+
When string comparisons are made during a sort, either for element
values where one or both aren't numbers or for element indices
handled as strings, the value of @code{IGNORECASE}
@@ -13992,6 +14164,12 @@ replaced with:
asort(source, dest, "descending number")
@end example
+The third argument to @code{asort()} can also be a user-defined
+function name which is used to order the array elements before
+constructing the result array.
+@xref{Scanning an Array}, for more information.
+
+
Often, what's needed is to sort on the values of the @emph{indices}
instead of the values of the elements.
To do that, use the
@@ -14479,6 +14657,9 @@ An empty string "" is the same as the default @code{"ascending string"}
for the value of @var{how}. If the @samp{source} array contains subarrays as values,
they will come out last(first) in the @samp{dest} array for @samp{ascending}(@samp{descending})
order specification. The value of @code{IGNORECASE} affects the sorting.
+The third argument can also be a user-defined function name, in which case
+the value returned by the function is used to order the array elements
+before constructing the result array.
@xref{Scanning an Array}, for more information.
For example, if the contents of @code{a} are as follows: