+++ title = "Fixing GNU bash associative array insert speed" author = ["George M Jones"] publishDate = 2020-04-18T00:00:00-04:00 lastmod = 2022-12-05T06:10:29-05:00 tags = ["geek", "programming", "gnu", "linux"] categories = ["blog"] draft = false +++ Bash uses linear search to insert values in to associative arrays. This is all well and good for small numbers of keys. I was adding millions[^fn:1]. I went poking around the bash source code today (2020-04-18) to confirm my suspicion and gauge the difficulty of adding an option to do something more sensible. In less than a day after I reported it, there is a patch My timing code and pre and post patch timings are here: Here the steps I took and where I might go if I get serious about fixing the problem: ## 1 Get the source code {#get-the-source-code} ### 1.1 Find it {#find-it} find the homepage : A quick bit of googling lead to the homepage use git : For a minute it looked like GNU was still stuck in the bad old days of having to download a tarball and then apply a series of patches, but fortunately, it there is a git repo ### 1.2 Download it {#download-it} ```bash git clone https://git.savannah.gnu.org/git/bash.git ``` ### 1.3 Build it {#build-it} Bash follows a time honored build convention ```bash ./configure make ``` ### 1.4 Analyze it {#analyze-it} - I read the NEWS file for any indication that associative arrays has been worked on to speed up associative array insert/look-ups. No indication that they had. - I checked the git commit logs, which appear to be meaningful after Bash-4.4 patch 19. Nothing. - With judicious use of grep ("grep-find in Emacs") for "associative" and "hash_search" it turns out that associative array inserts (as all inserts) are done with use of the "hash_search" function in `hashlib.c` - has_insert() begins as follows: ```C /* Create an entry for STRING, in TABLE. If the entry already exists, then return it (unless the HASH_NOSRCH flag is set). */ BUCKET_CONTENTS * hash_insert (string, table, flags) char *string; HASH_TABLE *table; int flags; { BUCKET_CONTENTS *item; int bucket; unsigned int hv; if (table == 0) table = hash_create (0); item = (flags & HASH_NOSRCH) ? (BUCKET_CONTENTS *)NULL : hash_search (string, table, 0); ``` - and there it is, the linear search walking the list in `hash_search()` ```C /* Return a pointer to the hashed item. If the HASH_CREATE flag is passed, create a new hash table entry for STRING, otherwise return NULL. */ BUCKET_CONTENTS * hash_search (string, table, flags) const char *string; HASH_TABLE *table; int flags; { BUCKET_CONTENTS *list; int bucket; unsigned int hv; if (table == 0 || ((flags & HASH_CREATE) == 0 && HASH_ENTRIES (table) == 0)) return (BUCKET_CONTENTS *)NULL; bucket = HASH_BUCKET (string, table, hv); for (list = table->bucket_array ? table->bucket_array[bucket] : 0; list; list = list->next) { /* This is the comparison function */ if (hv == list->khash && STREQ (list->key, string)) { list->times_found++; return (list); } } ``` ## 2 Next steps {#next-steps} ### 2.1 DONE Reach out to the maintainers {#done-reach-out-to-the-maintainers} see if they would even entertain the idea of a patch ### 2.2 CANCELED Look for appropriate in-memory hash insert/lookup functions {#canceled-look-for-appropriate-in-memory-hash-insert-lookup-functions} - btrees ? ### 2.3 CANCELED Code it {#canceled-code-it} ### 2.4 CANCELED test it {#canceled-test-it} ### 2.5 CANCELED submit patch {#canceled-submit-patch} See [^fn:1]: yes, there are many better tools for this job, but not in the constrained environment where this had to run.