LJ Archive

Letters

Hash Tables

Regarding Mihalis Tsoukalos' article “Hash Tables—Theory and Practice” in the August 2015 issue: thank you very much for your interesting article about this important data structure. It was well written and illustrated with programming examples. However, the results of ht1.c and ht2.c (execution time vs. table size) are not surprising. Since the hash function of ht1.c uses only the first character of the key, there are effectively no more than 26 different outcomes of the hash function. Since each result maps to one bucket (which are all different in the best case of no collisions), no more than 26 buckets will be employed. Once tablesize exceeds 26, there will be no increase in performance at all (26 being the number of uppercase ASCII characters).

As for ht2.c, the same effect is expected at a table size of about 500, assuming a maximum word length of, say, 20 and a maximum contribution of 26 per character, yielding roughly 26 x 20 different outcomes of the hash function. In practice, the limit should be observed even earlier. This is what we see in the diagrams.

Better results with respect to table size scaling can be expected using the following hash function:

int hash_3(char *str, int tablesize)
{
#define MAX_LEN     6
static int v[MAX_LEN] = {1, 26, 26*26, 26*26*26,
↪26*26*26*26, 26*26*26*26*26};
int value, i;
if (str==NULL)
{
return -1;
}
value=0;
for (i=0; *str && i<MAX_LEN; i++, str++)
{
value += (toupper(*str) - 'A' + 1) * v[i];
}
return value % tablesize;
}

Provided a large table size, this function has up to 0.3 billion different outcomes (266), at a slightly higher computation cost. Increasing MAX_LEN beyond 6 is possible but demands 64-bit integer arithmetic.


Florian Gagel

Slay Some Orcs

Thanks for Shawn Powers' “Slay Some Orcs” article in the August 2015 issue of Linux Journal.

In addition to the mentioned Steam and Desura on-line game stores, I also suggest looking at GOG (www.gog.com). Although most of its titles are old (GOG stands for Good Old Games), there are some current games including Linux ones. If both GOG and Steam have a game, I usually purchase it from GOG because all of its games are DRM-free and you don't need to run a distribution client (like Steam) to run the game.

Three games I would add to your list:

1) Pillars of Eternity (Steam and GOG): a modern strategy/adventure game in the style of Baulder's Gate.

2) Middle Earth: Shadow of Mordor (Steam): the Linux version is out. With an article title like “Slay Some Orcs”, how could this one be missed? Lots of Orc to kill here. (But there's an out: the article was written before the Linux version release.)

3) Dominions 4: Thrones of Ascension (Desura): old-school, very deep, turn-based strategy.

I'd also like to suggest a monthly half-page “Linux Games Update” column.


Richard

Richard, thanks for the links and also for the column idea. If nothing else, I'll try to be more game-aware in my UpFront pieces. Please don't hesitate to send me game recommendations.—Shawn Powers

Using MySQL for Load Balancing and Job Control under Slurm

Regarding Steven Buczkowski's “Using MySQL for Load Balancing and Job Control under Slurm” in the August 2015 issue: I have a question on the queuing system to process big data. If you're using MySQL on different nodes, how are you dealing with each system's data locking? Or, is the DB on the head node and you're just running via client access on the nodes? It would be good to hear your setup, but I like the idea.


Gary

Steven Buczkowski replies: Ahhh, you got to the problem faster than I did. I was still a fairly naive slurm user when I originally put this idea together, and it turns out, I was trying to populate the db under slurm (that is, on multiple nodes, simultaneously...). Somehow, I managed to escape having that come back and bite me as resource contention for some time. I don't understand how I managed that for so long.

When I did start having trouble, I tried a number of mutex-like things to let the first instance of my process populate the db while the other spawned instances blocked until this was done: all very ugly and very inefficient use of the cluster resources.

In the end, the cleanest approach was to do the db population outside the sbatch call (that is, on the head node) and start the parallel processing only once the stack table was built. The problem here is that I was keying the table off the slurm process ID, and on the head node, we don't have access to that yet. The solution was to have a wrapper script that first generates a random number to use as a slurm procID proxy to build the table, then kicks off the table build with that ID, then starts sbatch and passes that random number in to my processing script as either a command-line parameter or as an environment variable via sbatch's --export keyword. The table pop process keys off this random number, but the table updates when closing out a process can add the real procID if you want that for accounting purposes.

Welcome to Issue 10000000? Really?

I'm probably not the first one to write about it, but it would seem that Shawn Powers erred when he said “Welcome to issue 10000000” (August 2015 issue). He most likely meant “Welcome to issue 100000000.” It's okay, we all make miskates...uh mistakes, I meant.


Pierre

Binary Fail

No matter how many times I count them, I can find only seven zeros behind the one. I read that as binary 10000000 or hexadecimal 80, aka decimal 128. But the cover says Issue 256. I could not believe that slipped by so I counted them again—still seven. Ooops.


Bonnie Packert

Pierre and Bonnie, never before have I gotten so much e-mail as when I miscounted zeros in issue 256! Truly, people e-mailed me, Tweeted at me and even instant-messaged me! Of course, you are both correct, and I made a silly error. I guess if I'm going to make a mistake, I'd rather it be something silly like that as opposed to something in an instructional article.

But yes, I totally dropped the ball on that one—or maybe the zero!—Shawn Powers

Gettin' Sticky with It

Regarding Shawn Powers' “Gettin' Sticky with It” in the June 2015 UpFront section: just a comment about listing the attributes of a directory. In the article, the following command was given:

ls -l / | grep tmp

You could use:

ls -ld /tmp

The -d option makes it list the directory and not the contents.

Linux Journal is the only mag that I've continued enjoying since its beginning. Thanks for the good work of keeping Linux and its special features in focus.


Dan

Cool, thanks Dan! I never knew about the -d flag. Also thanks for the great compliment!—Shawn Powers

“What Does Fast Mean?”

I think you made a mistake here [see Reuven M. Lerner's “What Does Fast Mean?” in the September 2015 issue]:

Let's now replace that high-speed wire with a satellite link. Suddenly, because it takes time to send the data from one network to another, you have decreased your latency while keeping your bandwidth identical.

Shouldn't this be increased your latency? Other then that, it was a great read.


Vic

Reuven M. Lerner replies: Good catch! You're totally right.

LJ Archive