Archive

Archive for December, 2013

Starting with concurrency and a “Hello world” using pthreads

December 31, 2013 No comments


Nowadays is very common to see dual-core or quad-core CPUs, but there are still systems with one single core.
In this post, I want to about the very basis of concurrency, a brief introduction to this world, so some descriptions will be so so simple, but real world is a little bit more complex.

In multi-core or multi-CPU systems, the operating system (OS) distributes tasks within our CPUs, and we can run several tasks at the same time, but we have been enjoying multitasking for decades, long before multi-core appeared, so, how is it possible? Let’s think about single-core systems, they can only perform one task, but the operating system distributes the CPU control along all tasks very fast so it makes us think all of them are running at the same time. So single-core CPUs usually have one execution thread, dual-core system, will have two execution threads, so they can perform two tasks at the same time. But don’t worry about how it is distributed, because the operating system does it for you.

We’ve seen the OS is smart enough to take advantage of our CPUs or CPU cores, so does it have any sense to make our application use several cores, or make our application run two things at the same time?
Of course it does, let’s see some examples:

  • Imagine a GUI program performing a heavy task. While this task is running, we may want to give the user some little control to cancel the process, for example, or to carry on using our software. We can run the heavy task in one thread and the GUI in another one.
  • A server which attends several clients simultaneously.
  • A program which does intense calculations, if we have dual-core, it can be twice faster

In the last case, we must think about something a little bit. Distributing tasks in time takes some time for the OS, so we must make sure the time we save (separating the process in several threads) is bigger than the time the OS wastes managing the new tasks and sometimes we must put all the results of all individual threads together (and it takes time too).
We may think it will be valid only for multi-core processors but the time taken by our tasks may vary depending on what we are doing, for example, calculations take all CPU Time, but if we access external devices (keyboard, screen, hard disk, USB, etc), even access a file, it makes our process wait for the data and while we are waiting, the CPU is wasting time, so we can take this time to perform calculations for some other process, and it is valid even on single-core CPUs. Sometimes running a tasks in two threads can squeeze our CPU better than one thread using CPU while the other thread is just waiting for data an vice-versa.

Ok, before playing a little bit with it we must know there are some ways to do that. We can create several processes, it’s like running some programs at the same time, and the other way is creating threads. The difference is that processes must store in RAM memory, data and stack among other things, and some threads can share code and data, so only stack and a little information about execution status belongs to each thread exclusively. We can say process initiates also a single execution thread.

Let’s code a bit, we will use pthreads (POSIX threads):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *newtask(void *null)
{
   printf("Hello world! I'm a new thread\n");
   sleep(1);
   printf("Goodbye world! I'm a thread about to die\n");
   pthread_exit(NULL);
}

int main (int argc, char *argv[])
{
   pthread_t thread;
   int rc;

   printf ("Main process just started.\n");
   rc = pthread_create(&thread, NULL, newtask, NULL);
   if (rc)
     {
       printf("ERROR in pthread_create(): %d\n", rc);
       exit(-1);
     }

   printf ("Main process about to finish.\n");
   /* Last thing that main() should do */
   pthread_exit(NULL);
}

This example can be compiled including pthread library, this way:

$ gcc -o onethread onethread.c -lpthread

We are just executing a program that creates another thread that performs a task, this task is writing a couple of messages on screen and wait a bit between them. We can see a result like this:

Main process just started.
Main process about to finish.
Hello world! I’m a new thread
[ wait for a second ]
Goodbye world! I’m a thread about to die

The first two messages belongs to the main thread, one when it stats and the other one when it ends, but before ending we are invoking the other thread that writes the other two messages. As we can see, each thread is executed independently, but to see it a bit more clearer, let’s see the next code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *newtask(void *null)
{
   printf("Hello world! I'm a new thread\n");
   int i;

   for (i=0; i<10;++i)
     {
       printf (" THREAD: %d\n", i);
       usleep(60);
     }
   printf("Goodbye world! I'm a thread about to die\n");
   pthread_exit(NULL);
}

int main (int argc, char *argv[])
{
   pthread_t thread;
   int rc;
   int i;

   printf ("Main process just started.\n");
   rc = pthread_create(&thread, NULL, newtask, NULL);
   if (rc)
     {
       printf("ERROR in pthread_create(): %d\n", rc);
       exit(-1);
     }

   for (i=0; i<10;++i)
     {
       usleep(100);
       printf (" MAIN: %d\n", i);
     }
   printf ("Main process about to finish.\n");
   /* Last thing that main() should do */
   pthread_exit(NULL);
}

What we’ve done is that the main thread (MAIN) and secondary thread (THREAD) will write on screen several messages inside a for loop. They will be counting from 0 to 9, but waiting a little bit between each writing (instead of doing heavier time eater tasks we simulate them with sleeps).

We can see a execution result like this:

Main process just started.
Hello world! I’m a new thread
THREAD: 0
THREAD: 1
MAIN: 0
THREAD: 2
MAIN: 1
THREAD: 3
THREAD: 4
MAIN: 2
THREAD: 5
MAIN: 3
THREAD: 6
MAIN: 4
THREAD: 7
THREAD: 8
MAIN: 5
THREAD: 9
MAIN: 6
Goodbye world! I’m a thread about to die
MAIN: 7
MAIN: 8
MAIN: 9
Main process about to finish.

Messages are written alternatively both MAIN and THREAD, like interlaced, now we can see they are both running concurrently, at the same time. We can start making multi-thread software in C.

Photo: Toastyken (Flickr) CC-by

MySQL loops and cursors with examples

December 16, 2013 No comments

First of all, you may not abuse of these techniques, and you must use them only when necessary. Most of the times you can find a faster solution.

Let’s see an easy loop, similar to a for loop, we will make a SELECT x with x from 1 to 9:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
DELIMITER $$
CREATE PROCEDURE simple_loop ( )
BEGIN
  DECLARE counter BIGINT DEFAULT 0;
 
  my_loop: LOOP
    SET counter=counter+1;

    IF counter=10 THEN
      LEAVE my_loop;
    END IF;

    SELECT counter;

  END LOOP my_loop;
END$$
DELIMITER ;

if we do:

1
CALL simple_loop();

We’ll see something like

+———+
| counter |
+———+
| 1 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 2 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 3 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 4 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 5 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 6 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 7 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 8 |
+———+
1 row in set (0.01 sec)

+———+
| counter |
+———+
| 9 |
+———+
1 row in set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

The code we iterate is between LOOP…END LOOP. What we see right before (my_loop) is a label, just to give a name to that loop, and reference it. In this example, we simply increment counter variable, and with a condition we can exit the loop when this variable reaches the value 10. We won’t see the 10 because we leave the loop before printing this number.

Let’s do something a bit more complicated, we will record scores in a game, this game will be a skill test to do in the shortest time as possible, hopscotch jumping and with obstacles. There are two types of penalty, touch the ground with both feet and hitting an obstacle. At the end of the tests the results will be written on a table, and a score will be assigned, this score will be also stored on that table to avoid calculating it each time.

1
2
3
4
5
6
7
8
9
CREATE TABLE Runners (
    Runner_id BIGINT NOT NULL AUTO_INCREMENT,
    Name VARCHAR(120) NOT NULL,
    Time BIGINT NOT NULL,
    Penalty1 BIGINT NOT NULL,
    Penalty2 BIGINT NOT NULL,
    Points BIGINT,
    PRIMARY KEY (Runner_id)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

Now, enter some test information:

1
2
3
4
5
6
7
INSERT INTO Runners VALUES (NULL, 'Michael', 123, 5, 2, NULL);
INSERT INTO Runners VALUES (NULL, 'Sarah', 83, 3, 3, NULL);
INSERT INTO Runners VALUES (NULL, 'John', 323, 1, 1, NULL);
INSERT INTO Runners VALUES (NULL, 'Ramon', 100, 8, 4, NULL);
INSERT INTO Runners VALUES (NULL, 'Andrew', 143, 4, 3, NULL);
INSERT INTO Runners VALUES (NULL, 'Antoine', 199, 3, 2, NULL);
INSERT INTO Runners VALUES (NULL, 'David', 101, 2, 1, NULL);

The first thing to do is a stored procedure with the basic loop, and a SELECT in each iteration, just to see that we are doing ok. (I will explain the code later):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
DROP PROCEDURE IF EXISTS cursorTest;
DELIMITER $$
CREATE PROCEDURE cursorTest (
) BEGIN
-- Variables where we will store what the SELECT returns
  DECLARE v_name VARCHAR(120);
  DECLARE v_time BIGINT;
  DECLARE v_penalty1 BIGINT;
  DECLARE v_penalty2 BIGINT;
-- Variable to control the end of the loop
  DECLARE fin INTEGER DEFAULT 0;

-- El SELECT to query
  DECLARE runners_cursor CURSOR FOR
    SELECT Name, Time, Penalty1, Penalty2 FROM Runners;

-- Exit condition
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fin=1;

  OPEN runners_cursor;
  get_runners: LOOP
    FETCH runners_cursor INTO v_name, v_time, v_penalty1, v_penalty2;
    IF fin = 1 THEN
       LEAVE get_runners;
    END IF;

  SELECT v_name, v_time, v_penalty1, v_penalty2;

  END LOOP get_runners;

  CLOSE runners_cursor;
END$$
DELIMITER ;

Must have in mind something. With the CURSOR we will go through the result of a SELECT statement, and we’ll have to store the values returned for each row in variables (that’s why I declared v_name, v_time, v_penalty1 and v_penalty2). In the end, each iteration will do a SELECT Name, Time, Penalty1, Penalty2 INTO v_name, v_time, v_penalty1, v_penalty2 WHERE …, so we’ll have these variables filled with the data obtained for each row in each iteration. That’s DECLARE xxx CURSOR FOR SELECT …

We must put a finish condition, usually we will finish the loop when no more results are found, that’s why we use DECLARE CONTINUE HANDLER FOR NOT FOUND SET fin=1, In this case, we will set fin to 1 when no more rows are found.

Inside the loop, we will test the value of this variable and LEAVE the loop if fin is set to 1, it’s automatically done when we reach the condition given before.

One step more, let’s create a function to assign the scores to each one of the runners with a formula. For example, if Time is the time taken in seconds, 500-Time will be the initial score and we will take away 5*penalty+3*penalty2. So:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
DROP FUNCTION IF EXISTS calculate_runner_points;
DELIMITER $$
CREATE FUNCTION calculate_runner_points (
  In_time BIGINT,
  In_penalty1 BIGINT,
  In_penalty2 BIGINT
) RETURNS BIGINT
BEGIN
  DECLARE points BIGINT;
 
  SET points = 500 - In_time - In_penalty1*5 - In_penalty2*3;
 
  RETURN points;
END$$
DELIMITER ;

Now, this is the code to calculate the final score of all players:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
DROP PROCEDURE IF EXISTS calculate_all_points;
DELIMITER $$
CREATE PROCEDURE calculate_all_points (
) BEGIN
-- Variables where we will store what the SELECT returns
  DECLARE v_name VARCHAR(120);
  DECLARE v_time BIGINT;
  DECLARE v_penalty1 BIGINT;
  DECLARE v_penalty2 BIGINT;
  DECLARE v_runner_id BIGINT;
-- Variable to control the end of the loop
  DECLARE fin INTEGER DEFAULT 0;

-- SELECT to query
  DECLARE runners_cursor CURSOR FOR
    SELECT Runner_id, Name, Time, Penalty1, Penalty2 FROM Runners;

-- Exit condition
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fin=1;

  OPEN runners_cursor;
  get_runners: LOOP
    FETCH runners_cursor INTO v_runner_id, v_name, v_time, v_penalty1,
v_penalty2;
    IF fin = 1 THEN
       LEAVE get_runners;
    END IF;

  UPDATE Runners SET Points=calculate_runner_points(v_time, v_penalty1,
v_penalty2) WHERE Runner_id=v_runner_id;

  END LOOP get_runners;

  CLOSE runners_cursor;
END$$
DELIMITER ;

Of course, as I said in the beginning of this post we must see if there is no other method to do the same. I know using MySQL loops is amazing, as I always say with regex, but there may be faster methods, like:

1
UPDATE Runners SET Points=calculate_runner_points(Time, Penalty1, Penalty2);

But, we can do more things with the loop, for example, if the time is greater than 250, we can swap penalties, editing the loop, with a IF statement, we could also do it in the procedure, but that’s an option.

A small example (or not so small) come to mi mind. Imagine we have a user system. Each user has its information stored in three tables: one for login, password and access info; another one for profile data and the last one for permissions. In this cas, all tables except permission table has one row per user. But as the permission table stores the access level for a user and another object (maybe a web page), and one single user can have permissions over several pages, there may be some rows for one user.

We will also have a table to store messages between users.

We will also have pages, these pages will be objects in our system, and users can see , edit, create derivates and delete them (if they are allowed to). A hierarchy may exist with pages, so we can have child pages, but when we create a new page:

  • If a user is allowed to edit a parent page, will be allowed to edit the new child page
  • If a user could create derivatives in a parent, will be also allowed in the child
  • If a user was allowd to edit and create derivatives in the parent page, will also be allowed to remove the child page.
  • We will have to send a message to the user telling him or her, about the new page and what is allowed to do with it.
  • We also have these procedures and functions:
    • can_create_derivatives(user, page) – It will return TRUE if the user can create derivatives
    • can_edit(user, page) – It will do the same, but with edit permission
    • new_permission(user, page, permission) – It will allow the user to do what permission says with this page
    • message(from, to, message) – Send a message to a user.

Functions can_create_derivatives() and can_edit() seem to be easy to understand, but what they do internally is far more complicated. It’s done by a colleague and I don’t want to fight with it. The same with new_permission() (it can insert rows or updates existent ones) or message(), it can send notifications and create a job to send real e-mail, so our procedure may be something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
DROP PROCEDURE IF EXISTS create_page;
DELIMITER $$
CREATE PROCEDURE create_page (
  IN in_name VARCHAR(120),
  IN in_parent BIGINT
) BEGIN
  DECLARE v_user_id BIGINT;
  DECLARE v_create_derivatives TINYINT;
  DECLARE v_object_id BIGINT;
  DECLARE v_msg TEXT;

-- Variable to control the end of the LOOP
  DECLARE fin INTEGER DEFAULT 0;

-- The SELECT
  DECLARE users_cursor CURSOR FOR
    SELECT User_id FROM Users;

-- Exit condition
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fin=1;

  INSERT INTO Pages (Name, Parent) VALUES (in_name, in_parent);
  SELECT LAST_INSERT_ID() INTO v_object_id;

  OPEN users_cursor;
  get_users: LOOP
    FETCH users_cursor INTO v_user_id;

    IF fin = 1 THEN
       LEAVE get_users;
    END IF;

    SET v_msg = CONCAT('New permissions on page ',in_name,': ');  

    IF can_create_derivatives(v_user_id, in_parent) THEN
      CALL new_permission(v_user_id, v_object_id, 'derivatives');
      SET v_msg = CONCAT(v_msg, 'Create derivatives ');
      SET v_create_derivatives=1;
    ELSE
      SET v_create_derivatives=0;
    END IF;

    IF can_edit(v_user_id, in_parent) THEN
      CALL new_permission(v_user_id, v_object_id, 'edit');
      SET v_msg = CONCAT(v_msg, 'Edit ');
      IF v_create_derivatives=1 THEN
         CALL new_permission(v_user_id, v_object_id, 'remove');
         SET v_msg = CONCAT(v_msg, 'Remove ');
      END IF;
    END IF;

    CALL message(1, v_user_id, v_msg);
  END LOOP get_users;

  CLOSE users_cursor;
END$$
DELIMITER ;

I’m sure I can give you better examples in the future. I accept suggestions in comments, so I can recreate them in my computer and solve them in future posts.

Photo: Kellie Bollaret (Flickr CC) Licensed CC-by

Top