Archive for the ‘Strings’ Category

Regular expressions in C++ with Glib::Regex

November 30, 2013

In our programming life there are a few “before and after” moments, and one of them is learning to use regular expressions. They are a bit like Twitter: you sign up, follow some famous people and a couple of friends, write a test tweet and a “how wonderful life is” tweet, and then forget about it. But when your time comes, you can’t stop using it. Regular expressions, or “regex”, are the same: when you discover them you say “Oh! That’s nice!” or “I could do a lot with this”, and after some time (maybe weeks, months or years), whenever you have a string problem, the first solution you try is a regex.

One of the most common commands that uses regex is grep, but the idea is too good to be used in only one place. That is why many programming languages provide functions or classes to use them easily: PHP had ereg_* in the past and now uses preg_*, JavaScript has the RegExp class, Java lets us apply a regex directly through String methods, and so on.

But when working in C++ we don’t have a native solution for that, at least not in std. OK, C++11 does have one (std::regex), but we don’t always have a C++11 compiler at hand. Otherwise we have to use a library such as Boost or Glib if we don’t want to do it by hand.

We are going to do it with Glib (through glibmm, its C++ binding). Imagine we are building a template in which some keywords will be replaced with calculated values. Keywords begin and end with a %, so we want to find the position of each keyword and which keyword was found:

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: "<< str1 << endl;

  Glib::RefPtr<Regex> myr = Regex::create("%[a-z]*%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i=0;

  while (minfo.matches())
    {
      cout << "Word: " << minfo.fetch(0)<<endl;
      if (minfo.fetch_pos(0, start, end))
        {
          cout << "   Start:  "<<start<<endl<<"   End: "<<end<<endl;
        }
      minfo.next();
      ++i;
    }
    cout << "Occurrences: "<<i<<endl;
}

To compile it, we must have glibmm installed, then:

$ g++ -o regex1 regex1.cpp `pkg-config --libs --cflags glibmm-2.4`

In this piece of code the regex “%[a-z]*%” is applied, so we match runs of lowercase letters from a to z enclosed between % symbols. In the sample string we find 3 occurrences, and for each one we print the matched string and its start and end positions.

That is enough for many cases, but this example returns strings like %name% or %friend%, which is not always useful: what we want is name or friend. We could strip the % symbols ourselves, but the regex can give us those values too if we enclose the part we are interested in between parentheses: “%([a-z]*)%”. Each match will now return several values, and one of them is still the entire match (the old string), not only the part we are interested in. If we change the code a little, we can get them all:

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: "<< str1 << endl;

  Glib::RefPtr<Regex> myr = Regex::create("%([a-z]*)%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i=0;

  while (minfo.matches())
    {
      cout << "Match "<< i+1 << ": "<<endl;

      for (unsigned j = 0; j< minfo.get_match_count(); ++j)
        {
          cout << "Word ("<<j<<"): " << minfo.fetch(j)<<endl;
          if (minfo.fetch_pos(j, start, end))
            {
              cout << "   Start:  "<<start<<endl<<"   End: "<<end<<endl;
            }
        }
      minfo.next();
      ++i;
    }
    cout << "Occurrences: "<<i<<endl;
}

In this case we loop get_match_count() times, so for each match of the expression we get every string it returns: the whole match plus one string per pair of parentheses (expressions can get quite complex, and we can add more parentheses). Calling minfo.fetch(1) gives us the strings “name”, “friend” and “job”.

But, to write a better example, let’s parse a simple XML tag. As regex we will take “<([\\w:]*)( [^<>]*)?>([^<>]*)</\\1>” (written as a C++ string literal, hence the doubled backslashes), which means:

  • Symbol <
  • A word made of letters, digits, underscores and colons (the tag name)
  • Maybe a space followed by several characters other than < and > (the attributes)
  • Symbol >
  • Several characters, none of them < or > (the content)
  • Symbols < and /
  • The same word found at the beginning
  • Symbol >

Then our text string will be: “<MyTag id="123">Sample text</MyTag>”

And the result will be:

Original string: <MyTag id="123">Sample text</MyTag>
Match 1:
Word (0): <MyTag id="123">Sample text</MyTag>
Start: 0
End: 51
Word (1): MyTag
Start: 1
End: 6
Word (2): id="123"
Start: 6
End: 15
Word (3): Sample text
Start: 16
End: 43
Occurrences: 1

(Note: the Start and End positions won’t match reality; they were taken from another example.)

So, with this little regex we have parsed an XML tag. This can be useful in small projects, although it only handles a single, non-nested tag.

Photo: li xiang (Flickr) CC-by

stermp.h, trying to port conio.h to Linux

October 22, 2013

This time I want to rescue an old project that I started long ago. These days I’ve been reading some source code posted on Facebook that uses conio.h, so I hope this will be interesting to someone.

Of course there are libraries that let us write colored strings, get and set the cursor position on screen, and read keys without echo and without pressing Enter. We could also do it without them by using ANSI codes directly, but either way we would have to change a lot of the source code.

I tried to keep the function names the same. We can use:

  • clrscr() : To clear the screen
  • textbackground(color) : To change background color
  • textcolor(color) : To change text color
  • gotoxy(x,y) : Go to specific position
  • wherex() : To get X position
  • wherey() : To get Y position
  • getch() : To get a key press without ENTER
  • getche() : Like getch but echoing character on screen
  • kbhit() : To know if a key has been pressed without stopping execution. Returns true or false

We also have some additional stuff like:

  • wherexy() : Returns X,Y position in a struct
  • kbhit2() : Gets a key code if pressed without stopping execution
  • kbhit_pre() : Prepares the terminal for many kbhit() calls in a row, to increase performance
  • restore_terminal_color() : Restores terminal color
  • screenheight() : Gets screen height.
  • screenwidth() : Gets screen width.

I also tried to keep the color names the same. Let’s see an example:

#include <stdio.h>
#include <time.h>
#include "stermp.h"

void update_time()
{
  struct tm *tm;
  time_t _time;
  char text[50];
  textcolor(YELLOW);
  _time=time(NULL);
  tm = localtime(&_time);
  strftime(text,50,"%d/%m/%Y %H:%M:%S", tm);
  gotoxy(1,1);
  printf("%s    ", text);
}

int main()
{
  int x,y;
  int width, height;
  int key;
  term_init();

  width = screenwidth();
  height = screenheight();
  /* Fill the screen with green */
  textbackground(GREEN);
  clrscr();

  textbackground(BLUE);
  /* Fill the first row with blue */
  for (x=0; x<width; x++)
    printf(" ");

  gotoxy(1,height);
  /* Fill the last row with blue */
  for (x=0; x<width; x++)
    printf(" ");

  gotoxy(2,2);
  while ((key=kbhit2())==0)
      update_time();

  printf("You have pressed: %d\n", key);

  term_defaults();

  return 0;
}

We can see that I’m calling term_init() and term_defaults(); they are only there so that the terminal can be restored after execution ends.
You can download the source code on GitHub. Just include stermp.h in your code and add stermp.h and stermp.c to your project.

Replacing substrings in C++, this time using maps, for multiple replacements

October 18, 2013

Some days ago we talked about how to replace substrings inside a string in C++. We ended up with a function we can just copy and paste into our projects, but when we want to replace multiple substrings the code gets ugly, and sometimes it just won’t fit.

We will use a common C++ container called map. It is just a collection of associations between two values; we can see it as an array of key and value elements. So we will associate some substrings with other substrings (fromStrs with toStrs). We will also write a replace() function that takes two initial arguments (the big string and the map), looks for each one of the keys and replaces it, like this:

#include <iostream>
#include <string>
#include <map>

using namespace std;

// Replaces every key of strMap found in source with its value, one key at a time.
// offset: position where each search starts.
// times: maximum number of replacements in total (0 means no limit).
string replace(string source, std::map<string,string>strMap, int offset=0, int times=0)
{
  int total = 0;
  string::size_type pos;

  for (std::map<string, string>::iterator i=strMap.begin(); i!=strMap.end(); ++i)
    {
      string fromStr = i->first;
      string toStr = i->second;
      pos=offset;
      while ( (pos = source.find(fromStr, pos)) < string::npos)
        {
          if ( (times!=0) && (total++>=times) )
            return source;  // Limit reached, stop replacing

          source.replace(pos, fromStr.length(), toStr);
          pos+=toStr.size();
        }
    }
  return source;
}

int main()
{
  string original = "I usually write silly things when testing my programs.";

  map<string,string> mapa;
  mapa["usually"] = "always";
  mapa["silly things"] = "lorem ipsum";

  cout << "Original string: "<<original<<endl;

  cout << "Resulting string: "<<replace(original, mapa)<<endl;

  return 0;
}

In this case we can add as many elements as we want to the map, and all of them will be searched for in the big string. This function is handy when we don’t know the fromStr and toStr values at compile time (we can generate them at runtime), or when we want to fill the map little by little and then do all the replacements at once.

But there is a little problem: when some toStr is contained in some fromStr, or vice versa, this function won’t work as expected, because text produced by one replacement can be replaced again by a later key. Instead of replacing each key globally, one after another, we have to walk the string and, at each step, search all the keys in the map and make a single replacement for the one found first (like we did with the old replace):

#include <iostream>
#include <string>
#include <map>

using namespace std;

// Walks source from left to right. At each step it looks for the key of strMap
// that appears first from pos onwards and replaces only that occurrence.
string replace2(string source, std::map<string,string>strMap, int offset=0, int times=0)
{
  int total = 0;
  string::size_type pos=offset;
  string::size_type newPos;
  string::size_type lowerPos;

  do
    {
      string rep;
      lowerPos = string::npos;   // Nothing found yet (also covers an empty map)
      for (std::map<string, string>::iterator i=strMap.begin(); i!=strMap.end(); ++i)
        {
          string fromStr = i->first;

          newPos = source.find(fromStr, pos);
          if (newPos<lowerPos)
            {
              rep = fromStr;     // Closest key so far
              lowerPos = newPos;
            }
        }

      pos = lowerPos;
      if (pos == string::npos)
        break;                   // No key left to replace

      string toStr = strMap[rep];

      source.replace(pos, rep.length(), toStr);
      pos+=toStr.size();         // Skip the replacement so it is not replaced again

    } while ( (times==0) || (++total<times) );

  return source;
}

int main()
{

  string original = "If a black bug bleeds black blood, what color blood does a blue bug bleed?";
  map<string,string> mapa;
  mapa["black"] = "blue";
  mapa["blue"] = "black";

  cout << "Original string: "<<original<<endl;

  cout << "Resulting string: "<<replace2(original, mapa)<<endl;

  return 0;
}

The expected result is:

Original string: If a black bug bleeds black blood, what color blood does a blue bug bleed?
Resulting string: If a blue bug bleeds blue blood, what color blood does a black bug bleed?

It is a bit of a tongue twister, but I think you get the idea.

One more interesting thing is the map creation. Lots of times we have a clear idea of which elements go in the map, so we don’t want to spend time adding them one by one. As I told you some days ago, you can pass a variable number of arguments to a C function, so why not do it here? Let’s pass the strings as char* and add them to the map:

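#include <cstdarg>
#include <cstddef>
#include <map>
#include <string>

using namespace std;

// Builds a map from a NULL-terminated list of C strings:
// key, value, key, value, ..., NULL
map <string, string> strMap(const char* first, const char* second, ...)
{
  va_list args;
  map<string, string> ret;
  int n=0;
  const char *value;
  string _first, _second;

  ret.insert(pair<string, string>(first, second));
  va_start(args, second);

  do
    {
      value = va_arg(args, const char*);
      if (value==NULL)
        break;                  // Sentinel reached

      if (++n % 2 ==0)
        {
          _second = string(value);
          ret.insert(pair<string, string>(_first, _second));
        }
      else
        _first = string(value);

    } while (1);

  va_end(args);                 // Every va_start needs a matching va_end
  return ret;
}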

Now we can create the map by doing:

map <string, string> mapa = strMap("black", "blue", "blue", "black", NULL);

Don’t forget the final NULL: strMap stops reading arguments only when it sees a NULL, so leaving it out can cause a disaster at runtime. Not always, because you may get lucky and find a NULL there anyway, but often enough.

I hope this code is useful for you.

Alternate letters of several strings to create a new one

October 16, 2013

When hashing a password, it is recommended not to hash the password alone. Concatenate the password with another string (a salt value) so that the hashes can’t be broken with precomputed hash databases. This is another technique for mixing salt into our passwords, or even a method for generating passwords.

This function will alternate letters of some given strings and put them in a new one. So if we take these two words: “Binary” and “Poetry”, it will generate “BPioneatrryy” (I’m sure you haven’t read that :) )

Let’s see the function:

  function letterAlternate($strs)
  {
    $n_strs = count ($strs);

    if ($n_strs==1)        /* If there is only one string, return it */
      return $strs[0];

    /* Count the characters of every string and store them */
    $chars_str=array();
    for ($i=0; $i<$n_strs; $i++)
      $chars_str[$i]=strlen($strs[$i]);

    $max_str = max ($chars_str); /* The longest string determines when we stop */
    $res='';

    for ($j=0; $j<$max_str; $j++)
      {
        for ($i=0; $i<$n_strs; $i++)
          {
            /* If the string isn't finished, we'll add the character to the result */
            if ($j<$chars_str[$i])
              $res.=$strs[$i][$j];
          }
      }

    return $res;
  }

echo letterAlternate (array("Binary", "Poetry", "Gaspar", "Fernandez", "Programming", "Blog", "Free", "Software"));

We will get: BPGFPBFSioaerlronesrooefatpnggetrraarwyyrnaadmremezing

Replace substrings inside strings in C++

October 12, 2013

One of the most useful tools when programming is searching and replacing text in a string. In other words, searching for substrings inside a bigger string and replacing them with other substrings. For example, we can use templates to generate a message or any other output: we just have to replace some keywords located inside our template (our big string) with our generated data.

We will do it by calling standard string methods to find the substring and then replace it with another one:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  string original = "Going to sleep. Everyone knows past 12 we must go to sleep.";

  string::size_type pos = original.find("sleep", 0);

  cout << "Original string: "<<original<<endl;

  if (pos < string::npos)
    original.replace(pos, string("sleep").length(), "code");

  cout << "Resulting string: "<<original<<endl;
  return 0;
}

To compile:

$ g++ -o replace replace.cpp

In this case we use the find method to get the position where the substring “sleep” starts. If the substring is not found, find returns string::npos (a constant); if it is found, we replace it with the word “code”, for which we have to give the start position inside original and the size of the substring we want to replace.

Now, what we want is to replace all occurrences of “sleep” inside the string, so we just have to loop find() and replace() while we are still finding the substring, this way:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  string original = "Going to sleep. Everyone knows past 12 we must go to sleep.";

  string::size_type pos = 0;

  // Now we keep the substring to be replaced and the replacement in these variables.
  string fromStr = "sleep";
  string toStr = "code";

  cout << "Original: "<<original<<endl;

  // Loop while finding substrings
  while ((pos = original.find(fromStr, pos)) < string::npos)
    {
      original.replace(pos, fromStr.length(), toStr);
      pos+=toStr.size();    // Very important, add the replacement substring size
                                // to pos to avoid infinite loops
    }

  cout << "Resulting string: "<<original<<endl;
  return 0;
}

If we run this code, the two occurrences of “sleep” will be replaced by “code”. But there is an interesting comment in the code: “to avoid infinite loops”. What we do is add the size of the replacement string “code” to the pos variable. This matters in the special case where the substring to be replaced is contained in the replacement string (or both are the same). In other words, if we search for “to”, replace it with “ton” and comment this line out, the replacement never ends, and if we cout the string inside the loop we will see “tonnnnnn…” with a growing number of n’s.

We now have a good way to search and replace, so let’s create a function to simplify its usage, with some added value: an offset (so we can choose where to start searching in the original string, just by giving pos a non-zero value) and a counter (so we can replace fromStr with toStr a limited number of times; if our keyword appears ten times, we can choose to replace it only five times).

Let’s do this:

#include <iostream>
#include <string>

using namespace std;

string replace(string source, string fromStr, string toStr, int offset=0, int times=0)
{
  int total = 0;
  string::size_type pos=offset;
  while ( ( (pos = source.find(fromStr, pos)) < string::npos) && ( (times==0) || (total++<times) ) )
    {
      source.replace(pos, fromStr.length(), toStr);
      pos+=toStr.size();
    }
  return source;
}

int main()
{
  string original = "Going to sleep. Everyone knows past 12 we must go to sleep. So if I sleep tonight, I don't have to sleep tomorrow morning";

  cout << "Original string: "<<original<<endl;

  cout << "Resulting string: "<<replace(original, "sleep", "code", 20, 2)<<endl;

  return 0;
}

The output will be like this:

Original string: Going to sleep. Everyone knows past 12 we must go to sleep. So if I sleep tonight, I don't have to sleep tomorrow morning
Resulting string: Going to sleep. Everyone knows past 12 we must go to code. So if I code tonight, I don't have to sleep tomorrow morning

So we start searching for the substring “sleep” from character number 20 and then replace it twice. The replace function has default values, so if we omit offset we start from the beginning of the string, and if we omit the number of times it will be zero, which means all occurrences will be replaced.

To make it better, we can use Glib::ustring instead of string. Glib is a cross-platform utility library, and its C++ binding (glibmm) implements its own string class with UTF-8 support. With plain string, if we use special characters such as accented letters, ñ or symbols, we will run into encoding problems. We can test it this way:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  cout << string("piñata").length() << endl;
  return 0;
}

Let’s see the output. We know “piñata” has six letters, but we may get seven because the source is saved in a multibyte encoding (like UTF-8, which uses two bytes to encode “ñ”) and length() counts bytes. Here is where Glib::ustring comes in: this class (from a free library, of course) behaves much like string, but with UTF-8 support. Let’s see:

#include <string>
#include <iostream>
#include <glibmm/ustring.h>

using namespace std;

int main()
{
  cout << Glib::ustring("piñata").length()<<endl;
  cout << string("piñata").length()<<endl;

  return 0;
}

To compile:

$ g++ -o wordlength wordlength.cpp `pkg-config --libs --cflags glibmm-2.4`

We will have to install the glibmm library (version 2.4 as of October 2013).

In this example (if the source file is saved with UTF-8 encoding), Glib::ustring will give us the right value (6), while string counts the bytes (7).

And of course, the replace function can be compatible with Glib::ustring, just replacing string with Glib::ustring:

#include <iostream>
#include <glibmm/ustring.h>

using namespace std;

Glib::ustring replace(Glib::ustring source, Glib::ustring fromStr, Glib::ustring toStr, int offset=0, int times=0)
{
  int total = 0;
  Glib::ustring::size_type pos=offset;
  while ( ( (pos = source.find(fromStr, pos)) < Glib::ustring::npos) && ( (times==0) || (total++<times) ) )
    {
      source.replace(pos, fromStr.length(), toStr);
      pos+=toStr.size();
    }
  return source;
}

int main()
{
  Glib::ustring original = "Going to sleep. Everyone knows past 12 we must go to sleep. So if I sleep tonight, I don't have to sleep tomorrow morning.";

  cout << "Original string: "<<original.raw()<<endl;

  cout << "Resulting string: "<<replace(original, "sleep", "code").raw()<<endl;

  return 0;
}

Notice that I’ve used the raw() method to display the string with cout. This is because we want to pass plain bytes to cout, not characters (in this case, multibyte characters).

Photo: Jon-Eric Melsæter (Flickr) CC-by
