Archive for November, 2013

Regular expressions in C++ with Glib::Regex

November 30, 2013

In every programmer’s life there are a few “before and after” moments, and one of them is learning regular expressions. They are a bit like Twitter: you sign up, follow some famous people and a couple of friends, write a test tweet and a “how wonderful life is” tweet, and forget about it. But when your time comes, you can’t stop using it. Regular expressions, or “regex”, are the same: when you discover them you say “Oh, that’s nice!” or “I could do a lot with this”, but after some time (maybe weeks, months or years), whenever you have a string problem, the first solution you try is a regex.

One of the most common commands that uses regex is grep, but of course the idea is too good to be used in only one place. That is why lots of programming languages have functions or classes to use them easily: PHP had ereg_* in the past and now uses preg_*, in JavaScript we use the RegExp class, in Java even the String class has methods that accept a regex, and so on.

But in C++ we don’t have a native solution for that, at least not in std. OK, C++11 has one (std::regex), but we don’t always have a C++11 compiler at hand. If we don’t want to do it by hand, we have to use a library such as Boost or Glib.

We are going to do it with Glib. Imagine we are making a template in which some keywords will be replaced with calculated values. Keywords begin and end with a %, so we want to find the position of each keyword and which keyword was found:

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: " << str1 << endl;

  // Zero or more lowercase letters enclosed between two % signs
  Glib::RefPtr<Regex> myr = Regex::create("%[a-z]*%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i = 0;

  // matches() is true while the current match is valid; next() moves to the following one
  while (minfo.matches())
    {
      // Group 0 is the entire match
      cout << "Word: " << minfo.fetch(0) << endl;
      if (minfo.fetch_pos(0, start, end))
        {
          cout << "   Start:  " << start << endl << "   End: " << end << endl;
        }
      minfo.next();
      ++i;
    }
  cout << "Occurrences: " << i << endl;

  return 0;
}

To compile it, we must have glibmm installed, then:

$ g++ -o regex1 regex1.cpp `pkg-config --libs --cflags glibmm-2.4`

In this piece of code the regex “%[a-z]*%” is applied, so we match lowercase letters from a to z enclosed between % symbols. In the sample string we find 3 occurrences, and for each one we print the matched string, its start position and its end position.

That is enough for many cases, but this example returns strings like %name% or %friend%, and sometimes that is not what we need: we want name or friend. We could strip the % signs ourselves, but the regex can do it for us if we put parentheses around the part we are interested in: “%([a-z]*)%”. Each match now yields several values, the first of which is still the entire match (the old string) rather than only the part we want. If we change the code a little bit, we’ll be able to get them all:
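
For readers who do have a C++11 compiler, roughly the same search can be sketched with the standard std::regex instead of Glib. This is only an illustration, not the glibmm API: the KeywordMatch struct and the find_keywords() function are names made up for this sketch, and the positions are byte offsets into a std::string.

```cpp
#include <regex>
#include <string>
#include <vector>

// One keyword occurrence: the matched text plus its [start, end) offsets.
// (Hypothetical helper type, not part of any library.)
struct KeywordMatch {
    std::string text;
    std::size_t start;
    std::size_t end;
};

// Collects every %keyword% occurrence in `input`, from left to right.
std::vector<KeywordMatch> find_keywords(const std::string& input)
{
    static const std::regex pattern("%[a-z]*%");
    std::vector<KeywordMatch> found;

    // sregex_iterator plays the role of the matches()/next() loop:
    // a default-constructed iterator marks the end of the matches.
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), last;
         it != last; ++it)
    {
        const std::smatch& m = *it;
        const std::size_t start = static_cast<std::size_t>(m.position(0));
        const std::size_t length = static_cast<std::size_t>(m.length(0));
        found.push_back({m.str(0), start, start + length});
    }
    return found;
}
```

Calling find_keywords(str1) on the sample string should give one entry per keyword, which can then be printed in a loop much like the Glib version does.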

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: " << str1 << endl;

  // The parentheses capture the keyword without the surrounding % signs
  Glib::RefPtr<Regex> myr = Regex::create("%([a-z]*)%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i = 0;

  while (minfo.matches())
    {
      cout << "Match " << i+1 << ": " << endl;

      // Index 0 is the entire match; 1 and up are the parenthesized groups
      for (int j = 0; j < minfo.get_match_count(); ++j)
        {
          cout << "Word (" << j << "): " << minfo.fetch(j) << endl;
          if (minfo.fetch_pos(j, start, end))
            {
              cout << "   Start:  " << start << endl << "   End: " << end << endl;
            }
        }
      minfo.next();
      ++i;
    }
  cout << "Occurrences: " << i << endl;

  return 0;
}

In this case we iterate get_match_count() times, that is, over every string returned by each match of the expression: the entire match plus one string per pair of parentheses (expressions can get quite complex, and we can add more parentheses). Calling minfo.fetch(1) gives us the strings “name”, “friend” and “job”.

But, to write a better example, let’s parse a simple XML tag. The regex, written as a C++ string literal, is “<([\\w:]*)( [^<>]*)?>([^<>]*)</\\1>”, which means:

  • Symbol <
  • A word: letters, digits, underscores or colons
  • Maybe a space followed by several characters other than < and >
  • Symbol >
  • Several characters, none of them < or >
  • Symbols < and /
  • The same word found at the beginning
  • Symbol >

Then our text string will be: <MyTag id="123">Sample text</MyTag>
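
Once the keyword names are captured, the template idea from the beginning can be completed by replacing each keyword with its value. The following is a minimal sketch, using std::regex from C++11 rather than Glib; render_template() and the values map are hypothetical names, and leaving unknown keywords untouched is just one possible choice.

```cpp
#include <map>
#include <regex>
#include <string>

// Replaces each %keyword% in `tpl` with its value from `values`.
// Keywords missing from the map are copied unchanged (an arbitrary
// choice made for this sketch).
std::string render_template(const std::string& tpl,
                            const std::map<std::string, std::string>& values)
{
    static const std::regex pattern("%([a-z]*)%");
    std::string out;
    std::size_t copied = 0;  // offset of the first character not yet copied

    for (std::sregex_iterator it(tpl.begin(), tpl.end(), pattern), last;
         it != last; ++it)
    {
        const std::smatch& m = *it;
        const std::size_t start = static_cast<std::size_t>(m.position(0));

        // Copy the literal text between the previous match and this one.
        out.append(tpl, copied, start - copied);

        // Group 1 is the keyword without the surrounding % signs.
        const auto found = values.find(m.str(1));
        out += (found != values.end()) ? found->second : m.str(0);

        copied = start + static_cast<std::size_t>(m.length(0));
    }

    // Copy whatever literal text follows the last match.
    out.append(tpl, copied, std::string::npos);
    return out;
}
```

The text is rebuilt piece by piece instead of edited in place, so the positions reported for each match always refer to the original template.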

And the result will be:

Original String: <MyTag id="123">Sample text</MyTag>
Match 1:
Word (0): <MyTag id="123">Sample text</MyTag>
   Start:  0
   End: 51
Word (1): MyTag
   Start:  1
   End: 6
Word (2): id="123"
   Start:  6
   End: 15
Word (3): Sample text
   Start:  16
   End: 43
Occurrences: 1

(Note: the Start and End positions above don’t match this string; they were taken from another example.)

So, with this little regex we have parsed this XML tag. This can be useful in small projects.
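
The program for this last example is not listed, so here is a rough sketch of how the same tag could be parsed, assuming a C++11 compiler and using std::regex instead of Glib. The Group struct and the parse_tag() function are invented for this illustration; the pattern is the same one, written as a raw string literal so the backslashes are not doubled.

```cpp
#include <regex>
#include <string>
#include <vector>

// One capture group of the match. (Hypothetical helper type.)
struct Group {
    bool matched;       // false when an optional group took no part in the match
    std::string text;
    std::size_t start;  // offsets are only meaningful when matched is true
    std::size_t end;
};

// Returns the entire match (index 0) and its groups (1..3) for the first
// simple tag found in `input`, or an empty vector if there is none.
std::vector<Group> parse_tag(const std::string& input)
{
    static const std::regex pattern(R"(<([\w:]*)( [^<>]*)?>([^<>]*)</\1>)");
    std::vector<Group> groups;
    std::smatch m;

    if (std::regex_search(input, m, pattern)) {
        for (std::size_t j = 0; j < m.size(); ++j) {
            if (!m[j].matched) {
                // e.g. the attributes group when the tag has no attributes
                groups.push_back({false, std::string(), 0, 0});
                continue;
            }
            const std::size_t start = static_cast<std::size_t>(m.position(j));
            const std::size_t length = static_cast<std::size_t>(m.length(j));
            groups.push_back({true, m.str(j), start, start + length});
        }
    }
    return groups;
}
```

Note that the second group keeps the leading space of the attributes, because the space is inside the parentheses of the pattern.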

Photo: li xiang (Flickr) CC-by

Call a variable whose name is stored in another one [BASH]

November 4, 2013

I’m sure this has happened to you. Imagine we have three variables (RED="ff0000", GREEN="00ff00", BLUE="0000ff") and a function that, given the name of a color, returns its code. OK, we can do this in many ways, but this is just an example. What I want is to pass the get_color() function the name of a variable and get back its value:

RED="ff0000"
GREEN="00ff00"
BLUE="0000ff"

function get_color()
{
   # ${!1} expands to the value of the variable whose name is in $1
   echo "${!1}"
}

get_color BLUE

The key, as you can see, is the ! sign before the variable name inside the braces. Another example may make it a bit clearer:

SEVILLE="Torre del Oro"
LONDON="Big Ben"
NEWYORK="Statue of Liberty"

CITY=SEVILLE

echo I really like to go to $CITY to see ${!CITY}

If I change the value of CITY, it will pick another monument from the list above.

Prevent a slow website caused by a crashed server

November 1, 2013

When we generate content for our website dynamically, some of the information may come from outside. What we serve then depends not only on our own server: we have to retrieve data from another place that we may not control. Sometimes that server crashes and our request only fails with a timeout 20 or 30 seconds later, so our website becomes terribly slow.

What we are going to do is reduce this timeout. Imagine two situations:

  • We fetch the information and transform it in some way. We can cache the result, so the next time we need it we can serve what we downloaded before. But when we try to download the information again, the site may be down. It’s worth checking first.
  • We include external images or resources and they may not be available, so we check the remote server before using it. If the server is OK, we let the user download the resource from outside; if it’s not, we serve a placeholder, a backup copy or whatever.

We can check the availability with curl:

<?php
$url = "URL of our server";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1); // Give up after 1 second
curl_setopt($ch, CURLOPT_NOBODY, true); // Exclude the body of the page
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the content in the variable instead of printing it
$data = curl_exec($ch);
if ($data)
    echo "It works";
else
    echo "Does not work: ".curl_error($ch);
curl_close($ch);
?>

So, if the remote server takes more than 1 second to respond (the CURLOPT_TIMEOUT parameter), we can say it doesn’t work. It may be overloaded, down, or there may be a connection problem; we don’t mind, we only know it isn’t working properly. But when working internally (for example, inside a cron job that downloads content we will format and cache later), we can use a higher value here.

Photo: Orin Zebest (Flickr) CC-by
