Regular expressions in C++ with Glib::Regex

November 30, 2013 No comments

In every programmer’s life there are some “before and after” moments, and one of them is learning to use regular expressions. They are a bit like Twitter: you sign up, follow some famous people and a couple of friends, write a test tweet and a “how wonderful life is” tweet, and forget about it. But when your time comes, you can’t stop using it. Regular expressions, or “regex”, are the same: when you discover them you say “Oh! That’s nice!” or “I could do a lot with this”, and after some time (maybe weeks, months or years), whenever you have a string problem, the first solution you try is a regex.

One of the most common commands that uses regex is grep, but the idea is too good to be used in only one place. That is why many programming languages provide functions or classes to use them easily: PHP had ereg_* in the past and now has preg_*, JavaScript has the RegExp class, Java lets us use regexes directly from String methods, and so on.

But when working in C++ we don’t have a native solution in std. OK, C++11 does (std::regex), but we don’t always have a C++11 compiler at hand. Unless we want to do it by hand, we have to use a library such as Boost or Glib.

We are going to do it with Glib (through its C++ binding, glibmm). Imagine we are writing a template engine: some keywords will be replaced with calculated values. Keywords begin and end with a %, so we want to find the position of each keyword and which keyword was found:

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: "<< str1 << endl;

  Glib::RefPtr<Regex> myr = Regex::create("%[a-z]*%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i=0;

  while (minfo.matches())
    {
      cout << "Word: " << minfo.fetch(0)<<endl;
      if (minfo.fetch_pos(0, start, end))
        {
          cout << "   Start:  "<<start<<endl<<"   End: "<<end<<endl;
        }
      minfo.next();
      ++i;
    }
  cout << "Occurrences: "<<i<<endl;
}

To compile it, we must have glibmm installed, then:

$ g++ -o regex1 regex1.cpp `pkg-config --libs --cflags glibmm-2.4`

In this code the regex “%[a-z]*%” matches any run of lowercase letters from a to z enclosed between % symbols. In the sample string it finds 3 occurrences, and for each one the program prints the matched string and its start and end positions.

That is enough for many cases, but this example returns strings like %name% or %friend%, when what we usually want is name or friend. We could strip the % signs ourselves, but the regex can do it for us if we put parentheses around the part we are interested in: “%([a-z]*)%”. Each match then yields several values: the first is still the entire match, and the rest are the parenthesized parts. With a small change to the code we can get them all:

#include <glibmm/regex.h>
#include <glibmm/ustring.h>
#include <iostream>

using namespace std;
using namespace Glib;

int main()
{
  Glib::ustring str1 = "Hi %name%, your friend %friend% told me you are a %job%.";
  cout << "Original string: "<< str1 << endl;

  Glib::RefPtr<Regex> myr = Regex::create("%([a-z]*)%");
  MatchInfo minfo;

  myr->match(str1, minfo);
  int start, end;
  int i=0;

  while (minfo.matches())
    {
      cout << "Match "<< i+1 << ": "<<endl;

      for (int j = 0; j < minfo.get_match_count(); ++j)
        {
          cout << "Word ("<<j<<"): " << minfo.fetch(j)<<endl;
          if (minfo.fetch_pos(j, start, end))
            {
              cout << "   Start:  "<<start<<endl<<"   End: "<<end<<endl;
            }
        }
      minfo.next();
      ++i;
    }
  cout << "Occurrences: "<<i<<endl;
}

In this case we iterate get_match_count() times, which is the number of strings returned for each match of the expression: the entire match plus one per pair of parentheses (expressions can get quite complex, and we can add more parentheses). Calling minfo.fetch(1) returns the strings “name”, “friend” and “job”.

But, to write a better example, let’s parse a simple XML tag. The regex will be “<([\\w:]*)( [^<>]*)?>([^<>]*)</\\1>” (written as a C++ string literal, hence the doubled backslashes), which means:

  • The symbol <
  • A word made of letters, digits, underscores or colons
  • Optionally, a space followed by any characters other than < and >
  • The symbol >
  • Any characters other than < and >
  • The symbols < and /
  • The same word found at the beginning
  • The symbol >

Then our text string will be: “<MyTag id="123">Sample text</MyTag>”

And the result will be:

Original String: <MyTag id="123">Sample text</MyTag>
Match 1:
Word (0): <MyTag id="123">Sample text</MyTag>
Start: 0
End: 51
Word (1): MyTag
Start: 1
End: 6
Word (2): id="123"
Start: 6
End: 15
Word (3): Sample text
Start: 16
End: 43
Occurrences: 1

(Note: the Start and End positions above don’t match this string; they were taken from another example.)

So, with this little regex we have parsed an XML tag, which can be useful in small projects.

Photo: li xiang (Flickr) CC-by

Get the HOME directory in C

October 23, 2013 No comments

Programs often need to know where the user’s home directory is: to read or store configuration files, to search for something, or to find out whether the program is installed globally or locally.

Here is a short function to get it:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getuid() */
#include <pwd.h>      /* getpwuid() */

char *getHomeDir()
{
  static char *home = NULL;
 
  if (!home)
    {
      home = getenv("HOME") ;
    }
  if (!home)
    {
      struct passwd *pw = getpwuid(getuid());
      if (pw)
          home = pw->pw_dir ;
    }
  return home;
}

int main(int argc, char *argv[])
{
  printf ("HOME: %s\n", getHomeDir());

  return EXIT_SUCCESS;
}

If we look at the getHomeDir() function, the directory we’re looking for is stored in a static variable. Since that variable keeps its value between calls, later calls just return it (the user isn’t going to change their home directory while the program runs), so we only ask the system the first time.

On the other hand, we have two ways of getting the path. The first is the HOME environment variable: most of the time it will be defined and we can just read it. If it’s not there, we can get the information from the user database (usually /etc/passwd) with getpwuid(). The structure it returns has much more information than we need, and the getpwuid() manual page lists everything we can get (the user’s password is not included, muahaha: it isn’t stored there anymore, and on an old or embedded system where it is, you will probably only see a hash).

Photo: Neetesh Gupta (Flickr) CC-by.

stermp.h, trying to port conio.h to Linux

October 22, 2013 No comments

This time I want to rescue an old project that I started long ago. These days I’ve been seeing source code on Facebook that uses conio.h, so I hope this will be interesting to someone.

Of course, there are libraries that let us write strings in color, get and set the cursor position, and read keys without echo and without waiting for Enter. We can also do it without them, using ANSI codes directly, but either way we would have to change a lot of the source code.

I tried to keep the function names the same. We have:

  • clrscr() : To clean screen
  • textbackground(color) : To change background color
  • textcolor(color) : To change text color
  • gotoxy(x,y) : Go to specific position
  • wherex() : To get X position
  • wherey() : To get Y position
  • getch() : To get a key press without ENTER
  • getche() : Like getch but echoing character on screen
  • kbhit() : To know if a key has been pressed without stopping execution. Returns true or false

We also have some additional stuff like:

  • wherexy() : Returns X,Y position in a struct
  • kbhit2() : Gets a key code if pressed without stopping execution
  • kbhit_pre() : Prepares the terminal for many kbhit() calls, to improve performance
  • restore_terminal_color() : Restores terminal color
  • screenheight() : Gets screen height.
  • screenwidth() : Gets screen width.

I also tried to keep the color names the same. Let’s see an example:

#include <stdio.h>
#include <time.h>
#include "stermp.h"

void update_time()
{
  struct tm *tm;
  time_t _time;
  char text[50];
  textcolor(YELLOW);
  _time=time(NULL);
  tm = localtime(&_time);
  strftime(text,50,"%d/%m/%Y %H:%M:%S", tm);
  gotoxy(1,1);
  printf("%s    ", text);
}

int main()
{
  int x,y;
  int width, height;
  int key;
  term_init();

  width = screenwidth();
  height = screenheight();
  /* Fill the screen with green */
  textbackground(GREEN);
  clrscr();

  textbackground(BLUE);
  /* Fill the first row with blue */
  for (x=0; x<width; x++)
    printf(" ");

  gotoxy(1,height);
  /* Fill the last row with blue */
  for (x=0; x<width; x++)
    printf(" ");

  gotoxy(2,2);
  while ((key=kbhit2())==0)
      update_time();

  printf("You have pressed: %d\n", key);

  term_defaults();
 
}

As you can see, I call term_init() at the start and term_defaults() at the end; they are only there so that the terminal can be restored once execution ends.
You can download the source code from GitHub. Just include stermp.h in your code and add stermp.h and stermp.c to your project.

Hello world!

October 9, 2013 1 comment

This isn’t just WordPress’ default post title. As a programmer, I’ve written hundreds of “hello world” programs to test compilers, libraries and new programming languages… It’s an easy way to know that things work and that I can make them work.

I will write here about programming (my favourite languages are C, C++ and PHP, so I will mostly talk about them, though I may write about other languages too) and about Unix-like operating systems: I use GNU/Linux every day and I really like the way it works and how customizable it is (but most things will apply to other *nixes).

So welcome to my new blog. I hope you find it useful, leave some comments, and share my posts if you like them.

 

Photo: Angel Raul Ravelo Rodriguez (Flickr) Licensed: CC-by.
