It has been a very long time since I posted anything so I thought I would share this small snippet.

Recently I had to extract some data which was stored as xml in a field within a mysql database table. To explain briefly, this data was actually saved as part of an XMPP (jabber) chat message by an openfire server. Openfire is a real time collaboration server which includes jabber functionality. To store the chat history Openfire records the chat message as an xml chunk in a field named body in a particular database table ofMucConversationLog. I needed to extract certain data that our system had communicated as a chat message through openfire, to do this I used a combination of regular expressions and string manipulation.

Ok, lets say I have messages that are either in format A or B as below:

  <child1>some text</child1>
  <child2>some more text</child2>

  <child1>some text</child1>
  <child2>some more text</child2>

Each row in the database can contain xml in the format of A or B and there are many rows. If I need to extract only the contents of the first child of message format a, I could use the following SQL code to do it (assuming I’ve created a temporary table called temptable to store these values):
[code lang=”sql”]INSERT INTO temptable (child1value)
SUBSTRING(body, LOCATE(‘‘, body)+8,
‘, body) – LOCATE(‘‘, body) – 8)
FROM ofMucConversationLog WHERE body REGEXP ‘*’;[/code]
Unfortunately the REGEXP operator can only be used for testing true or false and so is probably only useful in a WHERE clause. There are no capturing options like in other languages (php, perl java etc.), however you can use it in combination with the substring and locate functions to get the job done, although it is admittedly tedious.

1 comment so far

Add Your Comment
  1. Great detailed information, I just bookmarked you on my google reader.

    Sent from my iPad 4G