Obtaining ansi SQL from RBR.

Recently I needed to replicate between MySQL and another database technology. You might say, why on earth would you want to do something like that, but believe me there are reasons and definitely not (to go away from MySQL to some other DB technology like Oracle or SQL server). Unsurprisingly there are quite a few different tools to do it from any platform towards MySQL but very few which do it the other way round, just to name a couple: Golden Gate and DSCallards.

Whilst not going into their tools (you can find more information on their websites), HIT from DSCallards needs its software to run on Windows and Golden Gate is an expensive beast which was too much for what I needed, thus I decided to have a look at doing the job myself. Although it might look an overkill to do so, it took me a few hours to find a solution and implement it and a couple more to test it and here is a simple description.

MySQL replication can be SBR (statement based), RBR (row based) or a mixture of both. Now despite the fact that the mixture provides the best performance, it would be the most complicated in order to achieve a home made solution, and SBR in my opinion would have also been a bit of a headache to make sure queries didn’t contain non ansi sql through the use of functions like now(), sysdate() and many others. I therefore decided that RBR would be the option of choice.

Although I wonder how many of you ever read an RBR binlog using `mysqlbinlog mysql-bin.000004`, it would be something like:

#090526 14:09:13 server id 1  end_log_pos 1420  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343353/*!*/;
BEGIN
/*!*/;
# at 1420
# at 1464
#090526 14:09:13 server id 1  end_log_pos 1464  Table_map: `test`.`t2` mapped to number 21
#090526 14:09:13 server id 1  end_log_pos 1498  Write_rows: table id 21 flags: STMT_END_F

BINLOG ‘
+ekbShMBAAAALAAAALgFAAAAABUAAAAAAAAABHRlc3QAAnQyAAID/gL+AQM=
+ekbShcBAAAAIgAAANoFAAAQABUAAAAAAAEAAv/+AQAAAA==
‘/*!*/;
# at 1498
#090526 14:09:13 server id 1  end_log_pos 1567  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343353/*!*/;
COMMIT
/*!*/;
# at 1567
#090526 14:09:38 server id 1  end_log_pos 1635  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343378/*!*/;
BEGIN
/*!*/;
# at 1635
# at 1679
#090526 14:09:38 server id 1  end_log_pos 1679  Table_map: `test`.`t2` mapped to number 21
#090526 14:09:38 server id 1  end_log_pos 1733  Update_rows: table id 21 flags: STMT_END_F

BINLOG ‘
EuobShMBAAAALAAAAI8GAAAAABUAAAAAAAAABHRlc3QAAnQyAAID/gL+AQM=
EuobShgBAAAANgAAAMUGAAAQABUAAAAAAAEAAv///AQAAAABZPwCAAAAAWT+AQAAAP4CAAAA
‘/*!*/;
# at 1733
#090526 14:09:38 server id 1  end_log_pos 1802  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343378/*!*/;
COMMIT

Now that isn’t the most readable text you ever seen right? As Giuseppe said “This is more difficult to read than ancient Etruscan. If you are a DBA, you curse and look for help.” But the replication guys at mysql created a nice -v for us to add to mysqlbinlog thus issuing `mysqlbinlog -v mysql-bin.000004` would result in the following:

#090526 14:09:13 server id 1  end_log_pos 1420  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343353/*!*/;
BEGIN
/*!*/;
# at 1420
# at 1464
#090526 14:09:13 server id 1  end_log_pos 1464  Table_map: `test`.`t2` mapped to number 21
#090526 14:09:13 server id 1  end_log_pos 1498  Write_rows: table id 21 flags: STMT_END_F

BINLOG ‘
+ekbShMBAAAALAAAALgFAAAAABUAAAAAAAAABHRlc3QAAnQyAAID/gL+AQM=
+ekbShcBAAAAIgAAANoFAAAQABUAAAAAAAEAAv/+AQAAAA==
‘/*!*/;
### INSERT INTO test.t2
### SET
###   @1=1
###   @2=NULL
# at 1498
#090526 14:09:13 server id 1  end_log_pos 1567  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343353/*!*/;
COMMIT
/*!*/;
# at 1567
#090526 14:09:38 server id 1  end_log_pos 1635  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343378/*!*/;
BEGIN
/*!*/;
# at 1635
# at 1679
#090526 14:09:38 server id 1  end_log_pos 1679  Table_map: `test`.`t2` mapped to number 21
#090526 14:09:38 server id 1  end_log_pos 1733  Update_rows: table id 21 flags: STMT_END_F

BINLOG ‘
EuobShMBAAAALAAAAI8GAAAAABUAAAAAAAAABHRlc3QAAnQyAAID/gL+AQM=
EuobShgBAAAANgAAAMUGAAAQABUAAAAAAAEAAv///AQAAAABZPwCAAAAAWT+AQAAAP4CAAAA
‘/*!*/;
### UPDATE test.t2
### WHERE
###   @1=4
###   @2=’d’
### SET
###   @1=2
###   @2=’d’
### UPDATE test.t2
### WHERE
###   @1=1
###   @2=NULL
### SET
###   @1=2
###   @2=NULL
# at 1733
#090526 14:09:38 server id 1  end_log_pos 1802  Query   thread_id=192   exec_time=0     error_code=0
SET TIMESTAMP=1243343378/*!*/;
COMMIT

The exact same output with some decently readable output. The problem at this point is that the output is not really something any other technology would undestand (not even feeding it to MySQL would work! ….

At this point I needed to do some compromises in order to reduce the complexity of this job, i.e. I will not be creating and altering tables / databases or indexes during runtime (this I can do without as I can do the same things on the slave manually when I need to do anything like that on the master) and the replication won’t be in real time i.e. the slave will be fed the sql periodically through a script. The last compromise wasn’t actually a compromise but a decision based on speed of coding as I’m more proficient in bash than perl and as such I decided to go with bash as a proof of concept that this thing can be done. This would never happen in reality as it would be much slower as compared to coding the same thing in perl or C / C++ (choice is up to you).

Now parsing the binary logs is not rocket science is it? As you can see there are three hashes ‘###’ in front of the readable query so a simple grep is fine. A few subsitutions and text processing and you’ll end up with:

darrencassar@mysqlpreacher $ /home/dcassar/sandbox/5.1.30/bin/mysqlbinlog  | sed ‘s/^ *//g’ | tr ‘\015\012′ ‘\020 ‘ | sed ‘s/ INSERT/;\nINSERT/g’ | sed ‘s/ DELETE/;\nDELETE/g’ | sed ‘s/ UPDATE/;\nUPDATE/g’ | sed ‘${/^$/!s/$/;\
> /;}’
INSERT INTO test.t1 SET @1=1 @2=’a’;
DELETE FROM test.t1 WHERE @1=1 @2=’a’;
INSERT INTO test.t2 SET @1=2 @2=’d’;
UPDATE test.t2 WHERE @1=2 @2=’d’ SET @1=4 @2=’d’;
INSERT INTO test.t2 SET @1=1 @2=NULL;
UPDATE test.t2 WHERE @1=4 @2=’d’ SET @1=2 @2=’d’;
UPDATE test.t2 WHERE @1=1 @2=NULL SET @1=2 @2=NULL ;
darrencassar@mysqlpreacher $

That is more readable but still not correctly formatted for our ANSI SQL slave.

`INSERT INTO test.t1 SET @1=1 @2=’a’;` would need to be replaced with `INSERT INTO test.t1 (cola, colb) values(1,’a’);` or `INSERT INTO test.t1 values(1,’a’);`,
`DELETE FROM test.t1 WHERE @1=1 @2=’a’;` would need to be replaced with `DELETE FROM test.t1 WHERE cola=1 AND colb=’a’;` and
`UPDATE test.t2 WHERE @1=2 @2=’d’ SET @1=4 @2=’d’;` would have to become `UPDATE test.t2 SET cola=4 , colb=’d’ WHERE cola=2 AND colb=’d’;`

The above means we need to replace @1 and @2 with the appropriate column names done using a lookup table in my case by placing a files in data/dbname each bearing names of the different tables and listing each column in order:
I.E. for a database named test and table named table1 having columns cola and colb I used the database folder named test in the mysql data folder and placed a file named table1, the contents of which were:

cola
colb

The following piece of code does the job of replacing those dreadful @1, @2 etc with proper table names:

for (( i=1; i<=`wc -l $dbname/$tbname | cut -d ” ” -f 1`; i++ ))
do
columnname=`awk ‘NR==a’ a=$i $dbname/$tbname`
LINE_TEMP=`echo $LINE_TEMP | sed ‘s/@’$i’=/AND ‘$columnname’=/g’ | sed ‘s/SET AND/SET/g’ | sed ‘s/WHERE AND/WHERE/g’`
done

If you were thinking how I extracted dbname and tbname, here it is:

dbname=`echo $LINE | cut -d ” ” -f$col | cut -d “.” -f 1`
tbname=`echo $LINE | cut -d ” ” -f$col | cut -d “.” -f 2`

where $LINE is each line extracted using the first command parsing mysqlbinary output and the variable col is 3 for delete and 2 for update which reflects the positioning of the database name and table name inside the extracted lines.

A rather ugly way of re-ordering the update command yet functional is:

echo $LINE_TEMP | cut -d “.” -f2 | sed ‘s/’$tbname’ //’ | sed ‘s/;//’ > templinefile
cat templinefile | sed ‘s/ SET /\nSET /g’ > templinefile2
LINE_TEMP=”UPDATE $dbname.$tbname `tail -1 templinefile2  | sed ‘s/ is NULL/=NULL/g’ | sed ‘s/AND/,/g’` `head -1 templinefile2`;”

The last thing to remember is taking care of `=NULL` and replace it by `is NULL`.

LINE_TEMP=`echo $LINE_TEMP | sed ‘s/=NULL/ is NULL/g’`

As I said this was a fast proof of concept rather than a full fledged optimized script doing the job! The total length of code is 37 lines (excluding comments but with correct and nead formatting).

Enjoy
Darren

VN:F [1.9.22_1171]
Rating: 9.7/10 (3 votes cast)
VN:F [1.9.22_1171]
Rating: +3 (from 3 votes)

Replicating from MySQL to *, 9.7 out of 10 based on 3 ratings

  1. [...] Replicating from MySQL to * 29May2009 Kategori: Mysql Yazar: Mysql Administrator [...]

  2. [...]  Replicating from MySQL to (0 visite) [...]

  3. [...] Cassar, MySQL Preacher, delved into the particulars of replicating from MySQL to *. “Unsurprisingly there are quite a few different tools to do it from any platform towards [...]

  4. [...] Replicating from MySQL [...]

  5. Blowmyhead says:

    [...] Replicating from MySQL to (10 visite) [...]

*