Thursday 11 December 2014

Postgres-XC on Lubuntu 14.04 LTS

Although NoSQL data storage is a new trend for storing unstructured data, relational data storage can, in my view, still be a good choice for such data; it depends on how one configures and uses the relational tables. However, a single-node relational data store is unable to scale with the increasing amount of data. If your organization needs to keep historical structured data in a "centralized" data store, then one option is to migrate to a scalable solution. One possible choice is Postgres-XC.

Postgres-XC is an open source database cluster solution that provides:
  • Write-scalability
    Write-scalable means Postgres-XC can be configured with as many database servers as you want and can handle many more writes (updating SQL statements) than a single database server could.
  • Synchronous update
    Synchronous means any database update from any database server is immediately visible to any other transactions running on different masters.
  • Symmetrical update
    Symmetric means you can have more than one database server providing a single database view.
  • Transparency
    Transparent means you don't have to worry about how your data is stored internally across more than one database server.
It is a collection of tightly coupled database components which can be installed on more than one piece of hardware or virtual machine. They store your data in a distributed way, that is, in a partitioned or replicated way, at your choice for each table. Postgres-XC determines where the target data is stored and issues the corresponding queries to the servers holding that data. Each database server provides a uniform data view to your applications. Any database update from any server is immediately visible to applications connecting to the database from other servers. This feature is called "synchronous multi-master" capability, and it is the most significant feature of Postgres-XC. Details about Postgres-XC can be found at the documentation site here.
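For example, whether a table is partitioned or replicated is chosen per table at creation time through Postgres-XC's distribute by clause. A minimal sketch (the table and column names are hypothetical):
    create table sensor_log (id bigint, reading float8)
        distribute by hash(id);        -- rows partitioned across Datanodes
    create table country (code char(2), name text)
        distribute by replication;     -- a full copy kept on every Datanode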

Single Node Configuration
In this case study, Lubuntu 14.04 was used. You can of course install Postgres-XC from pre-built packages for operating systems other than Linux as well. Here, I have simply installed the Postgres-XC package for Ubuntu-based OSes. The default installation will create a GTM, a Coordinator, and 2 Datanodes. I would like to improve on this setup by including a GTM-Proxy. The following first provides the steps for creating a single-node cluster. Once a single-node cluster is up and running, you can simply adapt the configuration to the other nodes in the cluster.
  1. Install from repository
    Use the following command to install from the repository.
    sudo apt-get install postgres-xc
    The version of Postgres-XC as of the time of writing this article is 1.1 (Software Packages in "trusty", Subsection database). The default installation comes pre-configured with 1 GTM, 1 Coordinator, and 2 Datanodes. Data folders for these components are initialized, and configuration files are placed in different subfolders of /etc/postgres-xc/.

    After the installation completes, you can issue the following command to check the locations of the Postgres-XC files installed by the package.
    dpkg -L postgres-xc
    Note where the startup script and configuration files are located in the default installation. Also take note of the sample configuration files provided; you may need them if you mess up and have not made a backup of the original files in the /etc/postgres-xc/ subfolders. The startup script of Postgres-XC is located at /etc/init.d/postgres-xc.
    ...
    /usr/share/postgres-xc/gtm_proxy.conf.sample
    ...
    /etc/init.d/postgres-xc
    /etc/postgres-xc/datanode
    /etc/postgres-xc/gtm
    /etc/postgres-xc/coordinator
    ...
  2. Initialize GTM-Proxy data folder
    GTM-Proxy is not a mandatory component of a Postgres-XC cluster, but it can be used to group messages between the GTM and the cluster nodes, thus reducing workload and the number of packets exchanged through the network. This step will put a GTM-Proxy in between the GTM and the other components (Coordinator and Datanodes).

    The following will create a data folder for GTM-Proxy. Take note to execute initgtm as the postgres-xc user, which is the default user created by the package to run the original set of Postgres-XC components.
    su postgres-xc
    initgtm -Z gtm_proxy -D /var/lib/postgres-xc/GTM_PROXY
    The configuration file for GTM-Proxy, gtm_proxy.conf, is created by the initgtm command under the GTM-Proxy data folder /var/lib/postgres-xc/GTM_PROXY. It is up to you whether to standardize on the way the default installation places the configuration files for the other components. I decided to copy gtm_proxy.conf to /etc/postgres-xc/gtm_proxy/GTM_PROXY/gtm_proxy.conf, and then create a symbolic link to it in the GTM-Proxy data folder /var/lib/postgres-xc/GTM_PROXY, as shown in the sketch below. The data folders for the GTM, Coordinator, and 2 Datanodes are located in /var/lib/postgres-xc/, and the configuration files for those components are located in different subfolders of /etc/postgres-xc/: /etc/postgres-xc/coordinator/CN for the Coordinator, /etc/postgres-xc/datanode/DN1 for Datanode 1, /etc/postgres-xc/datanode/DN2 for Datanode 2, and /etc/postgres-xc/gtm/GTM for the GTM. You will see that the startup script /etc/init.d/postgres-xc refers to these folders. Thus, it is easier to modify the startup script when the GTM-Proxy data folder is placed under /var/lib/postgres-xc/GTM_PROXY and the GTM-Proxy configuration file under /etc/postgres-xc/gtm_proxy/GTM_PROXY.
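    A minimal sketch of that relocation (run with root privileges; the gtm_proxy/GTM_PROXY folder names are my own convention, and mv plus a symbolic link implements the copy-and-link described above):
    sudo mkdir -p /etc/postgres-xc/gtm_proxy/GTM_PROXY
    sudo mv /var/lib/postgres-xc/GTM_PROXY/gtm_proxy.conf /etc/postgres-xc/gtm_proxy/GTM_PROXY/
    # leave a symbolic link behind in the data folder
    sudo ln -s /etc/postgres-xc/gtm_proxy/GTM_PROXY/gtm_proxy.conf /var/lib/postgres-xc/GTM_PROXY/gtm_proxy.conf
    sudo chown -R postgres-xc:postgres-xc /etc/postgres-xc/gtm_proxy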
  3. Update GTM and GTM-Proxy configuration files
    • /var/lib/postgres-xc/GTM/gtm.conf
      The default GTM listens on port 6666, and the Coordinator and Datanodes connect to this port. To make fewer changes to the configuration files of the Coordinator and Datanodes, make GTM-Proxy listen on port 6666 instead. Here, the GTM is changed to listen on port 6668, as suggested in gtm_proxy.conf (see the sample settings at the end of this step).
      Ensure the nodename of the GTM is unique if you intend to create a standby GTM in your cluster too.
      Ensure the GTM listens on the '*' address instead of 'localhost'. Since a GTM-Proxy is used here, this allows other GTM-Proxies or standby GTM nodes in the cluster to connect to this active GTM.
      You will notice that to configure a standby GTM, you simply need to change the file gtm.conf.
    • /var/lib/postgres-xc/GTM_PROXY/gtm_proxy.conf
      Change the listening port of GTM-Proxy to 6666; the Coordinator and Datanodes will connect to the GTM via GTM-Proxy through this port. Point the GTM-Proxy to the GTM by changing the two parameters gtm_host and gtm_port in the GTM-Proxy configuration file.
      Change the nodename of the GTM-Proxy to a unique name as well.
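    After these changes, the relevant settings look roughly like this (a sketch; the nodenames are my own choices, while the parameter names are the standard gtm.conf and gtm_proxy.conf parameters):
    # /var/lib/postgres-xc/GTM/gtm.conf
    nodename = 'gtm'
    listen_addresses = '*'
    port = 6668

    # /var/lib/postgres-xc/GTM_PROXY/gtm_proxy.conf
    nodename = 'gtm_proxy1'
    listen_addresses = '*'
    port = 6666
    gtm_host = 'localhost'
    gtm_port = 6668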
  4. Modify startup script /etc/init.d/postgres-xc
    The following is the modified startup script. You can of course remove this script if you intend to start up the components manually as the postgres-xc user (a sketch of the manual commands follows the notes at the end of this step). Just remember to start the components in the right order. If you have multiple nodes on different physical computing nodes, do start up the GTM node first; Coordinators should be the last to start up.
    For shutdown, Coordinators should be the first to shut down, and the GTM should be the last. If you shut down the GTM before the Coordinators, you will encounter problems shutting down the Coordinators; I had to use pg_ctl kill QUIT <process_id_of_coordinator> to terminate them. To forcefully terminate GTM or GTM-Proxy you can use the kill command, though it is recommended to use the gtm_ctl command for GTM and GTM-Proxy.
    #!/bin/sh
    set -e
    #set -x -v

    ### BEGIN INIT INFO
    # Provides: postgres-xc
    # Required-Start: $local_fs $remote_fs $network $time
    # Required-Stop: $local_fs $remote_fs $network $time
    # Should-Start: $syslog
    # Should-Stop: $syslog
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6
    # Short-Description: Postgres-XC RDBMS server
    ### END INIT INFO

    PGXC_DATA=/var/lib/postgres-xc
    #PGXC_LOG=/var/log/postgres-xc
    PGXC_RUN=/var/run/postgresql

    . /lib/lsb/init-functions

    if [ -r /etc/default/postgres-xc ]; then
        . /etc/default/postgres-xc
    fi

    # create socket directory
    [ -d $PGXC_RUN ] || mkdir -p $PGXC_RUN

    chown postgres-xc.postgres-xc $PGXC_RUN
    chmod 2775 $PGXC_RUN

    check_if_running()
    {
        status=0
        case "$1" in
            gtm)
                pidfile=gtm.pid
                sleep 2
                ;;
            gtm_proxy) # include gtm_proxy
                pidfile=gtm_proxy.pid
                sleep 2
                ;;
            postgres)
                pidfile=postmaster.pid
                confport=$(grep -si '^port *=' $PGXC_DATA/$2/postgresql.conf |
                    cut -f2 -d= | sed -e 's/#.*$//' -e 's/ //g')
                port=${confport:=5432}

                # wait for server to come up, but no more than 10 seconds
                i=1
                [ -z "$3" ] &&
                while [ ! -S $PGXC_RUN/.s.PGSQL.$port ]; do
                    i=$((i+1))
                    sleep 1
                    [ $i -gt 10 ] && break
                done
                ;;
        esac

        # check if it is really running
        if [ -r $PGXC_DATA/$2/$pidfile ]; then
            PID=`head -n 1 $PGXC_DATA/$2/$pidfile` || true
            if [ -z "$PID" ]; then
                status=1
            else
                [ "`ps h -o comm -p $PID`" != $1 ] && status=1
            fi
        else
            status=1
        fi
        return 0
    }

    do_ctl()
    {
        ACTION=$1
        STATUS=0
        status=0
        [ "$ACTION" = "stop" ] && {
            $FAST_STOP &&
            STOP_MODE="-m smart" # modified to smart stop
        }
        for i in 1 2 3 4; do # modified
            # modified the start-up sequence
            if [ "$ACTION" = "start" ] || [ "$ACTION" = "status" ]; then
                case $i in
                    1) TYPE=gtm ;;
                    2) TYPE=gtm_proxy ;;
                    3) TYPE=datanode ;;
                    4) TYPE=coordinator ;;
                esac
            elif [ "$ACTION" = "stop" ]; then
                # modified the stop sequence (reverse of the start-up sequence)
                case $i in
                    4) TYPE=gtm ;;
                    3) TYPE=gtm_proxy ;;
                    2) TYPE=datanode ;;
                    1) TYPE=coordinator ;;
                esac
            fi

            case $TYPE in
                gtm)
                    PG_CTL=gtm_ctl
                    PG_START=gtm
                    ;;
                gtm_proxy)
                    PG_CTL=gtm_ctl
                    PG_START=gtm_proxy
                    ;;
                coordinator|datanode)
                    PG_CTL=pg_ctl
                    PG_START=postgres
                    ;;
            esac
            log_daemon_msg "$2 Postgres-XC ${TYPE}"
            for NODE in $(ls /etc/postgres-xc/$TYPE/); do
                [ -f /etc/postgres-xc/$TYPE/$NODE/run ] && {
                    # if [ "$ACTION" = "reload" ] && [ "$TYPE" = "gtm" ]; then
                    #     ACTION="restart"
                    # fi
                    [ "$ACTION" = "stop" ] &&
                    check_if_running $PG_START $NODE stop
                    if [ $status -eq 1 ]; then
                        log_progress_msg "$NODE is not running"
                    else
                        # removed the -l option, as starting gtm_proxy with it
                        # has problems; log options are specified in the
                        # configuration files instead
                        start-stop-daemon -c postgres-xc \
                            -Sx /usr/bin/$PG_CTL -- $ACTION $STOP_MODE \
                            -D $PGXC_DATA/$NODE -Z $TYPE

                        # ERRMSG=$(start-stop-daemon -c postgres-xc \
                        #     -Sx /usr/bin/$PG_CTL -- $ACTION $STOP_MODE \
                        #     -D $PGXC_DATA/$NODE -Z $TYPE \
                        #     -l $PGXC_LOG/datanode.log
                        # STATUS=$?
                        # if $FAST_STOP && [ $STATUS -gt 0 ] &&
                        #     [ "$ACTION" = "stop" ]; then
                        #     STATUS=0
                        #     start-stop-daemon -c postgres-xc \
                        #         -Sx /usr/bin/$PG_CTL -- stop -mi \
                        #         -D $PGXC_DATA/$NODE -Z $TYPE \
                        #         -l $PGXC_LOG/datanode.log || STATUS=$?
                        # fi)

                        REPORT=$NODE
                        [ "$ACTION" = "status" ] && REPORT=$ERRMSG
                        log_progress_msg "$REPORT"
                        if [ $STATUS -eq 0 ] && [ "$ACTION" = "start" ]; then
                            check_if_running $PG_START $NODE
                            STATUS=$status
                        fi
                    fi
                }
            done
            log_end_msg $STATUS
        done
    }


    case "$1" in
        start)
            do_ctl start Starting
            ;;
        # restart)
        #     do_ctl restart Restarting
        #     ;;
        status)
            do_ctl status Status
            ;;
        stop)
            do_ctl stop Stopping
            ;;
        # reload)
        #     do_ctl reload Reloading
        #     ;;
        # force-reload)
        #     $0 restart
        #     ;;
        *)
            # echo "Usage: $0 {start|stop|restart|reload|force-reload|status}"
            echo "Usage: $0 {start|stop|status}"
            exit 1
            ;;
    esac

    exit 0

    Note:
    • The original script /etc/init.d/postgres-xc starts up the components in the incorrect order. The right startup sequence is GTM, Datanodes, then Coordinators. Here, I have included a GTM-Proxy, so the startup order is GTM, GTM-Proxy, Datanodes, then Coordinators. The shutdown sequence should be the reverse.
    • As of writing, I had problems starting up GTM-Proxy with the -l option. Thus, the startup script has been modified to start all components without the -l option.
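    If you start the components manually instead, a sketch of the ordering using the Postgres-XC control tools (run as the postgres-xc user; this assumes the data folders are named GTM, GTM_PROXY, CN, DN1, and DN2 following the configuration layout described in step 2):
    # start up: GTM, GTM-Proxy, Datanodes, then Coordinator
    gtm_ctl start -Z gtm -D /var/lib/postgres-xc/GTM
    gtm_ctl start -Z gtm_proxy -D /var/lib/postgres-xc/GTM_PROXY
    pg_ctl start -Z datanode -D /var/lib/postgres-xc/DN1
    pg_ctl start -Z datanode -D /var/lib/postgres-xc/DN2
    pg_ctl start -Z coordinator -D /var/lib/postgres-xc/CN
    # shut down in the reverse order
    pg_ctl stop -D /var/lib/postgres-xc/CN -m smart
    pg_ctl stop -D /var/lib/postgres-xc/DN2 -m smart
    pg_ctl stop -D /var/lib/postgres-xc/DN1 -m smart
    gtm_ctl stop -Z gtm_proxy -D /var/lib/postgres-xc/GTM_PROXY
    gtm_ctl stop -Z gtm -D /var/lib/postgres-xc/GTM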

Note:
  • Ensure every node in the cluster has a unique name. The name of every Coordinator and Datanode should correspond to the records in the pg_catalog.pgxc_node table in the postgres database.
  • Install Samba so that cluster nodes can be referred to via NetBIOS names instead of actual IP addresses.
  • Update pg_hba.conf for every Coordinator and Datanode (a sample entry follows this list).
  • Ensure each Coordinator knows where to locate the Datanodes and the other Coordinators. You can start a Coordinator in restoremode using
    pg_ctl -Z restoremode -D /var/lib/postgres-xc/CN/ start
    and then update the cluster node information. The following commands connect to the Coordinator at port 5432 and the postgres database, and first display all the node information.
    psql -h <IP of coordinator> -U postgres-xc -d postgres
    select * from pg_catalog.pgxc_node order by node_name;
    It is recommended to remove all nodes and re-insert the records using the create node SQL command, because create node automatically creates the node_id. The node information must be correct and in sync across all Coordinators; otherwise, you will get data consistency errors.
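A sample pg_hba.conf entry, assuming the cluster sits on a trusted subnet such as the 192.168.0.0/24 network used below (trust is shown only for simplicity; choose an authentication method appropriate for your environment):
    # TYPE  DATABASE  USER  ADDRESS          METHOD
    host    all       all   192.168.0.0/24   trust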

Multiple Nodes Configuration
This multiple-node configuration is done simply by cloning and adapting the node configuration files.

Here, I have 4 computing nodes in the cluster, with 1 GTM-Proxy, 1 Coordinator, and 2 Datanodes on each PC. I configured PC1 to run the active GTM; the remaining PCs host standby GTMs.
  1. First, update the configuration files of each component on all PCs.
  2. Start the GTM as normal and then the GTM-Proxies. Then start the Coordinators in restoremode and update the node information of the Coordinators as follows; here I have 4 PCs. It is recommended to delete all node information and then recreate it, which ensures the information is in sync across all the Coordinators. Once done, stop the Coordinators running in restoremode (a sketch of the full sequence on one PC follows this list).
    psql -h <IP of coordinator> -U postgres-xc -d postgres
    delete from pg_catalog.pgxc_node;
    create node "PC1-cn" with (type='coordinator', host='192.168.0.10', port=5432);
    create node "PC1-dn1" with (type='datanode', host='192.168.0.10', port=15432);
    create node "PC1-dn2" with (type='datanode', host='192.168.0.10', port=15433);
    create node "PC2-cn" with (type='coordinator', host='192.168.0.11', port=5432);
    create node "PC2-dn1" with (type='datanode', host='192.168.0.11', port=15432);
    create node "PC2-dn2" with (type='datanode', host='192.168.0.11', port=15433);
    create node "PC3-cn" with (type='coordinator', host='192.168.0.12', port=5432);
    create node "PC3-dn1" with (type='datanode', host='192.168.0.12', port=15432);
    create node "PC3-dn2" with (type='datanode', host='192.168.0.12', port=15433);
    create node "PC4-cn" with (type='coordinator', host='192.168.0.13', port=5432);
    create node "PC4-dn1" with (type='datanode', host='192.168.0.13', port=15432);
    create node "PC4-dn2" with (type='datanode', host='192.168.0.13', port=15433);
  3. Start the Datanodes in normal mode.
  4. Start the Coordinators in normal mode.
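For steps 2 to 4, the commands on each PC look roughly as follows (a sketch; run as the postgres-xc user, with the GTM and GTM-Proxy already up, and with data folder paths following the single-node layout above):
    # start the Coordinator in restoremode and fix the node information
    pg_ctl -Z restoremode -D /var/lib/postgres-xc/CN/ start
    psql -h <IP of coordinator> -U postgres-xc -d postgres   # run the SQL above
    pg_ctl stop -D /var/lib/postgres-xc/CN/
    # then start the Datanodes, and the Coordinator last
    pg_ctl start -Z datanode -D /var/lib/postgres-xc/DN1
    pg_ctl start -Z datanode -D /var/lib/postgres-xc/DN2
    pg_ctl start -Z coordinator -D /var/lib/postgres-xc/CN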
That is all. Have fun.

Tuesday 6 May 2014

Interfacing Python with C/C++ (Passing Python List into C function)

The following example illustrates how to pass a Python list as an argument into a C/C++ function and subsequently update the elements in it. It assumes that the list contains floating-point numbers. The example is based on the Python/C API Reference Manual, version 2.7.6.

Compile the following file example.cpp into a dynamic link library. Here, I have used NetBeans 8.0 running on Ubuntu 14.04 to create a C++ dynamic link library project. The output is a file example.so.
//file: example.cpp

#include <Python.h>
#include <iostream>

extern "C" {
  PyObject* doo(PyObject* arg) {
    //get the number of elements in the Python list
    long cnt = PyList_Size(arg);
    std::cout<<"Number of items: "<<cnt<<std::endl;

    for (long i=0; i<cnt; i++) {
      //print the element value
      std::cout<<PyFloat_AsDouble(PyList_GET_ITEM(arg,i))<<std::endl;

      //replace the element in the Python list
      //(PyList_SetItem steals the reference to the new value)
      PyList_SetItem(arg,i,PyFloat_FromDouble(i));
      std::cout<<PyFloat_AsDouble(PyList_GET_ITEM(arg,i))<<std::endl;
    }
    //return None (its reference count must be incremented first)
    Py_INCREF(Py_None);
    return Py_None;
  }
}
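
If you prefer building from the command line instead of NetBeans, a command along these lines should work (assuming the Python 2.7 development headers are installed, e.g. via the python-dev package):
g++ -shared -fPIC -I/usr/include/python2.7 example.cpp -o example.so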

The following Python code loads the library and calls the function doo. Do ensure the Python interpreter is version 2.7.6.
import ctypes

#load example.so (update path to library file accordingly);
#PyDLL is used instead of CDLL so that the GIL stays held while
#the C function manipulates Python objects
dll = ctypes.PyDLL('example.so')
#set the return data type of function doo to Python object
dll.doo.restype = ctypes.py_object
#set the argument data type of function doo to Python object,
#one element per argument in a list
dll.doo.argtypes = [ctypes.py_object]
l = [1.11, 2.22, 3.33]
print dll.doo(l)

The following is the output of the Python code.
Number of items: 3
1.11
0
2.22
1
3.33
2
None

To verify that the list has indeed been modified, print the list in the Python console.
>>> l
[0.0, 1.0, 2.0]

Thursday 17 April 2014

Stochastic Learning of Handwritten Digits

The video shows the results during the stochastic pre-training phase of an artificial neural network on a set of 60,000 handwritten digits, which I tested using a simplistic learning mechanism modeled on its biological counterpart. The results using this learning method seem promising.