Contents

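Note: this article draws on the following bloggers, books, and websites:

1. The Greenplum Chinese official website
2. The Greenplum Git repository, or my own Greenplum repository on Gitee
3. The PostgreSQL source repository
4. The Greenplum channel on YouTube
5. The Greenplum channel on Bilibili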

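1. Everything here comes from the open-source community, GitHub, and the contributors listed above. This article is also free and open (it may contain mistakes; corrections in the comments are welcome).
2. Purpose: open sharing, starting a discussion, and learning together.
3. This article provides no resources, involves no transactions, and is not affiliated with any organization or institution.
4. Feel free to copy it for personal use as needed, but reposting and commercial use are not permitted.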

Building a Greenplum Database Kernel Development Environment

Quick overview
Learning resources
Building the kernel development environment
Configuring system parameters
Setting up passwordless SSH

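Quick overview

Learning goals:

Greenplum is an open-source, multi-cloud, parallel big-data platform: a big-data analytics engine built for analytics, machine learning, and AI. Over the coming period, alongside my posts on PostgreSQL, I will also record what I learn about Greenplum and summarize the related knowledge. I may well end up working on gp later; either way, it is well worth learning and promoting.

Topics (see the contents for details):

1. Building a Greenplum database kernel development environment

Date:

2022-08-01 15:40:48

Outputs:

1. One review of PostgreSQL database fundamentals
2. One CSDN technical blog post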

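Learning resources

For this part, see my earlier post:

Greenplum notes and knowledge summary (1) | Building and installing Greenplum from source, plus a roundup of learning resources

Since I work on PostgreSQL kernel development, I will recommend just one book here:

《Greenplum:从大数据战略到实现》 (Greenplum: From Big Data Strategy to Implementation)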

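Building the kernel development environment

The previous post covered how to build and install Greenplum from source. Today we set up a kernel development environment of our own, the way gp's core developers would:

Development environment: a VMware virtual machine (Ubuntu) on a ThinkPad
Code editing: Vim + VS Code
Compiler and debugger: GCC + GDB + VS Code
Code reading: VS Code on the ThinkPad

As far as I know, the vendor's developers are issued MacBooks, but I don't have one (and have never used one), so everything here is done on a ThinkPad.

Step 1: Download the image from the official Ubuntu website: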

https://releases.ubuntu.com/22.04/ubuntu-22.04-desktop-amd64.iso

Step 2: Configure the virtual machine.

Step 3: Install VMware Tools:

sudo apt upgrade

sudo apt install open-vm-tools-desktop -y

sudo reboot

Step 4: Install the development tools:

gpadmin@gpadmin-virtual-machine:~/桌面$ sudo apt install gcc g++

sudo apt install vim

sudo apt-get install build-essential

sudo apt install libreadline-dev

sudo apt install zlib1g-dev

sudo apt install bison flex -y

sudo apt install libzstd-dev

sudo apt-get install libssl-dev

sudo apt-get install autoconf

sudo apt-get install libapr1 libapr1-dev

sudo apt-get install libevent-dev

sudo apt-get install pkg-config

sudo apt-get install libxerces-c-dev

# python-pip (the Python 2 package) is not available on Ubuntu 22.04, so only python3-pip is installed

sudo apt install python3-pip

pip install --upgrade pip

sudo perl -MCPAN -e 'install Spiffy'
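Before moving on, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch: `missing_tools` is a hypothetical helper name, and the tool list is only the subset of the packages above that install an executable.

```shell
# Hypothetical helper: print every command name from the arguments that is
# not found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executables expected from the packages installed in this step.
missing=$(missing_tools gcc g++ make vim bison flex autoconf pkg-config pip3 perl)
if [ -z "$missing" ]; then
    echo "all build tools found"
else
    # Unquoted on purpose so the names are joined onto one line.
    echo "missing tools:" $missing
fi
```

Libraries such as libreadline-dev or libzstd-dev do not install a command, so they are not covered by this check; a failing ./configure run is what reports those.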

Step 5: Fetch the code, then build and install:

sudo apt install git

git clone https://gitee.com/lucky912_admin/gpdb.git

Then run the configuration steps:

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ ./README.Ubuntu.bash

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ ./configure --prefix=/home/gpadmin/gpdbtest --enable-debug --with-libxml --with-perl --with-python --with-gssapi

## Lower the compiler optimization level (unoptimized code is easier to step through in GDB)

sed -i 's/-O3/-O0/g' src/Makefile.global

make -j 3 -s install

gpadmin@gpadmin-virtual-machine:~/gpdbtest/bin$ source ../greenplum_path.sh

gpadmin@gpadmin-virtual-machine:~/gpdbtest/bin$

gpadmin@gpadmin-virtual-machine:~/gpdbtest/bin$ pg_config --version

PostgreSQL 12beta2

gpadmin@gpadmin-virtual-machine:~/gpdbtest/bin$
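The sed step in the transcript above rewrites every -O3 in src/Makefile.global to -O0, presumably so that the debugger sees unoptimized code. A quick way to confirm that nothing was missed is to count the remaining occurrences. This is a sketch on a throwaway file; `count_o3` is a hypothetical helper name.

```shell
# Hypothetical helper: count the lines in a makefile that still mention -O3.
# Prints 0 when the substitution caught everything.
count_o3() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c -- '-O3' "$1" || true
}

# Demonstration on a temporary file rather than the real src/Makefile.global.
demo=$(mktemp)
printf 'CFLAGS = -Wall -O3 -g\nCXXFLAGS = -O3\n' > "$demo"
echo "before: $(count_o3 "$demo")"
sed -i 's/-O3/-O0/g' "$demo"
echo "after:  $(count_o3 "$demo")"
rm -f "$demo"
```

Against the real tree, `count_o3 src/Makefile.global` would be run from the gpdb directory after the sed command and before make.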

Next, install the Python dependencies:

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ pip install -r python-dependencies.txt

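Configuring system parameters

The next task is to create the cluster, but some system configuration is needed first. This part is covered in the documentation: https://gpdb.docs.pivotal.io/6-9/installguide/prepos.html#topic3 The most important pieces are:

/etc/sysctl.conf: kernel parameters
/etc/security/limits.conf: resource limits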

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ sudo vim /etc/sysctl.conf

[sudo] password for gpadmin:

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ sudo vim /etc/security/limits.conf

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$ sudo sysctl -p

kernel.shmall = 197951838

kernel.shmmax = 810810728448

kernel.shmmni = 4096

vm.overcommit_memory = 2 # See Segment Host Memory

vm.overcommit_ratio = 95 # See Segment Host Memory

net.ipv4.ip_local_port_range = 10000 65535 # See Port Settings

kernel.sem = 500 2048000 200 4096

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.core.netdev_max_backlog = 10000

net.core.rmem_max = 2097152

net.core.wmem_max = 2097152

vm.swappiness = 10

vm.zone_reclaim_mode = 0

vm.dirty_expire_centisecs = 500

vm.dirty_writeback_centisecs = 100

vm.dirty_background_ratio = 0 # See System Memory

vm.dirty_ratio = 0

vm.dirty_background_bytes = 1610612736

vm.dirty_bytes = 4294967296

gpadmin@gpadmin-virtual-machine:~/桌面/gpdb$

The exact contents are as follows.

sysctl.conf (after editing, run sudo sysctl -p to apply the changes):

#kernel.sysrq=438

# kernel.shmall = _PHYS_PAGES / 2 # See Shared Memory Pages

kernel.shmall = 197951838

# kernel.shmmax = kernel.shmall * PAGE_SIZE

kernel.shmmax = 810810728448

kernel.shmmni = 4096

vm.overcommit_memory = 2 # See Segment Host Memory

vm.overcommit_ratio = 95 # See Segment Host Memory

net.ipv4.ip_local_port_range = 10000 65535 # See Port Settings

kernel.sem = 500 2048000 200 4096

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.core.netdev_max_backlog = 10000

net.core.rmem_max = 2097152

net.core.wmem_max = 2097152

vm.swappiness = 10

vm.zone_reclaim_mode = 0

vm.dirty_expire_centisecs = 500

vm.dirty_writeback_centisecs = 100

vm.dirty_background_ratio = 0 # See System Memory

vm.dirty_ratio = 0

vm.dirty_background_bytes = 1610612736

vm.dirty_bytes = 4294967296
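The two commented formulas at the top of the listing (kernel.shmall = _PHYS_PAGES / 2 and kernel.shmmax = kernel.shmall * PAGE_SIZE) can be evaluated with getconf, so the values reflect the memory actually given to the virtual machine. The values in the listing are consistent with 4096-byte pages (197951838 × 4096 = 810810728448), which implies a far larger host than a laptop VM, so they were presumably copied from an example. A small sketch with illustrative function names:

```shell
# kernel.shmall: half of the physical pages (argument: total page count).
shmall_for() {
    echo $(( $1 / 2 ))
}

# kernel.shmmax: shmall multiplied by the page size in bytes.
shmmax_for() {
    echo $(( $1 * $2 ))
}

pages=$(getconf _PHYS_PAGES)
page_size=$(getconf PAGE_SIZE)
shmall=$(shmall_for "$pages")

# Print the two lines in sysctl.conf syntax for this machine.
echo "kernel.shmall = $shmall"
echo "kernel.shmmax = $(shmmax_for "$shmall" "$page_size")"
```

The printed lines can replace the corresponding two entries in /etc/sysctl.conf before running sudo sysctl -p.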

limits.conf (takes effect after logging out and back in):

* soft nofile 524288

* hard nofile 524288

* soft nproc 131072

* hard nproc 131072
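Because the limits only apply to new login sessions, it is worth checking them after logging back in. A minimal sketch, assuming the values from the listing above; `limit_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when a reported limit meets the required value.
# A limit reported as "unlimited" always satisfies the requirement.
limit_ok() {
    current=$1
    required=$2
    [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# ulimit -n reports the soft limit on open files for the current shell.
if limit_ok "$(ulimit -n)" 524288; then
    echo "nofile OK"
else
    echo "nofile too low: $(ulimit -n)"
fi
```

In bash, `ulimit -u` reports the process limit and can be checked the same way against 131072.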

Note: if the virtual machine fails to boot with "SMBus Host controller not enabled", see this blog post:

SMBus Host controller not enabled

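Setting up passwordless SSH

Before starting, connect to the virtual machine from VS Code.

For details, see my earlier post:

VS Code Remote-SSH: editing and debugging code on Linux remotely

Because Greenplum is a distributed database, its command-line tools need to log in to each host, so passwordless SSH has to be set up. Once Greenplum is built and its environment variables are loaded, the gpssh-exkeys tool can do this: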

gpssh-exkeys -h host1 host2 ...

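This command establishes mutual trust among host1, host2, and so on. For everyday development on a single machine, it is enough to enable passwordless login to the local host: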

gpssh-exkeys -h `hostname`

gpadmin@gpadmin0:~/桌面/gpdb$ cat /etc/hostname

gpadmin0

gpadmin@gpadmin0:~/桌面/gpdb$ sudo vim /etc/hosts

[sudo] password for gpadmin:

gpadmin@gpadmin0:~/桌面/gpdb$ cat /etc/hosts

127.0.0.1 localhost

127.0.1.1 gpadmin-virtual-machine

127.0.0.1 gpadmin0

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback

fe00::0 ip6-localnet

ff00::0 ip6-mcastprefix

ff02::1 ip6-allnodes

ff02::2 ip6-allrouters

gpadmin@gpadmin0:~/桌面/gpdb$
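The point of the /etc/hosts edit is that the hostname gpadmin0 maps to an address on this machine, since the cluster configuration later refers to the host by name. A sketch for checking a hosts-format file; `hosts_lookup` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first address that a hosts-format file maps
# to the given name. Comment lines are skipped and aliases are matched too.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # inline comment: stop reading names
                if ($i == name) { print $1; exit }
            }
        }
    ' "$2"
}

# Report how the current hostname is mapped on this machine.
addr=$(hosts_lookup "$(hostname)" /etc/hosts)
echo "$(hostname) -> ${addr:-not found in /etc/hosts}"
```

With the file shown above, looking up gpadmin0 yields 127.0.0.1.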

Now let's initialize a simple instance (1 master + 3 primaries + 3 mirrors):

gpadmin@gpadmin0:~$ mkdir -p gpdata/master

gpadmin@gpadmin0:~$ mkdir -p gpdata/primary

gpadmin@gpadmin0:~$ mkdir -p gpdata/mirror

gpadmin@gpadmin0:~$

gpadmin@gpadmin0:~/gpdbtest/bin$ source ../greenplum_path.sh

gpadmin@gpadmin0:~/gpdbtest/bin$ vim cluster.conf

gpadmin@gpadmin0:~/gpdbtest/bin$

gpadmin@gpadmin0:~/gpdbtest/bin$ vim hostfile

gpadmin@gpadmin0:~/gpdbtest/bin$

gpadmin@gpadmin0:~/gpdbtest/bin$ gpssh-exkeys -h gpadmin0

[STEP 1 of 5] create local ID and authorize on local host

[STEP 2 of 5] keyscan all hosts and update known_hosts file

[STEP 3 of 5] retrieving credentials from remote hosts

[STEP 4 of 5] determine common authentication file content

[STEP 5 of 5] copy authentication files to all remote hosts

[INFO] completed successfully

gpadmin@gpadmin0:~/gpdbtest/bin$
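Step 1 in the output above, "create local ID and authorize on local host", corresponds roughly to what one would do by hand on a single machine: generate a key pair and add the public key to authorized_keys. The sketch below shows the second half as an idempotent helper. `authorize_key` is a hypothetical name, and this is an illustration of the idea, not a description of how gpssh-exkeys is implemented.

```shell
# Hypothetical helper: append a public key to an authorized_keys file unless
# the exact line is already present, then restrict the file's permissions.
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    key=$(cat "$pubkey_file") || return 1
    touch "$auth_file"
    # -F: fixed string, -x: whole-line match, -q: quiet.
    grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    chmod 600 "$auth_file"
}

# Typical manual use on a single development host (not run here):
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh -o BatchMode=yes "$(hostname)" true && echo "passwordless login works"
```

The final ssh command in the comment is a convenient check after either approach: BatchMode makes ssh fail instead of prompting for a password.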

Initialize the cluster.

Then check its status.

The relevant files contain the following:

gpadmin@gpadmin0:~/gpdbtest/bin$ cat cluster.conf

ARRAY_NAME="Open Source Greenplum"

CLUSTER_NAME="gpdb"

SEG_PREFIX=gp

PORT_BASE=40000

DATA_DIRECTORY=(/home/gpadmin/gpdata/primary /home/gpadmin/gpdata/primary /home/gpadmin/gpdata/primary)

MASTER_DIRECTORY=/home/gpadmin/gpdata/master

MASTER_HOSTNAME=gpadmin0

MASTER_PORT=5432

IP_ALLOW=0.0.0.0/0

TRUSTED_SHELL=/usr/bin/ssh

CHECK_POINT_SEGMENTS=8

ENCODING=UNICODE

MIRROR_PORT_BASE=7000

MIRROR_DATA_DIRECTORY=(/home/gpadmin/gpdata/mirror /home/gpadmin/gpdata/mirror /home/gpadmin/gpdata/mirror)

GP_RESOURCE_MANAGER="group"

gpadmin@gpadmin0:~/gpdbtest/bin$

gpadmin@gpadmin0:~/gpdbtest/bin$ cat hostfile

gpadmin0

gpadmin@gpadmin0:~/gpdbtest/bin$
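Initialization is normally done by passing these two files to gpinitsystem (gpinitsystem -c cluster.conf -h hostfile), and gpstate -s then reports the state of each segment; presumably that is what was run here. Primary ports start at PORT_BASE and increase by one per primary on a host, so three DATA_DIRECTORY entries give ports 40000 to 40002. The sketch below derives this from the file before anything is started. The helper names are hypothetical, and it assumes each setting sits on one line in exactly the NAME=value form shown above.

```shell
# Hypothetical helper: read a scalar setting such as PORT_BASE.
conf_value() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Hypothetical helper: count the directories in an array setting such as
# DATA_DIRECTORY=(/a /b /c), i.e. the number of segments per host.
conf_array_len() {
    sed -n "s/^$1=(\(.*\))\$/\1/p" "$2" | tail -n 1 | wc -w | tr -d ' '
}

# Print the ports the primaries would use: PORT_BASE, PORT_BASE+1, ...
segment_ports() {
    base=$(conf_value PORT_BASE "$1")
    count=$(conf_array_len DATA_DIRECTORY "$1")
    i=0
    while [ "$i" -lt "$count" ]; do
        echo $(( base + i ))
        i=$(( i + 1 ))
    done
}

conf=cluster.conf
if [ -r "$conf" ]; then
    echo "primary ports: $(segment_ports "$conf" | tr '\n' ' ')"
    echo "next: gpinitsystem -c $conf -h hostfile"
fi
```

The same helpers work for the mirrors by passing MIRROR_PORT_BASE and MIRROR_DATA_DIRECTORY instead.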

Connect to the database:

gpadmin@gpadmin0:~/gpdbtest/bin$ ./psql -d postgres

psql (12beta2)

Type "help" for help.

postgres=# select version();

version

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PostgreSQL 12beta2 (Greenplum Database 7.0.0-alpha.0+dev.15592.ge2b23ac456 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit compiled on Aug 1 2022 18:22:31

(1 row)

postgres=#

Here is an example:

postgres=# create table test (id int primary key, curtime timestamp);

CREATE TABLE

postgres=# insert into test values (1, now());

INSERT 0 1

postgres=# insert into test values (2, now());

INSERT 0 1

postgres=# table test ;

id | curtime

----+----------------------------

2 | 2022-08-01 22:11:37.90595

1 | 2022-08-01 22:11:22.379281

(2 rows)

postgres=# insert into test values (2, now());

ERROR: duplicate key value violates unique constraint "test_pkey" (seg0 127.0.0.1:40000 pid=16444)

DETAIL: Key (id)=(2) already exists.

postgres=#

postgres=# insert into test values (2,now()) on conflict(id) do update set id = 2,curtime=now();

ERROR: modification of distribution columns in OnConflictUpdate is not supported

postgres=#

postgres=# insert into test values (2,now()) on conflict(id) do update set id = test.id,curtime=now();

ERROR: modification of distribution columns in OnConflictUpdate is not supported

postgres=#

postgres=#

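The last two statements fail because id, being the primary key, is also the table's distribution column, and the stock code rejects any ON CONFLICT ... DO UPDATE that assigns to it. After we modify the Greenplum kernel source, rebuild the database, and restart the service, assignments that leave the value unchanged (id = test.id, id = excluded.id) are accepted, while id = 2 is still rejected: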

gpadmin@gpadmin0:~/gpdbtest/bin$ ./psql -d postgres

psql (12beta2)

Type "help" for help.

postgres=# table test ;

id | curtime

----+----------------------------

2 | 2022-08-01 22:11:37.90595

1 | 2022-08-01 22:11:22.379281

(2 rows)

postgres=# select gp_segment_id, * from test;

gp_segment_id | id | curtime

---------------+----+----------------------------

1 | 1 | 2022-08-01 22:11:22.379281

0 | 2 | 2022-08-01 22:11:37.90595

(2 rows)

postgres=# insert into test values (2,now()) on conflict(id) do update set id = test.id,curtime=now();

INSERT 0 1

postgres=# select gp_segment_id, * from test;

gp_segment_id | id | curtime

---------------+----+----------------------------

1 | 1 | 2022-08-01 22:11:22.379281

0 | 2 | 2022-08-01 22:18:52.06303

(2 rows)

postgres=# insert into test values (2,now()) on conflict(id) do update set id = 2,curtime=now();

ERROR: modification of distribution columns in OnConflictUpdate is not supported

postgres=#

postgres=# insert into test values (2,now()) on conflict(id) do update set id = excluded.id,curtime=now();

INSERT 0 1

postgres=# select gp_segment_id, * from test;

gp_segment_id | id | curtime

---------------+----+----------------------------

1 | 1 | 2022-08-01 22:11:22.379281

0 | 2 | 2022-08-01 22:19:43.485647

(2 rows)

postgres=#

\q

gpadmin@gpadmin0:~/gpdbtest/bin$
