[20260429]Questions about using hugepages with pre_page_sga=true under 21c, part 3.txt

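--//Supplementary test. In my recent tests under 21c with pre_page_sga=true and hugepages, a small number of hugepages always
--//remained unallocated. 11g behaves differently: there, every hugepage gets allocated.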

--//I asked Kimi. After some step-by-step prompting, it gave the following answer:

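The real cause: in 12c+ the way SGA shared memory segments are allocated has changed

Starting with Oracle 11.2.0.3 / 12c, Oracle changed its SGA shared memory allocation strategy so that it can use HugePages
more flexibly:

11g: the SGA is usually allocated as a single contiguous shared memory segment. With pre_page_sga=true the whole segment is
touched in one pass at startup and every page gets mapped, so HugePages_Rsvd is 0.
12c+: the SGA is split into several shared memory segments (ipcs -m shows several segments, typically one main segment plus
a smaller auxiliary one). Each segment is aligned independently on a HugePage (2MB) boundary, so a few gap pages appear
between segments. The kernel has already reserved these gap pages (they are counted in Rsvd), but the pre_page_sga touch
only covers real data pages. It never reads or writes the gap pages that exist purely for boundary alignment, so they are
left over as HugePages_Rsvd.

The 7 pages (14MB) you see are exactly this multi-segment alignment overhead. It does not occur in 11g's single-segment
mode, but it always appears in 12c+'s multi-segment mode.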

--//Yesterday's test gave the following:
$ grep -i hugepages /proc/meminfo
AnonHugePages:     40960 kB
HugePages_Total:     530
HugePages_Free:        7
HugePages_Rsvd:        7
HugePages_Surp:        0
Hugepagesize:       2048 kB
--//HugePages_Rsvd=7, which means 7 hugepages were never touched. Why?

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 0                                  /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 32769                              /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 65538                              /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 98307                              /SYSVafa94c20 (deleted)
7f3764b20000-7f3764b21000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat

--//If there really are a few gap pages between segments, how big are the gaps?
--//Gap between the shared memory segments on line 1 and line 2:
--//0x61000000-0x60a00000   = 0x600000 = 6291456
--//6291456/2/1024/1024 = 3

--//There is no gap between the segments on line 2 and line 3.

--//There is a gap between the segments on line 3 and line 4:
--//0xa3000000-0xa2800000 = 0x800000 = 8388608
--//8388608/2/1024/1024 = 4
--//3+4 is indeed 7, but after finishing that test I suspected this was just a coincidence.
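--//The gap arithmetic above can be reproduced with a short Python sketch. It parses the rw-s lines copied from the
--///proc/<pid>/maps output above, keeps only the /SYSV mappings, and prints the gap between each mapping and the next
--//in 2MB hugepages. The function names are illustrative, and the sketch assumes the maps column layout shown here.

```python
# Sketch: gaps between SysV shared memory mappings, measured in 2MB hugepages.
HUGEPAGE = 2 * 1024 * 1024  # Hugepagesize: 2048 kB

# rw-s lines copied from the /proc/<pmon pid>/maps output of the first run
MAPS = """\
60000000-60a00000 rw-s 00000000 00:0c 0         /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 32769     /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 65538     /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 98307     /SYSVafa94c20 (deleted)
7f3764b20000-7f3764b21000 rw-s 00000000 08:11 18861347 /u01/app/oracle/dbs/hc_book.dat
"""


def sysv_ranges(maps_text):
    """Return sorted (start, end) addresses of the /SYSV shared mappings."""
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        # columns: address, perms, offset, dev, inode, pathname, ...
        if len(fields) >= 6 and fields[1] == "rw-s" and fields[5].startswith("/SYSV"):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            ranges.append((start, end))
    return sorted(ranges)


def gap_pages(ranges):
    """Distance from the end of each mapping to the start of the next, in hugepages."""
    return [(nxt[0] - cur[1]) // HUGEPAGE for cur, nxt in zip(ranges, ranges[1:])]


gaps = gap_pages(sysv_ranges(MAPS))
print(gaps, sum(gaps))  # prints [3, 0, 4] 7
```

--//The hc_book.dat mapping is also rw-s but is filtered out because it is not a /SYSV segment.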

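--//If that were the case, the real requirement would be HugePages_Total = 530+7 = 537.
--//In short, too many questions remain. I also wanted to check whether the explanation above holds by changing parameters
--//such as sga_target. Instead, let's try a different approach: change the kernel parameter kernel.shmmax.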

1. Preparation:
# cat  /etc/sysctl.d/98-oracle.conf
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
#kernel.shmmax = 4398046511104
kernel.shmmax = 268435456
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500

#vm.nr_hugepages = 530
#vm.nr_overcommit_hugepages = 512
vm.nr_hugepages = 530
vm.nr_overcommit_hugepages = 50

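--//Analysis: kernel.shmmax was originally set to 4398046511104 bytes, i.e. 4T. My test machine certainly does not have that
--//much memory. The setting only caps a single shared memory segment at 4T. Now change it to 256*1024*1024 = 268435456, i.e. 256M.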
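--//Here is a quick Python check of the two shmmax values. It also shows how many 2MB hugepages a segment of the new
--//maximum size holds. This is a hedged arithmetic sketch, and the constant names are my own.

```python
# Arithmetic check of the kernel.shmmax values in 98-oracle.conf.
MIB = 1024 ** 2
TIB = 1024 ** 4
HUGEPAGE = 2 * MIB  # Hugepagesize: 2048 kB

old_shmmax = 4398046511104  # the commented-out value
new_shmmax = 268435456      # the value used for this test

print(old_shmmax / TIB, "TiB")              # prints 4.0 TiB
print(new_shmmax / MIB, "MiB")              # prints 256.0 MiB
print(new_shmmax // HUGEPAGE, "hugepages")  # prints 128 hugepages
```

--//A 256M segment is 128 hugepages, which matches the 268435456-byte segments that ipcs -m reports after the change.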

2. Test:
--//First, make the kernel parameters take effect.
# sysctl -p  /etc/sysctl.d/98-oracle.conf
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 268435456
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
vm.nr_hugepages = 530
vm.nr_overcommit_hugepages = 50

--//Start the database.
SYS@book> startup
ORACLE instance started.

Total System Global Area 1107294056 bytes
Fixed Size                  9684840 bytes
Variable Size             654311424 bytes
Database Buffers          436207616 bytes
Redo Buffers                7090176 bytes
Database mounted.
Database opened.

SYS@book> @ hidez ^pre_page_sga|^use_large_pages
NUM N_HEX CON_ID NAME            DESCRIPTION                                    DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE ISSES ISSYS_MOD
--- ----- ------ --------------- ---------------------------------------------- ------------- ------------- ------------ ----- ---------
180    B4      0 use_large_pages Use large pages if available (TRUE/FALSE/ONLY) FALSE         ONLY          ONLY         FALSE FALSE
193    C1      0 pre_page_sga    pre-page sga for process                       TRUE          TRUE          TRUE         FALSE FALSE

--//Check hugepages usage:
# grep -i hugepage /proc/meminfo
AnonHugePages:     32768 kB
HugePages_Total:     530
HugePages_Free:       28
HugePages_Rsvd:       28
HugePages_Surp:        0
Hugepagesize:       2048 kB
--//Unlike before, this time HugePages_Rsvd is 28.

# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          oracle     600        10485760   57
0x00000000 32769      oracle     600        268435456  57
0x00000000 65538      oracle     600        268435456  57
0x00000000 98307      oracle     600        268435456  57
0x00000000 131076     oracle     600        268435456  57
0x00000000 163845     oracle     600        16777216   57
0x00000000 196614     oracle     600        8388608    57
0xafa94c20 229383     oracle     600        2097152    57

--//10485760 /2/1024/1024 = 5
--//268435456/2/1024/1024 = 128
--//268435456/2/1024/1024 = 128
--//268435456/2/1024/1024 = 128
--//268435456/2/1024/1024 = 128
--//16777216 /2/1024/1024 = 8
--//8388608  /2/1024/1024 = 4
--//2097152  /2/1024/1024 = 1
--//Sum = 530
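--//The per-segment sum can be reproduced with a hedged Python sketch over the ipcs -m output above. It assumes the 5th
--//column is the segment size in bytes and rounds each size up to whole 2MB hugepages. All sizes here are exact multiples
--//anyway. The function name is illustrative.

```python
# Sum the ipcs -m segment sizes and convert them to 2MB hugepages.
HUGEPAGE = 2 * 1024 * 1024  # Hugepagesize: 2048 kB

IPCS_M = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          oracle     600        10485760   57
0x00000000 32769      oracle     600        268435456  57
0x00000000 65538      oracle     600        268435456  57
0x00000000 98307      oracle     600        268435456  57
0x00000000 131076     oracle     600        268435456  57
0x00000000 163845     oracle     600        16777216   57
0x00000000 196614     oracle     600        8388608    57
0xafa94c20 229383     oracle     600        2097152    57
"""


def segment_pages(ipcs_text):
    """Hugepages per segment, assuming column 5 is the size in bytes."""
    pages = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # data lines start with the hex key; banner and header lines are skipped
        if len(fields) >= 5 and fields[0].startswith("0x"):
            size = int(fields[4])
            pages.append(-(-size // HUGEPAGE))  # ceiling division
    return pages


pages = segment_pages(IPCS_M)
print(pages, sum(pages))  # prints [5, 128, 128, 128, 128, 8, 4, 1] 530
```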

# ipcs -mu
------ Shared Memory Status --------
segments allocated 8
pages allocated 271360
pages resident  257024
pages swapped   0
Swap performance: 0 attempts     0 successes
--//The SGA is split into 8 shared memory segments.
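--//The ipcs -mu figures can be cross-checked against /proc/meminfo. The sketch assumes ipcs counts in 4KB base pages,
--//which the numbers are consistent with, so 512 base pages make one 2MB hugepage. 530 allocated minus 502 resident
--//leaves 28. That matches HugePages_Total=530 and HugePages_Rsvd=28 above.

```python
# Cross-check "ipcs -mu" against /proc/meminfo.
BASE_PAGE = 4096                  # assumption: 4KB base pages (x86_64)
HUGEPAGE = 2 * 1024 * 1024        # Hugepagesize: 2048 kB
per_huge = HUGEPAGE // BASE_PAGE  # 512 base pages per hugepage

allocated = 271360  # "pages allocated"
resident = 257024   # "pages resident"

alloc_huge = allocated // per_huge            # total hugepages in the segments
resident_huge = resident // per_huge          # hugepages actually faulted in
not_resident = alloc_huge - resident_huge     # reserved but not yet touched

print(alloc_huge, resident_huge, not_resident)  # prints 530 502 28
```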

# cat /proc/$(pgrep pmon)/maps | grep rw-s | nl
    1  60000000-60a00000 rw-s 00000000 00:0c 0                 /SYSV00000000 (deleted)
    2  61000000-71000000 rw-s 00000000 00:0c 32769             /SYSV00000000 (deleted)
    3  71000000-81000000 rw-s 00000000 00:0c 65538             /SYSV00000000 (deleted)
    4  81000000-91000000 rw-s 00000000 00:0c 98307             /SYSV00000000 (deleted)
    5  91000000-a1000000 rw-s 00000000 00:0c 131076            /SYSV00000000 (deleted)
    6  a1000000-a2000000 rw-s 00000000 00:0c 163845            /SYSV00000000 (deleted)
    7  a2000000-a2800000 rw-s 00000000 00:0c 196614            /SYSV00000000 (deleted)
    8  a3000000-a3200000 rw-s 00000000 00:0c 229383            /SYSVafa94c20 (deleted)
    9  7f9f6222a000-7f9f6222b000 rw-s 00000000 08:11 18861347  /u01/app/oracle/dbs/hc_book.dat
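--//Look closely: segments 2, 3, 4, 5, 6 and 7 are contiguous, with no gaps between them.
--//0x61000000-0x60a00000 = 6291456, 6291456/2/1024/1024 = 3
--//0xa3000000-0xa2800000 = 8388608, 8388608/2/1024/1024 = 4
--//So Kimi's analysis is badly wrong. The gaps still add up to only 3+4 = 7 pages, yet HugePages_Rsvd=28. The only
--//conclusion I can draw is that Oracle 21c changed the way it touches memory.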

3. Continuing:
--//After checking some references, I found another hidden parameter, _touch_sga_pages_during_allocation. It does not
--//exist in 11g.
SYS@book> @ hidez _touch_sga_pages_during_allocation
SYS@book> @ pr
==============================
NUM                           : 179
N_HEX                         :    B3
CON_ID                        : 0
NAME                          : _touch_sga_pages_during_allocation
DESCRIPTION                   : touch SGA pages during allocation
DEFAULT_VALUE                 : TRUE
SESSION_VALUE                 : FALSE
SYSTEM_VALUE                  : FALSE
ISSES_MODIFIABLE              : FALSE
ISSYS_MODIFIABLE              : FALSE
PL/SQL procedure successfully completed.

$ cat /u01/app/oracle/dbs/initbook.ora
SPFILE='/u01/app/oracle/dbs/spfilebook.ora'
_touch_sga_pages_during_allocation=true

SYS@book> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SYS@book> startup pfile=/u01/app/oracle/dbs/initbook.ora
ORACLE instance started.
Total System Global Area 1107294056 bytes
Fixed Size                  9684840 bytes
Variable Size             654311424 bytes
Database Buffers          436207616 bytes
Redo Buffers                7090176 bytes
Database mounted.
Database opened.

# grep -i hugepage /proc/meminfo
AnonHugePages:    120832 kB
HugePages_Total:     530
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
--//This time every hugepage is allocated. It appears that all pages are touched only when
--//_touch_sga_pages_during_allocation=true.
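--//Here is a small hedged parser for /proc/meminfo that puts the two runs side by side. It treats
--//HugePages_Total-HugePages_Free as pages already faulted in. It treats HugePages_Rsvd as pages committed to a mapping
--//but not yet allocated, following the kernel's hugetlbpage documentation. The samples are copied from the two runs
--//above, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (the 'kB' unit is dropped)."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            info[key.strip()] = int(value.split()[0])
    return info


def hugepage_usage(info):
    """Return (hugepages already faulted in, hugepages reserved but not yet touched)."""
    return info["HugePages_Total"] - info["HugePages_Free"], info["HugePages_Rsvd"]


# _touch_sga_pages_during_allocation=false (the 21c value seen above)
BEFORE = """\
AnonHugePages:     32768 kB
HugePages_Total:     530
HugePages_Free:       28
HugePages_Rsvd:       28
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

# _touch_sga_pages_during_allocation=true (set in initbook.ora)
AFTER = """\
AnonHugePages:    120832 kB
HugePages_Total:     530
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

print(hugepage_usage(parse_meminfo(BEFORE)))  # prints (502, 28)
print(hugepage_usage(parse_meminfo(AFTER)))   # prints (530, 0)
```

--//The 502 faulted-in pages of the first run agree with the 257024 "pages resident" from ipcs -mu.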

# ipcs -m --human
------ Shared Memory Segments --------
key        shmid      owner      perms      size       nattch     status
0x00000000 262144     oracle     600           10M     56
0x00000000 294913     oracle     600          256M     56
0x00000000 327682     oracle     600          256M     56
0x00000000 360451     oracle     600          256M     56
0x00000000 393220     oracle     600          256M     56
0x00000000 425989     oracle     600           16M     56
0x00000000 458758     oracle     600            8M     56
0xafa94c20 491527     oracle     600            2M     56

# cat /proc/$(pgrep pmon)/maps | grep rw-s | nl
     1  60000000-60a00000 rw-s 00000000 00:0c 262144             /SYSV00000000 (deleted)
     2  61000000-71000000 rw-s 00000000 00:0c 294913             /SYSV00000000 (deleted)
     3  71000000-81000000 rw-s 00000000 00:0c 327682             /SYSV00000000 (deleted)
     4  81000000-91000000 rw-s 00000000 00:0c 360451             /SYSV00000000 (deleted)
     5  91000000-a1000000 rw-s 00000000 00:0c 393220             /SYSV00000000 (deleted)
     6  a1000000-a2000000 rw-s 00000000 00:0c 425989             /SYSV00000000 (deleted)
     7  a2000000-a2800000 rw-s 00000000 00:0c 458758             /SYSV00000000 (deleted)
     8  a3000000-a3200000 rw-s 00000000 00:0c 491527             /SYSVafa94c20 (deleted)
     9  7f4ef6faf000-7f4ef6fb0000 rw-s 00000000 08:11 18861347   /u01/app/oracle/dbs/hc_book.dat

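4. Summary:
--//Kimi and DeepSeek searches are not very reliable. They work well for ordinary questions, but not for complex ones.
--//I will not dig any further into using hugepages with pre_page_sga=true; it is beyond my ability.
--//Following a related link Kimi provided: https://fritshoogland.wordpress.com/2016/05/27/oracle-sga-memory-allocation-on-startup/
--//The article mentions that with sga_target=10T startup was extremely slow, and the author used PRE_PAGE_SGA=false with
--//_touch_sga_pages_during_allocation=true.
--//It also says the default is true in Oracle 12.1.0.2. That is a clue. Perhaps 12.1.0.2 defaulted to PRE_PAGE_SGA=false
--//(unconfirmed) and _touch_sga_pages_during_allocation=true, and 21c reverses them.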

--//Quoting a passage from https://fritshoogland.wordpress.com/2016/05/27/oracle-sga-memory-allocation-on-startup/

At this point the reason for having _TOUCH_SGA_PAGES_DURING_ALLOCATION should be clear. The question I had on this point
is: but how about PRE_PAGE_SGA? In essence, this parameter is supposed to more or less solve the same issue, having the
SGA pages being touched at startup to prevent paging for foreground sessions.

BTW, if you read about PRE_PAGE_SGA in the online documentation, it tells a reason for using PRE_PAGE_SGA, which is not
true (page table entries are prebuilt for the SGA pages), and it indicates the paging (=page faults) are done at
startup, which also is not true. It also claims 'every process that starts must access every page in the SGA', again
this is not true.

From what I can see, what happens when PRE_PAGE_SGA is set to true, is that a background process is started, that starts
touching all SGA pages AFTER the instance has started and is open for usage. The background process I witnessed is
'sa00'. When recording the backtraces of that process, I see:
--//In my 21c tests, the background process sa00 is started even with PRE_PAGE_SGA=false.

The kernel paging functions are exactly the same as we have seen several times now. It's clear the functions executed by
this process are specifically for the prepage functionality. The pre-paging as done on behalf of
_TOUCH_SGA_PAGES_DURING_ALLOCATION=TRUE is done as part of the SGA creation and allocation (as can be seen by the Oracle
function names). PRE_PAGE_SGA seems to be a 'workaround' if you don't want to spend the long time paging on startup, but
still want to page the memory as soon as possible after startup. Needless to say, this is not the same as
_TOUCH_SGA_PAGES_DURING_ALLOCATION=TRUE, PRE_PAGE_SGA paging is done serially by a single process after startup when the
database is open for usage. So normal foreground process that encounter non-paged memory, which means they use it before
the sa00 process pages it, still need to do the paging.
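--//A generic Python sketch shows what "touching" a page means. Writing one byte into every page of a fresh mapping forces
--//the kernel to fault each page in. The example uses ordinary anonymous memory, not hugepages, and touch_pages is a
--//hypothetical helper. It is not how Oracle's sa00 process or _touch_sga_pages_during_allocation is actually implemented.

```python
import mmap


def touch_pages(buf, page_size=mmap.PAGESIZE):
    """Write one byte into every page of buf; return how many pages were touched."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 0  # the first write to a not-yet-mapped page triggers a page fault
        touched += 1
    return touched


size = 16 * mmap.PAGESIZE
# anonymous shared mapping, standing in for one SGA segment
with mmap.mmap(-1, size) as region:
    print(touch_pages(region))  # prints 16
```

--//A single process running this loop serially over a large area is the kind of work the quote attributes to sa00.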

Conclusion

If you want to allocate a large SGA with Oracle 12.1.0.2 (but may apply to earlier versions too), the startup time could
be significant. The reason for that is the bequeathing session pages the memory on startup. This can be turned off by
setting the undocumented parameter _TOUCH_SGA_PAGES_DURING_ALLOCATION to FALSE. As a result, foreground (normal user)
sessions need to do the paging. You can set PRE_PAGE_SGA parameter to TRUE to do paging, however the paging is done by a
single process (sa00) that serially pages the memory after startup. Foreground processes that encounter non-paged
memory, which means they use it before the sa00 process could page it, need to page it theirselves.

5. Wrap-up:
--//Edit /etc/sysctl.d/98-oracle.conf to restore the settings (steps omitted).


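Disclaimer: if this content infringes your rights, please contact the site administrator to have it removed promptly. Thank you. qidao123.com: ToB企服之家, China's first enterprise-service review and software marketplace, open for listings; earn cash for technical reviews.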