2016-11-14

[OpenTSDB] Introduction to OpenTSDB and usage tips

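Reference articles
Introduction to OpenTSDB
- Database schema
  OpenTSDB mainly uses two tables, tsdb-uid and tsdb:
  - tsdb-uid: stores the metadata that describes metrics
  - tsdb: stores the time series data
- Main concepts
  - metric: an indicator that needs to be collected
    - a monitored item such as cpu_io is expressed as a metric name
  - tag: distinguishes the data points within a metric
    - for the monitored item `bmc.stat.cpu`, different machines can be told apart with the tag `host=hostname`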
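- Notes
  - Relationship between metrics and tags
    - a metric and a tag have no inherent parent-child relationship
    - but a data point is only meaningful when it carries one metric and N ≥ 1 tags
  - How metrics and tags are stored
    - metrics and tags are stored together in the tsdb-uid table
    - format: RowKey (auto-incremented ID, a 3-byte array) with the columns name:metrics, name:tagk, name:tagv
    - the reverse mapping is stored as well, in denormalized form: instead of joining tables at read time, the result of the JOIN is used directly as the RowKey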
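The UID bookkeeping described above can be sketched in a few lines of Python. This is a toy in-memory model, not OpenTSDB code: the class name `UidTable` and its methods are hypothetical, and the only facts taken from the text are the three kinds (metrics, tagk, tagv), the 3-byte width, and the two-way mapping between names and UIDs.

```python
from itertools import count

UID_WIDTH = 3  # bytes per UID, matching the 3-byte RowKey described above


class UidTable:
    """Toy in-memory stand-in for the tsdb-uid table (not OpenTSDB code)."""

    KINDS = ("metrics", "tagk", "tagv")

    def __init__(self):
        # One independent counter per kind; the first UID handed out is 1.
        self._next_id = {kind: count(1) for kind in self.KINDS}
        self._uid_by_name = {kind: {} for kind in self.KINDS}
        self._name_by_uid = {kind: {} for kind in self.KINDS}

    def get_or_create(self, kind, name):
        """Return the UID for name, assigning a new one only if needed."""
        uid = self._uid_by_name[kind].get(name)
        if uid is None:
            # to_bytes raises OverflowError once the 3-byte space is exhausted.
            uid = next(self._next_id[kind]).to_bytes(UID_WIDTH, "big")
            self._uid_by_name[kind][name] = uid
            self._name_by_uid[kind][uid] = name
        return uid

    def name_of(self, kind, uid):
        """Look up the name that a UID was assigned to."""
        return self._name_by_uid[kind][uid]

    def assigned(self, kind):
        """Number of UIDs handed out so far for this kind."""
        return len(self._uid_by_name[kind])


table = UidTable()
cpu = table.get_or_create("metrics", "proc.stat.cpu")
mem = table.get_or_create("metrics", "proc.stat.mem")
host = table.get_or_create("tagk", "host")
print(cpu, mem, host)
```

Each kind has its own counter, so a metric and a tag key can share the same numeric UID without conflict; asking for a name a second time returns the UID it already has.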
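- Data storage
  - Example A:
    - 2 metrics: proc.stat.cpu and proc.stat.mem
    - 1 record: proc.stat.cpu 1297574486 54.2 host=foo type=user
    - Table layout:
      - tsdb-uid:

        Stores the mappings for metric, tagk, and tagv.

        First row:

        The rowkey is \x00 and it has 3 qualifiers, id:metrics, id:tagk, and id:tagv, whose values are the numbers of metrics, tag keys, and tag values added so far.

        This first row is generated and maintained by the system.

        OpenTSDB keeps a counter, starting at 0, for each of metric, tagk, and tagv; every new metric, tagk, or tagv increments the corresponding counter by 1.

        Here there are two metrics (proc.stat.cpu, proc.stat.mem), two tag keys (host, type), and two tag values (foo, user), so all three values in the \x00 row are 2.

        Other rows:

        For each metric/tagk/tagv combination (for example {metrics:proc.stat.mem, tagk:type, tagv:user}), when a new combination is created, any metric or tag that does not exist yet is assigned a unique identifier called a UID; whether a UID is reused or newly created depends on whether the name is already in the table.

        The UIDs of one combination, put together, form a sequence of UIDs called the TSUID.

        UIDs for tag keys and tag values are assigned automatically when a data point is written to the TSD. Metric UIDs are assigned automatically only if tsd.core.auto_create_metrics is set to true; otherwise they have to be assigned manually, for example with tsdb mkmetric.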
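      - tsdb:

        Rowkey (the TSUID with the hour-aligned timestamp inserted after the metric UID):

        metric UID (3-byte integer)

        + data timestamp (rounded down to the hour)

        + tag 1 key UID (3-byte integer)

        + tag 1 value UID (3-byte integer)

        + …

        + tag N key UID

        + tag N value UID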
        
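        Time handling:

        Data point timestamp:

        2011-02-13 13:21:26 (UTC+8) = 1297574486

        Stored in the rowkey:

        MWeP = 01001101 01010111 01100101 01010000 = 1297573200 = 2011-02-13 13:00:00 (the timestamp truncated to the hour)

        Stored in the column qualifier (CF:Q):

        Pk = 01010000 01101011. The upper 12 bits, 010100000110 = 1286, are the offset in seconds from the hour to the data point (1286 s is exactly 21 min 26 s); the lower 4 bits, 1011, are format flags (here a 4-byte floating-point value).

        Combining the rowkey and CF:Q:

        1297573200 + 1286 = 1297574486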
        
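        One row of data:
        To save more space later on, OpenTSDB stores one hour of data in a single row: all data points within an hour for one metric with the same tag combination share a single rowkey, and each second's data point is stored as its own column, which also speeds up queries considerably:

        Rowkey: metric + hour-aligned timestamp + tag combination

        Column: the moment within the hour (time offset)

        TS: the version of the data recorded at that moment

        Value: the measured value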
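      - tsdb-meta: metadata
        Stores time series indexes and metadata. This is an optional feature that is disabled by default and can be enabled in the configuration file.
      - tsdb-tree: tree table
        Represents the structure of metrics as a tree hierarchy; this table is only used once the feature has been enabled in the configuration file.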
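The timestamp arithmetic from the tsdb table example can be reproduced with a short Python sketch. The function names are hypothetical, the UIDs passed in at the bottom are made-up illustrative values, and sorting the tag pairs is an assumption of this sketch; the flag value 0x0B is simply the lower 4 bits (1011) read off the qualifier shown above.

```python
import struct

HOUR = 3600
FLOAT_4_BYTES = 0x0B  # flag bits 1011 from the example: 4-byte floating point


def split_timestamp(ts):
    """Split a Unix timestamp into (hour-aligned base, offset in seconds)."""
    base = ts - ts % HOUR
    return base, ts - base


def row_key(metric_uid, base, tag_uids):
    """Concatenate metric UID, 4-byte base time, and (tagk, tagv) UID pairs."""
    key = metric_uid + struct.pack(">I", base)
    for tagk_uid, tagv_uid in sorted(tag_uids):  # ordering is an assumption
        key += tagk_uid + tagv_uid
    return key


def qualifier(offset, flags=FLOAT_4_BYTES):
    """Pack the offset into the upper 12 bits and the flags into the lower 4."""
    return struct.pack(">H", (offset << 4) | flags)


base, offset = split_timestamp(1297574486)
key = row_key(
    b"\x00\x00\x01",  # illustrative UID for proc.stat.cpu
    base,
    [(b"\x00\x00\x01", b"\x00\x00\x01"),   # illustrative host=foo
     (b"\x00\x00\x02", b"\x00\x00\x02")],  # illustrative type=user
)
print(base, offset, key, qualifier(offset))
```

With the example timestamp, the base time packs to the bytes `MWeP` and the offset 1286 packs to `Pk`, matching the values in the text. Every data point in the same hour with the same metric and tags yields the same row key and differs only in its qualifier.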
- Reading and writing data
  - Writing data
```
$ curl -X POST -H "Content-Type: application/json" http://localhost:4242/api/put -d @test.json
$ cat test.json
[
    {
        "metric": "mysql.innodb.row_lock_time",
        "timestamp": 1435716527,
        "value": 1234,
        "tags": {
           "host": "web01",
           "dc": "beijing"
        }
    },
    {
        "metric": "mysql.innodb.row_lock_time",
        "timestamp": 1435716529,
        "value": 2345,
        "tags": {
           "host": "web01",
           "dc": "beijing"
        }
    },
    {
        "metric": "mysql.innodb.row_lock_time",
        "timestamp": 1435716627,
        "value": 3456,
        "tags": {
           "host": "web02",
           "dc": "beijing"
        }
    },
    {
        "metric": "mysql.innodb.row_lock_time",
        "timestamp": 1435716727,
        "value": 6789,
        "tags": {
           "host": "web01",
           "dc": "tianjin"
        }
    }
]
```
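The same write can be issued from a program instead of curl. The following is a minimal sketch using only the Python standard library; the helper names `make_point`, `build_put_request`, and `send` are hypothetical, and the URL is the one from the curl example above.

```python
import json
import urllib.request

TSD_PUT_URL = "http://localhost:4242/api/put"  # same endpoint as the curl example


def make_point(metric, timestamp, value, **tags):
    """Build one data point in the JSON shape used by test.json."""
    if not tags:
        raise ValueError("a data point needs at least one tag")
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}


def build_put_request(points, url=TSD_PUT_URL):
    """Wrap a list of data points in a POST request without sending it."""
    body = json.dumps(points).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(points, url=TSD_PUT_URL):
    """Send the points to a running TSD and return the HTTP status code."""
    with urllib.request.urlopen(build_put_request(points, url)) as resp:
        return resp.status


points = [
    make_point("mysql.innodb.row_lock_time", 1435716527, 1234, host="web01", dc="beijing"),
    make_point("mysql.innodb.row_lock_time", 1435716529, 2345, host="web01", dc="beijing"),
]
request = build_put_request(points)
```

Building the request is kept separate from sending it so the payload can be inspected without a TSD running; calling `send(points)` performs the actual write.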
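  - Querying data
    Data is queried through the query endpoint, which accepts either a GET request with a query string or a POST request with the query described as JSON. The POST form is used here to query the data that was just written.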
```
$ curl -s -X POST -H "Content-Type: application/json" http://localhost:4242/api/query -d @search.json
$ cat search.json
{
    "start": 1435716527,
    "queries": [
        {
            "metric": "mysql.innodb.row_lock_time",
            "aggregator": "avg",
            "tags": {
                "host": "*",
                "dc": "beijing"
            }
        }
    ]
}
```
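The GET form mentioned above carries the same query in a query string, where each sub-query is written as `m=<aggregator>:<metric>{tagk=tagv,...}`. A minimal sketch that builds such a URL with the Python standard library follows; the function name `build_query_url` is hypothetical, and the host and port are taken from the curl example.

```python
from urllib.parse import urlencode

TSD_QUERY_URL = "http://localhost:4242/api/query"  # same endpoint as the curl example


def build_query_url(metric, aggregator, start, tags, base=TSD_QUERY_URL):
    """Build a GET query URL equivalent to the JSON query body."""
    # Tags are written as {tagk=tagv,...}; sorting only makes the output stable.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    sub_query = f"{aggregator}:{metric}{{{tag_part}}}"
    # urlencode percent-encodes the braces, commas, and other reserved characters.
    return f"{base}?{urlencode({'start': start, 'm': sub_query})}"


url = build_query_url(
    "mysql.innodb.row_lock_time",
    "avg",
    1435716527,
    {"host": "*", "dc": "beijing"},
)
print(url)
```

The resulting URL can be passed to curl or `urllib.request.urlopen` against a running TSD.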
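- Usage tips
  - Dealing with hot spots
    - the rowkey starts with the metric, a field that comes naturally from the data itself, which serves as the "hash" field that breaks up hot spots
  - Problems caused by 3-byte UIDs
    Whether for a metric, a tagk, or a tagv, the UID is only 3 bytes wide; this is the OpenTSDB default. Three bytes can represent 2^24 = 16,777,216 (over 16 million) distinct values, which is plenty for metric names and tag keys but not necessarily for tag values: if a tagv is an IP address or a phone number, the field is too narrow. In that case the width can be changed by modifying the source code and recompiling OpenTSDB. Note that after rebuilding, existing data can no longer be used directly; it has to be exported and imported again.
  - Collectors:
    OpenTSDB already ships with some scripts for collecting monitoring data. They consist of two parts: TCollector and a set of specific collectors.
    - TCollector is a client process that gathers the monitoring data produced by the individual collectors and pushes it to the TSD;
    - TCollector handles the connection to the TSD and the protocol, so each collector only has to collect data
    - OpenTSDB currently provides a number of ready-made collectors
    - once a TCollector is deployed on a server, it starts these collectors; each collector writes the data it gathers to stdout, and TCollector reads it and pushes it to OpenTSDB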
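Since a collector only has to print data points to stdout, a custom one can be very small. The sketch below is a hypothetical collector, not one of the bundled scripts: it assumes the line format `metric timestamp value tag=value` (the same layout as the record in Example A), and the metric name and function names are chosen for illustration.

```python
import os
import sys
import time


def format_datapoint(metric, value, timestamp=None, **tags):
    """Format one data point as a 'metric timestamp value tag=value' line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_str}".rstrip()


def collect_once(out=sys.stdout):
    """Write the 1-minute load average as a single data point."""
    load_1min = os.getloadavg()[0]  # available on Unix-like systems
    out.write(format_datapoint("proc.loadavg.1min", load_1min) + "\n")
    out.flush()  # flush so the parent process sees the line immediately


def run(interval=15, iterations=None):
    """Emit a data point every `interval` seconds; loop forever if iterations is None."""
    done = 0
    while iterations is None or done < iterations:
        collect_once()
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

Calling `run()` at the bottom of the script turns it into a long-running collector; `run(iterations=1)` is handy for checking the output by hand.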

2016-04-16

Turning vim + plugins into a Python IDE

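1. Install vim and configure .vimrc

Debian does not ship with vim by default, so install it with the following command:
$ apt-get install vim

Download the vim plugin manager Vundle and create a directory for the plugins it manages:

$ git clone https://github.com/gmarik/Vundle.vim.git ~/.vim/bundle/Vundle.vim
$ mkdir -p ~/vundle/plugin_installed

Edit the vim configuration file ~/.vimrc as follows and save it:

" disable vi compatibility mode (required by Vundle)
set nocompatible

" turn filetype detection off while Vundle loads (required by Vundle)
filetype off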

" 4 vundle
" set runtime path to include Vundle
set rtp+=~/.vim/bundle/Vundle.vim

" pass a path where plugin should be install
call vundle#begin('~/vundle/plugin_installed')

" let Vundle manage Vundle
Plugin 'gmarik/Vundle.vim'
" 4 folding the indent
Plugin 'tmhedberg/SimpylFold'
" 4 indent by pep8
Plugin 'vim-scripts/indentpython.vim'
" 4 the syntax checking when saving
Plugin 'scrooloose/syntastic'
" 4 pep8 style checking
Plugin 'nvie/vim-flake8'
" 4 background color
Plugin 'altercation/vim-colors-solarized'
" 4 file tree; open it with :NERDTree in vim
Plugin 'scrooloose/nerdtree'
" 4 search file
Plugin 'kien/ctrlp.vim'
" Add all the plugins here

" All plugins must be added before this line
call vundle#end()

" re-enable filetype detection, plugin loading, and indentation
filetype plugin indent on

" end setting 4 vundle

" 4 split screen
" enable region of split
set splitbelow
set splitright

" shortcut keys
" Ctrl+J/K/L/H move to the split below/above/right/left
nnoremap <C-J> <C-W><C-J>
nnoremap <C-K> <C-W><C-K>
nnoremap <C-L> <C-W><C-L>
nnoremap <C-H> <C-W><C-H>
" space toggles the fold under the cursor
nnoremap <space> za

" end setting 4 split screen

" 4 folding
set foldmethod=indent
set foldlevel=99
" enable the comment of folding part
let g:SimpylFold_docstring_preview=1
" end setting 4 folding

" show line number
set nu

" show visual line under the cursor's line
set cursorline

" show the match part of the pair for [] {} and ()
set showmatch

" 4 python
" apply these settings only 4 .py files: 4-column tabs, shift lines by
" 4 spaces, wrap at 79 columns, auto indent, expand tabs into spaces,
" unix line endings
au BufNewFile,BufRead *.py
            \ setlocal tabstop=4
            \ softtabstop=4
            \ shiftwidth=4
            \ textwidth=79
            \ autoindent
            \ expandtab
            \ fileformat=unix
set encoding=utf-8
" enable all python syntax highlighting features
let python_highlight_all = 1
syntax on

if has('gui_running')
    set background=dark
    colorscheme solarized
endif
" to change bgc by F5
call togglebg#map("<F5>")

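Save and quit, then reopen vim and run the command :PluginInstall to install the plugins.

References

If you want code completion, you can also add the YouCompleteMe plugin.
The references come from this site and that site.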

2016-03-01

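Building a personal blog on GitHub with Hexo

1. Preparing the environment

1.1. Install node.js

Download the latest node.js release from the official website.
On Windows, just follow the installer steps; on Linux, unpack the archive and configure .bashrc. Then check from the command line that the installation succeeded:
$ node -v
$ npm -v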

1.2. Prepare GitHub

Create a new Repository. The Repository name must have the form your_user_name.github.io, where your_user_name is your GitHub account user name.

2. Building the blog with hexo

2.1. Install hexo

Open a terminal and enter the command:
$ npm install -g hexo
Create a folder, for example:
$ mkdir ~/blog
Run the init command in the newly created folder; Hexo then automatically creates the files the site needs in the target folder. Then, as prompted, run npm install:
$ cd ~/blog
$ hexo init
INFO Copying data
INFO You are almost done! Don't forget to run `npm install` before you start blogging with Hexo!
$ npm install

2.2. Run locally

Install hexo server
$ npm install hexo-server

Start hexo server to complete the local deployment

$ cd ~/blog
$ hexo server
INFO  Start processing
INFO  Hexo is running at http://localhost:4000/. Press Ctrl+C to stop.

Create a new post

$ cd ~/blog
$ hexo new "My First Post"
INFO  File created at ~/blog/source/_posts/My-First-Post.md

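Note:
If hexo server is still running when you run hexo new "My First Post", the post is created twice, so stop the server (press Ctrl+C) before creating a post with hexo new.

Edit the post
hexo new "My First Post" generates a markdown file, My-First-Post.md, in the ~/blog/source/_posts directory. It can be edited with any editor that supports markdown syntax (for example Sublime Text 3 with the OmniMarkupPreviewer plugin).
Generate the static pages
$ cd ~/blog
$ hexo generate

After this command finishes, a set of html, css, and other files is generated in the ~/blog/public/ directory.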

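2.3. Publish to GitHub

Preparing to deploy
Before deploying to GitHub, configure the _config.yml file. First find the following section:
# Deployment
## Docs: http://hexo.io/docs/deployment.html
deploy:
  type:

and change it to:
# Deployment
## Docs: https://hexo.io/docs/deployment.html
deploy:
        type: git
        repo: https://github.com/your_user_name/your_user_name.github.io.git
        branch: master
        message: "blog deploy"

The git deployer type is provided by the hexo-deployer-git plugin, which can be installed with npm install hexo-deployer-git --save.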

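Note the Repository format:
SSH: git@github.com:your_user_name/your_user_name.github.io.git. In this case an SSH key has to be added under Settings->Deploy keys->Add deploy key.
HTTPS: https://github.com/your_user_name/your_user_name.github.io.git. In this case the GitHub user name and password have to be entered on every deployment.

Deploy the blog (these commands have to be run for every deployment); once the deployment succeeds, visit the blog home page.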

$ hexo clean
INFO  Deleted database.
INFO  Deleted public folder.
$ hexo generate
...
INFO  30 files generated in 3.8 s
$ hexo deploy

2.4. Other notes

This article is based on "Hexo搭建Github静态博客" (Building a static GitHub blog with Hexo).

2016-03-01

Hello World

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment

A Programmer's Blog

Technology, ideals, and fire