# README
#+OPTIONS: \n:t
- Background [[https://www.mockaroo.com/][mockaroo]] is one commerial tool for mock data generation. And I tried to use it even though it's one commerial one. But finally I gave it up because I need to big data import. Also found there is another open source [[https://generatedata.com/][generatedata]] which also generated data for you. The reason I did not decide to use it is that I need one API not GUI. My idea is that I generate the data and import to TiDB directly with tidb lightning.
- Install #+BEGIN_SRC OhMyTiUP$ wget https://github.com/luyomo/mockdata/releases/download/v0.0.6/mockdata_0.0.6_linux_amd64.tar.gz OhMyTiUP$ tar xvf mockdata_0.0.6_linux_amd64.tar.gz OhMyTiUP$ mv bin/* /usr/local/bin/ #+END_SRC
- Support function
- ID ID sequence, primary key.
- String template String from template
- uuid
- Data from string list
- Random int
- Random date
- Random string
- Json data The json is generated from the template in which defined function generate the data for json. The idea is from [[https://json-generator.com/][json generate]]. Will continue to add more and more functions.
-
DDL #+BEGIN_SRC CREATE TABLE
test01
(id
bigint(20) NOT NULL,payer
varchar(64) DEFAULT NULL,receiver
varchar(64) DEFAULT NULL,amount
bigint(20) DEFAULT NULL,payment_uuid
varchar(64) DEFAULT NULL,payment_type
varchar(32) DEFAULT NULL,payment_date
date DEFAULT NULL,user_id
varchar(32) DEFAULT NULL,access_content
json DEFAULT NULL, PRIMARY KEY (id
) /*T![clustered_index] CLUSTERED */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; #+END_SRC -
Data config file #+BEGIN_SRC ROWS: 5000 COLUMNS:
IDX: 01 Name: id DataType: int Function: sequence
IDX: 02 Name: payer DataType: string Function: template Parameters: - key: text value: User{{.Label}}
IDX: 03 Name: receiver DataType: string Function: template Parameters: - key: name value: RandomUserID - key: format value: "user%010d" - key: min value: 1 - key: max value: 1000
IDX: 04 Name: amount DataType: int Function: random Max: 10000 Min: 1
- IDX: 05 Name: payment_uuid DataType: string Function: uuid
- IDX: 06
Name: payment_type
DataType: string
Function: list
Values:
- international
- national
- others
- IDX: 07
Name: payment_date
DataType: Date
Function: RandomDate
Parameters:
- key: min value: 2022-01-01
- key: max value: 2022-06-01
- IDX: 08
Name: user_id
DataType: string
Function: RandomString
Parameters:
- key: min value: 6
- key: max value: 8
- IDX: 09
Name: access_content
DataType: json
Function: Template
Parameters:
- key: content value: '{"access_type": "{{$BROWSER}}", "ip_address": "{{$IPADDR}}"}' #+END_SRC
-
Command | Parameter | Comment | |-----------+------------------------------------------------| | config | The data config file to generate the data | | output | The file to be outputed to | | rows | Number of rows to be generated for each thread | | threads | Number of threads |
[[./png/001.png]] [[./png/002.png]]
-
Example ** Run the mockdata command #+BEGIN_SRC $ mockdata --threads 1 --loop 1 --config /tmp/config.yaml --output /tmp/mockdata --file-name=test.test01 --rows 20 --host 10.0.154.155 --user=root --pd-ip=10.0.240.79 ----lightning-ver v8.0.0 #+END_SRC [[./png/003.png]] ** Check the result [[./png/004.png]]
-
Performance | Secnario | Data volume | Disk Size | Execution Time(s) | rows/s | Volumes/s | |--------------------------+-------------+-----------+-------------------+--------+-----------| | First test. Sinle thread | 5000000 | 223M | 129 | 38800 | 1.7MB | | parallel: 2 | 10000000 | 446M | 140 | 77600 | 3.4MB | | parallel: 10 | 50000000 | 2.3G | 240 | 208000 | 9.8MB | | parallel: 16 | 80000000 | 3.5G | 433 | 184757 | 8.2MB |
-
Reference ** Issues
- missing go.sum entry for module providing package #+BEGIN_SRC go mod tidy #+END_SRC
-
Performance test #+BEGIN_SRC OhMyTiUP$ ./bin/mockdata --threads 16 --loop 1 --config etc/data.config.yaml --output /tmp/mockdata --file-name=test.test01 --rows 20 --host=172.83.1.89 --user=root --pd-ip=172.83.1.241 --lightning-ver v8.0.0 #+END_SRC ** c5d.4xlarge | Number of rows | Execution time | Transaction | Threads | Size | |----------------+----------------+-------------+---------+------| | 160 millions | 29 | 100000 | 16 | 14G | | 160 millions | 23 | 100000 | 32 | 14G | | 160 millions | 21 | 200000 | 32 | 14G |
** c5a.8xlarge | Number of rows | Execution time | Transaction | Threads | Size | |----------------+----------------+-------------+---------+------| | 300 millions | 27 | 200000 | 64 | 26G | | 610 millions | 50 | 400000 | 64 | 53G |
- TODO
- Convert the data generation to distribution system to fasten the performance.
- Generate data to ttl directly for tikv-importer to improve the performance.
- Generate CSV file to S3
- Add the TUI from OhMyTiUP To mockdata
- event_month
#+BEING_SRC
time ./bin/mockdata --loop 100 --config etc/event_month.yaml --output /tmp/mockdata --file-name=test.event_month --rows 100000 --host=182.83.1.171 --user=root --pd-ip=182.83.1.118
real 144m8.529s
user 380m38.129s
sys 35m37.681s
MySQL [test]> select data_length/(102410241024) from information_schema.tables where table_name = 'event_month' \G data_length/(102410241024): 176.0461 1 row in set (0.008 sec)
MySQL [information_schema]> select * from table_storage_stats where table_schema = 'test' and table_name = 'event_month'; +--------------+-------------+----------+------------+--------------+--------------------+------------+------------+ | TABLE_SCHEMA | TABLE_NAME | TABLE_ID | PEER_COUNT | REGION_COUNT | EMPTY_REGION_COUNT | TABLE_SIZE | TABLE_KEYS | +--------------+-------------+----------+------------+--------------+--------------------+------------+------------+ | test | event_month | 126 | 3 | 2128 | 79 | 201236 | 160053659 | +--------------+-------------+----------+------------+--------------+--------------------+------------+------------+ 1 row in set (0.005 sec)
admin@ip-182-83-1-7:/tidb/tidb-data$ du -sh tikv-20160/
24G tikv-20160/
admin@ip-182-83-1-7:/tidb/tidb-data/tikv-20160$ du -sh *
0 LOCK
23G db
1.2M import
20K last_tikv.toml
23M raft-engine
0 raftdb.info
54M rocksdb.info
4.0K snap
1.1G space_placeholder_file
#+END_SRC
| Disk Size | Table Size | Compress ratio | |-------------+------------+----------------| | 23GB*3=69GB | 170GB | 1:8 |