github.com/ponyo877/dummy_data_generator
Version: 0.0.0-20240421073819-cc17458a0ced
Repository: https://github.com/ponyo877/dummy_data_generator.git
Documentation: pkg.go.dev

# README

Dummy Data Generator CLI

This CLI tool allows you to efficiently generate a large amount of dummy data in a database. It supports both PostgreSQL and MySQL and provides a flexible configuration file to specify which tables and columns to populate.

Installation

To install the CLI tool, run the following command:

go install github.com/ponyo877/dummy_data_generator@latest

Features

  • Generate a substantial amount of dummy data in a database.
  • Supports both PostgreSQL and MySQL.
  • Customize data generation through a configuration file.
  • Track progress with a visual progress bar.

Configuration

| Field | Description |
|---|---|
| tablename | Name of the table where the data will be generated. |
| recordcount | Total number of records to be generated. |
| buffer | Buffer size for generating records (useful for optimizing performance). |
| columns | List of columns with their respective configurations. |
| columns[].name | Name of the column. |
| columns[].type | Data type of the column (e.g., number, varchar, timestamp). |
| columns[].rule | Generation rule for the column. |
| columns[].rule.type | Dummy rule type (e.g., unique, const, pattern, random). |
| columns[].rule.format | [type: unique only] Dummy data format (e.g., UUID (varchar), ULID (varchar), NOW (timestamp)). |
| columns[].rule.value | [type: const only] Constant value to insert. |
| columns[].rule.min | Start of the sequential value. |
| columns[].rule.max | [type: pattern only] End of the sequential value. |
| columns[].rule.min_time | [type: random (timestamp) only] Minimum value for the random timestamp. |
| columns[].rule.max_time | [type: random (timestamp) only] Maximum value for the random timestamp. |
| columns[].patterns[].value | [type: pattern only] Value to repeat. |
| columns[].patterns[].times | [type: pattern only] Number of times to repeat the value. |

Example Rules:

  • type: unique: Generates unique values. Sequential numbers (the default), the current timestamp (format: NOW), UUID, and ULID are supported.
  • type: const: Assigns a constant value.
  • type: pattern: Generates values by repeating a specified pattern. If you specify [{value: A, times: 2}, {value: B, times: 1}], it produces the repeating sequence [A,A,B,A,A,B,...]. If you specify {min: 1, max: 5}, it produces the repeating sequence [1,2,3,4,5,1,2,3,...].
  • type: random: Generates random values between min_time and max_time. Only the timestamp data type is currently supported. If you specify {min_time: '2024-01-01 00:00:00', max_time: '2024-03-31 23:59:59'}, it yields random timestamps within that range, such as '2024-02-01 01:23:45', but never '2023-12-31 23:59:59' or '2024-04-01 00:00:00'.
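
The pattern and random rules above can be illustrated with a short Go sketch. This is not the tool's implementation: the `Pattern` type and the functions `expandPattern`, `cycleRange`, and `randomTime` are hypothetical names introduced here, and the sketch assumes that each cycle restarts from the first record and that both time bounds are inclusive at one-second resolution.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Pattern is a hypothetical type mirroring one {value, times} entry.
type Pattern struct {
	Value string
	Times int
}

// expandPattern returns the first n values of the repeating cycle
// described by patterns, e.g. [{A 2} {B 1}] -> A, A, B, A, A, B, ...
func expandPattern(patterns []Pattern, n int) []string {
	var cycle []string
	for _, p := range patterns {
		for i := 0; i < p.Times; i++ {
			cycle = append(cycle, p.Value)
		}
	}
	out := make([]string, 0, n)
	if len(cycle) == 0 {
		return out
	}
	for i := 0; i < n; i++ {
		out = append(out, cycle[i%len(cycle)])
	}
	return out
}

// cycleRange returns the first n values of lo, lo+1, ..., hi, lo, ...
func cycleRange(lo, hi, n int) []int {
	out := make([]int, 0, n)
	if hi < lo {
		return out
	}
	span := hi - lo + 1
	for i := 0; i < n; i++ {
		out = append(out, lo+i%span)
	}
	return out
}

// randomTime draws a timestamp uniformly from [minT, maxT] at one-second
// resolution. Inclusive bounds are an assumption; maxT must not precede minT.
func randomTime(r *rand.Rand, minT, maxT time.Time) time.Time {
	span := maxT.Unix() - minT.Unix() + 1
	return time.Unix(minT.Unix()+r.Int63n(span), 0).UTC()
}

func main() {
	fmt.Println(expandPattern([]Pattern{{"A", 2}, {"B", 1}}, 7)) // [A A B A A B A]
	fmt.Println(cycleRange(1, 5, 8))                             // [1 2 3 4 5 1 2 3]

	const layout = "2006-01-02 15:04:05"
	lo, _ := time.Parse(layout, "2024-01-01 00:00:00")
	hi, _ := time.Parse(layout, "2024-03-31 23:59:59")
	r := rand.New(rand.NewSource(1))
	fmt.Println(randomTime(r, lo, hi).Format(layout)) // a timestamp inside the range
}
```

Building the cycle once and indexing it with a modulo keeps each value lookup constant-time, which matters when the record count runs into the millions.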

Example

tablename: sample_tbl
recordcount: 1000000
buffer: 1000
columns:
# The 'id' column is a string in ULID format, ensuring all values are unique.
- name: id
  type: varchar
  rule:
    type: unique
    format: ULID
# The 'sex' column will contain the strings "male", "female", and "NA" in a 3:2:1 ratio.
- name: sex
  type: varchar
  rule:
    type: pattern
    patterns:
    - value: male
      times: 3
    - value: female
      times: 2
    - value: NA
      times: 1
# The 'created_at' column will have the fixed value "2024-01-01 00:00:00".
- name: created_at
  type: timestamp
  rule:
    type: const
    value: '2024-01-01 00:00:00'
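
As a worked check of the 3:2:1 ratio in this example, the following Go sketch counts how many of the 1,000,000 records would receive each value. It assumes the pattern cycle starts at the first record and simply repeats; `countPerValue` is a hypothetical helper and not part of the tool.

```go
package main

import "fmt"

// countPerValue returns, for each pattern entry, how many of the first n
// records receive that entry's value when the pattern repeats end to end.
func countPerValue(times []int, n int) []int {
	cycleLen := 0
	for _, t := range times {
		cycleLen += t
	}
	counts := make([]int, len(times))
	if cycleLen == 0 {
		return counts
	}
	full, rem := n/cycleLen, n%cycleLen
	for i, t := range times {
		counts[i] = full * t
		// The final, partial cycle fills entries in order until it runs out.
		extra := t
		if rem < extra {
			extra = rem
		}
		counts[i] += extra
		rem -= extra
	}
	return counts
}

func main() {
	// male: 3, female: 2, NA: 1 over 1,000,000 records.
	fmt.Println(countPerValue([]int{3, 2, 1}, 1000000)) // [500001 333333 166666]
}
```

Because 1,000,000 is not a multiple of the cycle length 6, the last partial cycle contributes four extra records (three "male", one "female"), so the ratio is very close to 3:2:1 but not exact.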

Sub Command

| Sub Command | Description |
|---|---|
| dummy_data_generator cnt | Show the number of records. |
| dummy_data_generator gen | Generate dummy data. |

Option

| Option | Description | Default Value |
|---|---|---|
| -c, --config | Configuration file for dummy data. You can provide multiple configuration files using wildcards (e.g., -c "cfg_*.yaml") or by comma-separating them (e.g., -c cfg_1.yaml,cfg_2.yaml). | config.yaml |
| -d, --database | Name of the database to use. | mydb |
| -u, --dbuser | Database user name. | root |
| -e, --engine | Database engine to use. Supports postgres and mysql. | postgres |
| -h, --host | Database server host or socket directory. | 127.0.0.1 |
| -p, --password | Database password to use when connecting to the server. | password |
| -P, --port | Database server port. | 5432 |
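
One way a -c value could be turned into a list of files is sketched below in Go. This is an assumption about the behavior, not the tool's actual code: `expandConfigArg` is a hypothetical helper that splits on commas and expands wildcard parts with the standard library's `filepath.Glob`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandConfigArg splits a comma-separated -c value and expands any part
// containing glob metacharacters into the files that currently match it.
func expandConfigArg(arg string) ([]string, error) {
	var files []string
	for _, part := range strings.Split(arg, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		if strings.ContainsAny(part, "*?[") {
			matches, err := filepath.Glob(part)
			if err != nil {
				return nil, fmt.Errorf("bad pattern %q: %w", part, err)
			}
			files = append(files, matches...)
			continue
		}
		// Plain names are kept as given; existence is checked when opened.
		files = append(files, part)
	}
	return files, nil
}

func main() {
	files, err := expandConfigArg("cfg_1.yaml,cfg_2.yaml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(files) // [cfg_1.yaml cfg_2.yaml]
}
```

Quoting the wildcard on the command line, as in -c "cfg_*.yaml", keeps the shell from expanding it first, so the program receives the pattern as a single argument.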

Usage Examples

  • Example 1: Check the current number of records (MySQL).
$ dummy_data_generator cnt -e mysql -h 127.0.0.1 -u root -P 5432 -p password -c sample_1.yaml,sample_2.yaml
+--------+-------+
| TABLE  | COUNT |
+--------+-------+
| table1 |     0 |
| table2 |     0 |
+--------+-------+
  • Example 2: Generate dummy data for the tables specified in the config files (PostgreSQL, with every option other than the config path left at its default value).
$ dummy_data_generator gen -c "sample_*.yaml"
table1: 534000 / 1000000 in progress  [=====================>-------------------]  53 %
table2: 533000 / 1000000 in progress  [=====================>-------------------]  53 %
table3:    10000 / 10000 done!       [=========================================]
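
A progress line of this general shape can be rendered with a few lines of Go. The sketch below is illustrative only: `renderBar` is a hypothetical function, the width of 41 cells is chosen just to resemble the output above, and the tool's real progress bar may be produced by a library with different rounding.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar draws a bar of the given width: '=' for completed cells,
// '>' as the head while in progress, and '-' for the remaining cells.
func renderBar(done, total, width int) string {
	if total <= 0 || width <= 0 {
		return "[]"
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	filled := done * width / total
	if filled >= width {
		return "[" + strings.Repeat("=", width) + "]"
	}
	return "[" + strings.Repeat("=", filled) + ">" +
		strings.Repeat("-", width-filled-1) + "]"
}

func main() {
	done, total := 534000, 1000000
	fmt.Printf("table1: %d / %d in progress  %s %3d %%\n",
		done, total, renderBar(done, total, 41), done*100/total)
}
```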