关于mysql:实现一个简单的Database4译文

10次阅读

共计 10015 个字符,预计需要花费 26 分钟才能阅读完成。

前文回顾

实现一个简略的 Database1(译文)

实现一个简略的 Database2(译文)

实现一个简略的 Database3(译文)


译注:cstsck 在 github 保护了一个简略的、相似 SQLite 的数据库实现,通过这个简略的我的项目,能够很好的了解数据库是如何运行的。本文是第四篇,次要是应用 rspec 对目前实现的性能进行测试并解决测试呈现 BUG

译注:cstsck 在 github 保护了一个简略的、相似 sqlite 的数据库实现,通过这个简略的我的项目,能够很好的了解数据库是如何运行的。本文是第四篇,次要是应用 rspec 对目前实现的性能进行测试并解决测试呈现 BUG

Part 4 咱们的第一个测试(和 BUG)

咱们曾经取得插入数据到数据库并打印所有数据的能力。当初来测试一下目前 microseconds 已有的性能。<br/>
我要应用 rspec 来写我的测试,因为我对 rspec 很相熟,它的语法也相当易读。<br/>
_译注:rsepec 是一个基于 Ruby 的测试框架,语法非常简单,能够很不便的测试各种可执行程序,判断输入_<br/>
我定义一个短小的 help 来发送一个帮忙命令列表到数据库,而后对输入进行断言。

describe 'database' do
  def run_script(commands)
    raw_output = nil
    IO.popen("./db", "r+") do |pipe|
      commands.each do |command|
        pipe.puts command
      end

      pipe.close_write

      # Read entire output
      raw_output = pipe.gets(nil)
    end
    raw_output.split("\n")
  end

  it 'inserts and retrieves a row' do
    result = run_script(["insert 1 user1 [email protected]",
      "select",
      ".exit",
    ])
    expect(result).to match_array([
      "db > Executed.",
      "db > (1, user1, [email protected])",
      "Executed.",
      "db >",
    ])
  end
end

这个简略的测试是确认咱们的输出可能获取返回后果。并确保能通过测试:

bundle exec rspec
.

Finished in 0.00871 seconds (files took 0.09506 seconds to load)
1 example, 0 failures

当初测试插入更多行数据到数据库是可行的:

it 'prints error message when table is full' do
  script = (1..1401).map do |i|
    "insert #{i} user#{i} person#{i}@example.com"
  end
  script << ".exit"
  result = run_script(script)
  expect(result[-2]).to eq('db > Error: Table full.')
end

再次运行测试:

bundle exec rspec
..

Finished in 0.01553 seconds (files took 0.08156 seconds to load)
2 examples, 0 failures

妙啊,测试通过了!咱们的数据库当初可能 hold 住 1400 行数据,这是因为咱们设置 pages 最大数量是 100 页,每页能够寄存 14 行数据。

查看咱们目前写的的代码,我意识到咱们可能没有正确处理存储文本字段。很容易用上面的例子测试进去(插入边界长度的字符串):

it 'allows inserting strings that are the maximum length' do
  long_username = "a"*32
  long_email = "a"*255
  script = ["insert 1 #{long_username} #{long_email}",
    "select",
    ".exit",
  ]
  result = run_script(script)
  expect(result).to match_array([
    "db > Executed.",
    "db > (1, #{long_username}, #{long_email})",
    "Executed.",
    "db >",
  ])
end

测试失败了:

Failures:

  1) database allows inserting strings that are the maximum length
     Failure/Error: raw_output.split("\n")

     ArgumentError:
       invalid byte sequence in UTF-8
     # ./spec/main_spec.rb:14:in `split'# ./spec/main_spec.rb:14:in `run_script'
     # ./spec/main_spec.rb:48:in `block (2 levels) in <top (required)>'

如果是咱们本人人工来测试,当咱们打印行数据时,会看到有一些奇怪的字符(例子中,我把很长的字符串进行了缩写):

db > insert 1 aaaaa... aaaaa...
Executed.
db > select
(1, aaaaa...aaa\�, aaaaa...aaa\�)
Executed.
db >

产生了什么?如果看一下代码中咱们定义的 Row 构造,咱们的确为 username 字段调配了 32 个字节长度,为 email 字段调配 255 个字节长度。然而 C 语言的 strings 是以一个 null 字符来作为结尾的,这个字符咱们没有为它调配空间。解决办法就是多调配一个额定的字节(来寄存这个 null 字符):

const uint32_t COLUMN_EMAIL_SIZE = 255;
typedef struct {
  uint32_t id;
-  char username[COLUMN_USERNAME_SIZE];
-  char email[COLUMN_EMAIL_SIZE];
+  char username[COLUMN_USERNAME_SIZE + 1];
+  char email[COLUMN_EMAIL_SIZE + 1];
} Row;

这样的确解决了下面的问题(从新运行下面插入边界长度字符串的测试):

bundle exec rspec
...

Finished in 0.0188 seconds (files took 0.08516 seconds to load)
3 examples, 0 failures

咱们不容许插入的 username 或者 email 的长度超过固定的列的长度。这样的超出长度要求的 spec 测试看起来就像上面这样:

it 'prints error message if strings are too long' do
  long_username = "a"*33
  long_email = "a"*256
  script = ["insert 1 #{long_username} #{long_email}",
    "select",
    ".exit",
  ]
  result = run_script(script)
  expect(result).to match_array([
    "db > String is too long.",
    "db > Executed.",
    "db >",
  ])
end

我了可能反对下面这种成果,咱们须要降级咱们的解析器(parser)。揭示一下,咱们当初应用的是 scanf():

if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
  statement->type = STATEMENT_INSERT;
  int args_assigned = sscanf(input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
      statement->row_to_insert.username, statement->row_to_insert.email);
  if (args_assigned < 3) {return PREPARE_SYNTAX_ERROR;}
  return PREPARE_SUCCESS;
}

然而 scanf()有一些毛病。如果读取的 string 大于正在读取它的缓存(buffer),就会引起缓存溢出(buffer overflow)并写入到意想不到的中央。所以咱们须要在拷贝 string 到 Row 构造前查看每一个 string 的长度。为了查看 string 长度,咱们须要用空格宰割输出。

我应用 strtok() 来做这些。如果你看到过程就会感觉它很容易了解:<br/>
译注: strtok: 字符串处理函数,char strtok (char str, const char * delimiters ); 合成字符串为一组字符串。str 为要合成的字符,delimiters 为分隔符字符(如果传入字符串,则传入的字符串中每个字符均为宰割符)

+PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
+  statement->type = STATEMENT_INSERT;
+
+  char* keyword = strtok(input_buffer->buffer, " ");
+  char* id_string = strtok(NULL, " ");
+  char* username = strtok(NULL, " ");
+  char* email = strtok(NULL, " ");
+
+  if (id_string == NULL || username == NULL || email == NULL) {
+    return PREPARE_SYNTAX_ERROR;
+  }
+
+  int id = atoi(id_string);
+  if (strlen(username) > COLUMN_USERNAME_SIZE) {
+    return PREPARE_STRING_TOO_LONG;
+  }
+  if (strlen(email) > COLUMN_EMAIL_SIZE) {
+    return PREPARE_STRING_TOO_LONG;
+  }
+
+  statement->row_to_insert.id = id;
+  strcpy(statement->row_to_insert.username, username);
+  strcpy(statement->row_to_insert.email, email);
+
+  return PREPARE_SUCCESS;
+}
+
 PrepareResult prepare_statement(InputBuffer* input_buffer,
                                 Statement* statement) {if (strncmp(input_buffer->buffer, "insert", 6) == 0) {+    return prepare_insert(input_buffer, statement);
-    statement->type = STATEMENT_INSERT;
-    int args_assigned = sscanf(-        input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
-        statement->row_to_insert.username, statement->row_to_insert.email);
-    if (args_assigned < 3) {
-      return PREPARE_SYNTAX_ERROR;
-    }
-    return PREPARE_SUCCESS;
   }

每当输出到一个分隔符时(在咱们的例子中就是空格),就在输出缓冲(input buffer)上间断调用 strtok(),把它分解成子字符串。它返回一个指向子字符串开始地位的指针。

咱们能够在每个文本值上调用 strlen(),看看它是否太长(strlen()函数,获取字符串的长度)。

咱们能够像解决其余错误码一样处理错误:

enum PrepareResult_t {
  PREPARE_SUCCESS,
+  PREPARE_STRING_TOO_LONG,
  PREPARE_SYNTAX_ERROR,
  PREPARE_UNRECOGNIZED_STATEMENT
};
switch (prepare_statement(input_buffer, &statement)) {case (PREPARE_SUCCESS):
    break;
+  case (PREPARE_STRING_TOO_LONG):
+    printf("String is too long.\n");
+    continue;
  case (PREPARE_SYNTAX_ERROR):
    printf("Syntax error. Could not parse statement.\n");
    continue;

这样就能通过测试了。

bundle exec rspec
....

Finished in 0.02284 seconds (files took 0.116 seconds to load)
4 examples, 0 failures

到了这里,咱们无妨再多解决一个谬误状况(id 值插入一个负值):

it 'prints an error message if id is negative' do
  script = ["insert -1 cstack [email protected]",
    "select",
    ".exit",
  ]
  result = run_script(script)
  expect(result).to match_array([
    "db > ID must be positive.",
    "db > Executed.",
    "db >",
  ])
end
enum PrepareResult_t {
  PREPARE_SUCCESS,
+  PREPARE_NEGATIVE_ID,
  PREPARE_STRING_TOO_LONG,
  PREPARE_SYNTAX_ERROR,
  PREPARE_UNRECOGNIZED_STATEMENT
@@ -148,9 +147,6 @@ PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) { }

  int id = atoi(id_string);
+  if (id < 0) {
+    return PREPARE_NEGATIVE_ID;
+  }
  if (strlen(username) > COLUMN_USERNAME_SIZE) {return PREPARE_STRING_TOO_LONG;}
@@ -230,9 +226,6 @@ int main(int argc, char* argv[]) {switch (prepare_statement(input_buffer, &statement)) {case (PREPARE_SUCCESS):
        break;
+      case (PREPARE_NEGATIVE_ID):
+        printf("ID must be positive.\n");
+        continue;
      case (PREPARE_STRING_TOO_LONG):
        printf("String is too long.\n");
        continue;

好了,测试做的差不多了。接下来是十分重要的性能:长久化!咱们要实现保留咱们的数据库到一个文件,再把它从文件中读取进去。(目前它还在内存当中)

当初它越来越牛了。

上面是和上一部分代码,批改地位的比照:

@@ -22,6 +22,8 @@

 enum PrepareResult_t {
   PREPARE_SUCCESS,
+  PREPARE_NEGATIVE_ID,
+  PREPARE_STRING_TOO_LONG,
   PREPARE_SYNTAX_ERROR,
   PREPARE_UNRECOGNIZED_STATEMENT
  };
@@ -34,8 +36,8 @@
 #define COLUMN_EMAIL_SIZE 255
 typedef struct {
   uint32_t id;
-  char username[COLUMN_USERNAME_SIZE];
-  char email[COLUMN_EMAIL_SIZE];
+  char username[COLUMN_USERNAME_SIZE + 1];
+  char email[COLUMN_EMAIL_SIZE + 1];
 } Row;

@@ -150,18 +152,40 @@ MetaCommandResult do_meta_command(InputBuffer* input_buffer, Table *table) {}}

-PrepareResult prepare_statement(InputBuffer* input_buffer,
-                                Statement* statement) {-  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {+PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
   statement->type = STATEMENT_INSERT;
-  int args_assigned = sscanf(-     input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
-     statement->row_to_insert.username, statement->row_to_insert.email
-     );
-  if (args_assigned < 3) {
+
+  char* keyword = strtok(input_buffer->buffer, " ");
+  char* id_string = strtok(NULL, " ");
+  char* username = strtok(NULL, " ");
+  char* email = strtok(NULL, " ");
+
+  if (id_string == NULL || username == NULL || email == NULL) {return PREPARE_SYNTAX_ERROR;}
+
+  int id = atoi(id_string);
+  if (id < 0) {
+     return PREPARE_NEGATIVE_ID;
+  }
+  if (strlen(username) > COLUMN_USERNAME_SIZE) {
+     return PREPARE_STRING_TOO_LONG;
+  }
+  if (strlen(email) > COLUMN_EMAIL_SIZE) {
+     return PREPARE_STRING_TOO_LONG;
+  }
+
+  statement->row_to_insert.id = id;
+  strcpy(statement->row_to_insert.username, username);
+  strcpy(statement->row_to_insert.email, email);
+
   return PREPARE_SUCCESS;
+
+}
+PrepareResult prepare_statement(InputBuffer* input_buffer,
+                                Statement* statement) {+  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {+      return prepare_insert(input_buffer, statement);
   }
   if (strcmp(input_buffer->buffer, "select") == 0) {
     statement->type = STATEMENT_SELECT;
@@ -223,6 +247,12 @@ int main(int argc, char* argv[]) {switch (prepare_statement(input_buffer, &statement)) {case (PREPARE_SUCCESS):
         break;
+      case (PREPARE_NEGATIVE_ID):
+    printf("ID must be positive.\n");
+    continue;
+      case (PREPARE_STRING_TOO_LONG):
+    printf("String is too long.\n");
+    continue;
       case (PREPARE_SYNTAX_ERROR):
     printf("Syntax error. Could not parse statement.\n");
     continue;

并且咱们增加了一些测试:

+describe 'database' do
+  def run_script(commands)
+    raw_output = nil
+    IO.popen("./db", "r+") do |pipe|
+      commands.each do |command|
+        pipe.puts command
+      end
+
+      pipe.close_write
+
+      # Read entire output
+      raw_output = pipe.gets(nil)
+    end
+    raw_output.split("\n")
+  end
+
+  it 'inserts and retrieves a row' do
+    result = run_script([+      "insert 1 user1 [email protected]",
+      "select",
+      ".exit",
+    ])
+    expect(result).to match_array([
+      "db > Executed.",
+      "db > (1, user1, [email protected])",
+      "Executed.",
+      "db >",
+    ])
+  end
+
+  it 'prints error message when table is full' do
+    script = (1..1401).map do |i|
+      "insert #{i} user#{i} person#{i}@example.com"
+    end
+    script << ".exit"
+    result = run_script(script)
+    expect(result[-2]).to eq('db > Error: Table full.')
+  end
+
+  it 'allows inserting strings that are the maximum length' do
+    long_username = "a"*32
+    long_email = "a"*255
+    script = [+      "insert 1 #{long_username} #{long_email}",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to match_array([
+      "db > Executed.",
+      "db > (1, #{long_username}, #{long_email})",
+      "Executed.",
+      "db >",
+    ])
+  end
+
+  it 'prints error message if strings are too long' do
+    long_username = "a"*33
+    long_email = "a"*256
+    script = [+      "insert 1 #{long_username} #{long_email}",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to match_array([
+      "db > String is too long.",
+      "db > Executed.",
+      "db >",
+    ])
+  end
+
+  it 'prints an error message if id is negative' do
+    script = [+      "insert -1 cstack [email protected]",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to match_array([
+      "db > ID must be positive.",
+      "db > Executed.",
+      "db >",
+    ])
+  end
+end

Enjoy GreatSQL :)

## 对于 GreatSQL

GreatSQL 是由万里数据库保护的 MySQL 分支,专一于晋升 MGR 可靠性及性能,反对 InnoDB 并行查问个性,是实用于金融级利用的 MySQL 分支版本。

相干链接:GreatSQL 社区 Gitee GitHub Bilibili

GreatSQL 社区:

欢送来 GreatSQL 社区发帖发问
https://greatsql.cn/

技术交换群:

微信:扫码增加 GreatSQL 社区助手 微信好友,发送验证信息 加群

正文完
 0