Mysql DBA I Revision

These are my notes from revising for the Mysql DBA 1 exam for version 5. These notes may not be complete are accurate, If you are also revising for this exam, please buy the excellent book.

Mysql Architecture

From a high level perspective, mysql uses a Client/Server model. The Server process reads/writes data on disk, while the client process sends requests to the server and receives result sets back. There is also a third category of mysql binary ‘utility programs’, these manipulate data files directly, bypassing the server process (eg myisamchk).

Within the server process (mysqld) there are two main tiers, upper (SQL processing, query optimisation) and lower (Storage Engine). The storage engine layer is pluggable, so various storage engines can be used.

The SQL layer provides a consistent interface to heterogeneous storage engines, however there are some leaks in the abstraction. SQL statements such as create table and alter table have engine clauses that can specify the engine type and give engine specific parameters. Other SQL statements that vary by engine:

  • Transactional statements (eg commit, rollback) only have an effect it the underlying storage is transactional
  • Available index types vary by storage engine, eg full-text and spatial indexes are only available when using myisam.

Note that the mysql client would be a Native C API connector in this diagram.

Important Charcteristics of MySQL

  • Supported on multiple operating systems, runs on many verisons of:
    • Windows
    • Unix
    • Linux
  • Works in distributed environments (eg Client/Server)
  • Cross platform interoperability (eg Windows Client, Linux Server)

Client Server Communication

The client and server can communicate via four different protocols:

ProtocolLocal/RemoteOSNotes
TCP/IPLocal or RemoteAll (unless started with --skip-networking)
Unix SocketLocalUnix likeFaster than TCP/IP
Named PipeLocalWindows (-nt servers started with --enable-named-pipe)Usually Slower than TCP/IP
Shared MemoryLocalWindows (start with --shared-memory)

Note: Windows ODBC connections are likely to be slower than standard C API connections, wherewas JDBC connections should be roughly the same speed.

Disk Space Usage in Mysql

  • Mysql Data directory (Default: /var/lib/mysql)
    • ibdata – innodb tablespace files, when file_per_table is not in use
    • ib_logfile* – innodb log files – an intent log, used to store statements until they are committed to the tablespace files.
    • Database directories: Each database has a directory where it’s data is stored, this is a subdirectory of the mysql datadir. Each database directory may contain:
      • .FRM – Format/Schema for the Table
      • .MYI – Indexes for myisam tables
      • .MYD – Table data for myisam tables
      • .ibd – Table data for innodb tables when innodb_file_per_table is set. Note that if file per table is not in use, then the data will be stored in the ibdata files in the parent directory (mysql data dir). This is important if you are backing up databases by copying each database directory.
      • .opt – database configuration information, including default character set
      • .tr* – trigger data
    • mysql-slow.log – log of slow queries
    • mysql_bin.000* – binary logs for replication and point in time restore.
    • *-relay-bin.* – relay logs retrieved from the master, pending execution on this slave.
  • System Log Directory (Default /var/log)
    • mysqld.log – daemon log.
  • System status directory (eg /var/run)
    • PID file
    • Socket file
  • System configuration directory
    • my.cnf

Role of logs in chained replication

Memory Usage

  • Connection handler threads – each of these requires memory for it’s heap
  • Buffers
    • Grant table buffers – all priviledges are held in memory to speed up access checks.
    • Key buffer – holds indexes for myisam tables (mysql relies on OS disk cache for caching myisam table data)
    • Table cache – Holds file descriptors for open tables, so that tables don’t have to repetitively closed and reopened.
    • Query Cache – stores the results of queries in memory, so that an identical query can be served from memory
    • Host Cache – application level DNS cache
    • Innodb Logs – Transaction Statements are recorded in here until the transaction is committed, then they are flushed to innodb logs on disk, before being written to the tablespace file.
  • Memory storage engine – Stores all table data in memory
  • Temporary tables – used for sorting, and joins.

Starting, Stopping, and Configuring MySQL

Distributions of mysql:

  • Windows
    • Essential – installer package
    • Complete – installer package∑
    • No-install – files for manual installation
  • Linux/Unix like
    • Distribution packages – eg RPM/DEB
    • Tar files – for manual installation
  • Source code – roll your own.
Why not to compile your own:
  • The binary Mysql distribution may use non standard versions of system libraries (eg allowing higher number of connections).
  • The binary distribution may have been compiled with a commercial compiler that is more efficient than publicly available.
Why you might want to compile your own
  • No binary distribution available for your OS
  • Wanting to disable/add/tweak some feature before compiling.
  • Wanting to run the very latest development code

Mysql as an operating system service

  • Windows
    • Install: mysld –install service_name [–defaults=file=…]
    • Remove: mysqld –remove service_name
    • Start: net start service_name
    • Stop: net stop service_name
  • Linux / RHEL
    • Install: chkconfig mysqld on
    • Remove: chkconfig mysqld off
    • Start: service mysqld strt
    • Stop: service mysqld stop / mysqladmin shutdown.
Options for starting mysql on Unix:
  • Execute mysqld – usually only used for debugging
  • mysqld_safe – shell script that sets up the mysql error log, then launches mysqld. Will relaunch mysqld if it terminates
  • msql.server / init script (as above)
  • msqld_multi – perl script for starting/stopping multiple instances of mysql on a single server, stores all the config in a single file, with sections for each instance

Importing TZ data

Windows: Not necessary, populated as part of the install procedure.

Linux:

mysql_tzinfo_to_sql /usr/share/zoneinfo |mysql -u root mysql

Client Programs for DBA Work

  • Clients (these connect to the server)
    • mysql client – cli interface to mysql
    • mysqlimport (like ‘load data infile’ import tabulated data)
    • mysqldump – Generate sql statements for replicating current state of database. (can produce tabulated data with –tab)
    • mysqladmin – perform various admin activities – create databases, flush logs, kill connections, set passwords etc. Can shutdown the server using a non-sql feature of the client/server protocol.
  • Direct manipulation utilities – these manipulate data files directly
    • myisamchk (checks/repairs/optimises myisam tables)
    • myisampack (compress myisam tables, makes them readonly)
  • Graphical Utilities
    • Mysql Administrator

Character Set Support

When utf-8 is used, fixed length char columns must assume that all characters require 3 bytes. So char(10) will require 30 bytes when utf-8 is used. varchar(10) would only require 10 bytes if all the characters were latin – may be up to 30 if accents/symbols etc are used.

Locking

Statements use implicit locking, eg a select will cause the server to acquire a read/shared lock on all the source tables. An insert will cause the server to require a write/exclusive lock. Locks can be managed explicitly using ‘lock tables’ and ‘unlock tables’. Row level locks can be acquired explicitly using ‘select … for update;’ for example:

start transaction;
select * from db.table where foo='bah' for update;
... do something else with locked data..
commit;

Advisory locks aren’t explicitly linked to any data, they are just strings that can be acquired and released. For advisory locks to be effective, all clients my acquire the relevant locks before modifying data – a rogue client could ignore locking. Example of advisory locking, the second argument to get_lock is a timeout in seconds, for which the thread will wait for the lock.

select get_lock('test_lock','2');
select release_lock('test_lock');

Other lock advisory locking functions:

  • is_free_lock – determine if a lock is currently available
  • is_used_lock – get the connection id of the client who holds the lock, or null if it is available.

Storage Engines

  • MyISAM – default,
    • full text searching
    • spatial data
    • can prioritise queries using ‘select HIGH_PRIORITY * from …’ or LOW_PRIORITY, DELAYED
    • Row formats:
      • Fixed Format: All rows are the same number of bits, can be looked up using a fixed offset. Take more space than dynamic rows.
      • Dynamic format: Each row occupies as much space as it requires. Prone to fragmentation
      • Compressed: Compressed, read-only data
  • InnoDB
    • transactional -ACID compliant (atomicity, consistency, isolation, durability)
    • multi versioning, row level locking
      • Reads can happen simultaneously
      • Writes can happen simultaneously as long as they affect different rows
      • If a transaction is modifying a row, another transaction cannot modify it until the first commits (unless the isolation level is read uncommitted).
    • referential integrity eg foreign keys
    • Data and indexes are stored in table space files. So one tablespace file contains data from multiple databases/tables. It is possible to have a tablespace created per table using the innodb_file_per_table option.
    • writes are written to innodb log files before the changes are committed to the tablespace files. After a crash, these log files are used for recovery.
  • Memory – transient, in-memory tables
  • Merge – present multiple myisam datafiles as a single table -useful for compressing archive data while still allowing new data to be inserted.
  • Federated – data stored in another mysql instance
  • Blackhole – data written to /dev/null
  • Clustered – data held in memory on cluster nodes

Table Maintenance

The INFORMATION_SCHEMA Database

select table_name, engine from information_schema.tables

Data Backup and Recover Methods