Useful MySQL Queries for a Linguist Data Manager

Details: Category: Generalia; Published: 15 May 2021

Allow MYSQLDUMP Permissions to a user on given databases:
GRANT SELECT, LOCK TABLES, SHOW VIEW ON crwork.* TO 'crworkmysql'@'%';

Then the userwill be able to dump the database using the following command:
mysqldump --no-tablespaces -u MySQLusername -p dbname > dbname.sql

Add some characters in the beginning of a cell. For example, in the following we are adding "xxyy_" at the start of the row of columnName:

update dbName.tableName set columnName = CONCAT('xxyy', ctid) where id = 5;

DELETE first few characters from a cell. We use "substr" function within MySQL to do this. For example, the following query deletes first 3 characters from the columnName:

update dbName.tableName set columnName=substr(columnName,4);

Concatenate / combine multiple cell content with a separator and group them by another column .

SELECT distinct(groupByColumnName), GROUP_CONCAT(contentColumnToCombine SEPARATOR '; ') FROM `dbname`.`tableName` group by groupByColumnName;

Find fields/words with regex having one character in between, for example all words containing .aXe.
SELECT * FROM database.table where headword REGEXP '.*a.e.*'; and pron1 like '%end%' and charlen < 6;

Best method to backup and restore all DBs in MySQL. This ensures that the dbs are separate and does not have bulk issues.
Step 1. get a list of all the databases
Step 2: Create a batch file to dump all the DBs, one by one, using the following command:
mysqldump -u user -pPassword dbname > dbname.sql
Step 3: Create a batch file to restore all the DBs, one by one, using the following command:
mysql -u user -pPassword dbname < dbname.sql

Update a column with the character count of another column within the same table.

update dbName.tableName a set a.columnToUpdate=char_length(a.columnToCountCharacterOf) where a.columnToCountCharacterOf=a.columnToCountCharacterOf;

Load tab delimited data into a table:

Remove tabs from a given mysql field:

update dbName.TableName set ColumnName = replace(ColumnName,'\t','');

Enable Local Data load after logging into the MySQL prompt: SET GLOBAL local_infile=ON;

LOAD DATA LOCAL INFILE 'C:/director/importdata.txt' INTO TABLE dbName.tableName FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

select distinct(lang), (sum(textlength)) FROM dbName.tableName group by lang;

Count number of words in a field:
SELECT LENGTH(langtext) - LENGTH(REPLACE(langtext, ' ', ''))+1, id FROM dbName.tableName INTO OUTFILE "D:\\outdirector\\length.txt" limit 10;

Find Language wise segments:
SELECT langColumn, count(langColumn) FROM dbName.tableName group by lang;

FIND LANGUAGE WISE SUM OF DURATION:
select distinct(lang), sec_to_time(sum(time_to_sec(`duration`)) + sum(microsecond(`duration`))/1000000) FROM dbName.tableName group by lang;

Clone a table:
CREATE TABLE IF NOT EXISTS offices_bk SELECT * FROM offices;

Sort by character length of a column:
select *, char_length(columnFieldCharLenghth) len FROM database.table where freq > 1 order by len DESC;

Sort time/duration in Ascending or Descending Order:
SELECT * FROM dbName.tableName ORDER BY duration ASC/DESC;

Calculate SUM of total time/duration (with milliseconds):
select distinct(lang), sec_to_time(sum(time_to_sec(`duration`)) + sum(microsecond(`duration`))/1000000) FROM dbName.tableName where remarks <> 'XXX' group by lang;

Calculate SUM of total time/duration (without milliseconds):
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(`duration`))) AS totaltime FROM dbName.tableName where lang = 'XX';

Calculate Average of Time/Duration:
SELECT SEC_TO_TIME(AVG(TIME_TO_SEC(`duration`))) FROM dbName.tableName where texttype = 'Sentence-S';

Delete/Remove rows:
DELETE FROM dbname.tablename where column= 'xx';

Drop/Delete/Remove a column in a table:
ALTER TABLE dbname.tablename DROP COLUMN `ColumnName`;

Copy content of one column1 to column2:
UPDATE table SET column2 = column1;

Find fields that begin with 'XX':
SELECT ... WHERE stuff LIKE 'XX%';

Find multiple values:
select * from iceberg.swords where id in (40, 145);

Find substring case sensitive:
SELECT * FROM iceberg.swords where transcription like BINARY '%e N . v%';

create table WorkTypes (id int(50) not null auto_increment primary key,HospitalName varchar(255),DocsAvailable varchar(255),DB_Worktype varchar(255),WTName_L1 varchar(255),WTName_L2 varchar(255),PT_Type varchar(255),comments varchar(255),WTID_ETS varchar(255),WT_Name_ETS varchar(255),TemplateFName varchar(255));

LOAD DATA LOCAL INFILE 'C:/director/importData.txt' INTO TABLE dbName.tableName FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

LOAD DATA LOCAL INFILE 'E:/directory/Load.txt' INTO TABLE dbName.tableName CHARACTER SET UTF8 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n';

E:/directory/keywords.txt"
create table vclasses (id int(50) not null auto_increment primary key,word varchar(255), morph varchar(255), vclass varchar(255));

Select column fields containing a full word:
select * from table name where ColumnName REGEXP '[[:<:]]FullWord[[:>:]]';

SELECT a.* FROM ppost.ppostlatest a,ppost.icdwords b where `a`.`word` =BINARY `b`.`word` limit 20000;

SELECT a.* FROM ppost.totalunqaed a,ppost.wordstocompare b where `a`.`word` =BINARY `b`.`word` limit 20000;

Find and replace in a MySQL Column:
UPDATE `ppost`.`totaltagged` SET date = replace(date,'20130111 (Viva 5K)','20130111');

Query to update a field:
UPDATE `ppost`.`totaltagged` SET assigned_tag = 'JJ/NN' where id = '57064';

Query to append some new data in the same format as the current table
Use the same as the load file

Query to find out the case sensitive duplicates inside a table column:
select word, count(word) cnt FROM `ppost`.`totaltagged` group by binary word having cnt > 1 order by cnt limit 200000;

Query to find out distinct entries in a column where the column is "FileName":
SELECT distinct(FileName) FROM `DirStruct`.`nonemrcnotes` limit 1000000;

//Query to duplicate a table with another name:
//Here "DirStruct.nonemrcnotesedited" is the name of the new table while the table being copied is "DirStruct.nonemrcnotes"

CREATE TABLE DirStruct.nonemrcnotesedited SELECT * FROM DirStruct.nonemrcnotes;

//Count frequency of fields in a column. Here the column name is "dir1"
SELECT dir1, count(dir1) FROM `DirStruct`.`nonemrcnotesedited` group by dir1;

//count occurrence of a some unique group of columns against another column
SELECT DocName,HospitalName, Specialty1, DocSpecialty, WorkType, WTName1, WTName2, count(FileLocation)from `DirStruct`.`corpsamp` group by DocName,HospitalName, Specialty1, DocSpecialty, WorkType, WTName1, WTName2 limit 5000;

//Select Unique Rows in a table
SELECT DISTINCT DocName,Specialty1,DocSpecialty,HospitalName,WorkType,WTName1,WTName2 FROM `DirStruct`.`corpsamp` limit 50000;

SELECT HospitalName, Specialty1, WorkType, WTName1, count(FileLocation)from `DirStruct`.`corpsamp`
group by HospitalName, Specialty1, WorkType, WTName1 limit 5000;

SELECT DISTINCT Specialty1,HospitalName,WorkType,WTName1 FROM `DirStruct`.`corpsamp` limit 50000;

SELECT count(DISTINCT HospitalName),Specialty1 FROM `DirStruct`.`corpsamp` where Specialty1 in ('IM_Cardiology','IM_Pulmonology','Emergency Medicine','Oncology','IM_Nephrology','IM_Internal Medicine General','Surgery','Otorhinolaryngology ','Radiology','IM_Gastroenterology','Orthopedics','Obstetrics & Gynecology','Pathology','Obstetrics','Urology','IM_Neurology','IM_Hematology','IM_Pediatrics','Anesthesiology','Psychiatry','Podiatry','Opthalmology','IM_Endocrinology','Family Medicine','Vascular and Thoracic Surgery','IM_Infectious Diseases','IM_Physical Medicnie and Rehabilitation','IM_Geriatrics','IM_After Hours Care','IM_General Medicine','IM_Oncology','Neurosurgery','IM_Pain Management','IM_Rheumatology','Hospitalist','IM_Hospice Care and Palliative Medicine','IM_Physician Assistant','Nurse Practitioner','IM_Occupational Medicine','Ignored') group by Specialty1;

Merge TwoTables:
insert into 50k1_new2 (fileRefs,freqs,id,parses,sents,sentsCleaned) select a.fileRefs,a.freqs,a.id,a.parses,a.sents,b.sentsCleaned from 50k1 a,50k1_cleaned b where a.id=b.id;

Backup a table from command line:
mysqldump -u root -p posann whole_docs > whole_docs.sql

Backup all MySQL Databases into one file:
mysqldump -u root -p --all-databases > all_dbs.sql

In case the data size is in several GBs, a faster method to backup would be as follows:

mysqldump -u root -p --quick --max_allowed_packet=512M --all-databases | gzip > 20230804_all_dbs.s1l.gz

Please note that it uses gzip utility on windows to dump it faster, so it should be installed and in the environment.

Backup one MySQL Databse:
mysqldump database_name > database_name.sql

Restore a table from command line:
mysql -u root -p DatabaseName < path\TableName.sql

Insert column fields with another table in the database:
Update tempstatus.20140429_TemporalEval T, tempstatus.gold_data_doc_properties as G set T.AdmitDate = G.AdmitDate, T.DictnDate = G.DictnDate, T.DocType = G.DocType, T.DocTypeId = G.DocTypeId, T.ServiceDate = G.ServiceDate where G.fileName = T.fileName;

Find and Replace a String inside a field in a MYSQL table:
update dictionary.englishhindi set hindimeaning = replace(hindimeaning,'अंगूठालगाकरपलटनागन्दा','अंगूठा लगा कर पलटना गन्दा');

Number of words in a field in each row of a column where column name = "langtext":
SELECT id, LENGTH(langtext) - LENGTH(REPLACE(langtext, ' ', ''))+1 FROM dbName.tableName where lang = 'Bengali';

Does not equal to operator in MYSQL is: <>
SELECT count(*) FROM umls_14.rxnorm_fromumls_may2014 where tty <> 'OCD' ;
will return all rows where tty is not "OCD".

Export a query output to a file:
SELECT * FROM moviesite.movie INTO OUTFILE "C:\\TEMP\\test3.txt" ;

Select Rows where one column does not contain the value of another column:
SELECT * FROM tableName WHERE column1 NOT LIKE CONCAT('%', column2, '%');

Deleting a row:
DELETE FROM tableName (WHERE clause);

Combine multiple columns into one column:
update tableName set combined = CONCAT(column1, 'ANY_SEPARATOR', column2,);

Find fields ending in spaces:
SELECT * FROM tableName WHERE columnName REGEXP '[[.space.]]+$';

update ktaxon.contexts_h_sorted
set combined =
CONCAT(cat, '@', subcat, '@', context_h_sorted, '@', keyword, '@', comment);

Move a table to another database:
ALTER TABLE old_DB_Name.tableName RENAME new_DB_Name.tableName

Find lines/fields containing a list of words/full words:

SELECT * FROM ptw.proofing2 where tokens REGEXP '([[:<:]]|^)(

list|of|words|

)([[:>:]]|$)';

Useful MySQL Queries for a Linguist Data Manager

Ads