Chris Nizzardini, Salt Lake City Utah, Web Developer Specializing in LAMP+Ajax Since 2006

My Blog

Here is my awesome blog.

Archive for the ‘SQL’ Category

MySQL INNER JOIN on Multiple Parameters and Condtions

Posted by chris on August 19th, 2010 Comments(0)

Wow. This blog has gotten a bit lopsided as of late on the MySQL side of stuff. A symptom of all the MySQL wizardry I’ve been doing and learning at my new job. I found out today that you can add multiple parameters/conditions to an INNER JOIN statement and I feel kind of dumb for not knowing this before. Using multiple parameters on your inner join you can remove the need for nasty sub queries!

Take this query for instance:

SELECT DISTINCT
    wp_posts.*, wp_users.user_nicename, 
	((SELECT meta_value FROM wp_postmeta meta WHERE meta.meta_key = 'views' AND meta.post_id = wp_posts.ID)*1) as views
FROM
	article_to_categories
INNER JOIN
    wp_posts ON wp_posts.ID = article_to_categories.wp_post_id
INNER JOIN
    wp_users ON wp_users.ID = wp_posts.post_author
WHERE
    article_to_categories.categories_id = $id
ORDER BY
    views DESC

On a production server this query is sadly executing in 0.1 seconds. This is due almost entirely to that nasty little sub select in there. In my opinion, sub selects should be avoided at all cost. They are a symptom of at best an unoptimized query and at a worst a poorly designed database. Sometimes I’m a guilty party. Our goal is to get rid of the sub query:

SELECT meta_value FROM wp_postmeta meta WHERE meta.meta_key = 'views' AND meta.post_id = wp_posts.ID

We can do this by specifying multiple parameters and conditions on our join statement.

SELECT DISTINCT
    wp_posts.*, wp_users.user_nicename, (wp_postmeta.meta_value*1) as views 
FROM
    article_to_categories
INNER JOIN
	wp_posts ON wp_posts.ID = article_to_categories.wp_post_id 
INNER JOIN
	wp_users ON wp_users.ID = wp_posts.post_author 
INNER JOIN 
	wp_postmeta ON (wp_postmeta.post_id = wp_posts.ID AND wp_postmeta.meta_key = 'views') 
WHERE
    article_to_categories.categories_id = $id
ORDER BY
	views DESC

Notice that when joining on wp_postmeta I specify that I want all records matching wp_posts.ID and that match the meta_key views. By moving away from the sub select I was able to trim the query execution time to 0.03 seconds. Thats a massive improvement especially for a query that is run in bulk!

INNER JOIN 
	wp_postmeta ON (wp_postmeta.post_id = wp_posts.ID AND wp_postmeta.meta_key = 'views')

I’ve got a feeling I’ll need to go back through some of my older code and introduce some needed query optimizations. How about you?

In SQL (, )

What is PDO Error HY093

Posted by chris on August 11th, 2010 Comments(0)

I’ve been working with PDO at my new job and encountered this HY093 error. Infact that was the only information I received back from my PDO object. Usually there is a lot more information. My guess is that generally PDO is passing back the MySQL error directly from the MySQL server. However, HY093 is strictly an error issued by PDO.

In this case I received the HY093 error because I had improperly binded a data type to a variable.

I was using the form:

$db->prepare("SELECT * FROM tbl WHERE name = :name");
$pdo->bindValue(':name',$phpVar,PARAM_STR);
$pdo->execute();

Notice on line 2 I used “PARAM_STR” this should be PDO::PARAM_STR likewise had this been an integer it would have been PDO:PARAM_INT. Hopefully that helps some people.

In SQL, php (, )

Use MySQL HAVING to Get Past Error 1111 Invalid Use of Group Function

Posted by chris on June 22nd, 2010 Comments(0)

I was working on some new reporting features for a client this evening when I encountered MySQL Error 1111 Invalid Use of Group Function. I was a bit befuddled by this as I was simply using a SUM total in my WHERE clause. Of course after a bit of googling I found that I cannot use an aggregate function such as SUM or COUNT as a WHERE clause.

Here is my original query.

SELECT SQL_CALC_FOUND_ROWS 
  city.name AS city, state.abbrev AS state_abbrev, zip.zipcode, zip.city_id, COUNT(zip.zipcode) AS zip_count, SUM(assigned.is_primary) AS primary_count, SUM(assigned.is_backup) AS backup_count 
FROM 
  tbl_zipcode zip 
INNER JOIN 
  tbl_city city ON city.city_id = zip.city_id 
INNER JOIN 
  tbl_state state ON state.state_id = city.state_id 
LEFT JOIN 
  tbl_affiliate_assigned_location assigned ON assigned.zipcode = zip.zipcode 
WHERE 
  zip_count > primary_count 
GROUP BY 
  city.city_id 
ORDER BY 
  count(zip.zipcode) DESC, city.name ASC 
LIMIT 0,50

I fixed this using a HAVING clause on the GROUP BY.

SELECT SQL_CALC_FOUND_ROWS 
  city.name AS city, state.abbrev AS state_abbrev, zip.zipcode, zip.city_id, COUNT(zip.zipcode) AS zip_count, SUM(assigned.is_primary) AS primary_count, 
  SUM(assigned.is_backup) AS backup_count 
FROM 
  tbl_zipcode zip 
INNER JOIN 
  tbl_city city ON city.city_id = zip.city_id 
INNER JOIN 
  tbl_state state ON state.state_id = city.state_id 
LEFT JOIN 
  tbl_affiliate_assigned_location assigned ON assigned.zipcode = zip.zipcode 
GROUP BY 
  city.city_id HAVING zip_count > primary_count 
ORDER BY 
  count(zip.zipcode) DESC, city.name ASC 
LIMIT 0,50

The HAVING clause seems to be one of those things I use so rarely that I forget about it just before I need it again. Hopefully since I wrote on a blog on it this time it sticks around in my memory bank a bit longer. Hope this helped someone. I also found out that you can have multple HAVING clauses using the following syntax.

HAVING zip_count >  primary_count OR primary_count IS NULL
In SQL (, , , )

MySQL InnoDb inserts are slow, really slow!

Posted by chris on June 12th, 2010 Comments (6)

So I was doing some MySQL database optimization tests this morning when I got my database engines mixed up. Thats when I stumbled on something horrible….InnoDb is really slow when it comes to inserts! Like really slow. I performed 10,000 inserts on MyIsam and it executed in about 1/10th of a second. The same inserts on an InnoDb table took over 21 seconds! I was seriously dumbfounded by this. I knew MyIsam would be faster, but I figured it would be only nominally faster. Out of curiosity I did a single insert with MyIsam which executed in 0.000762939453125 seconds whereas the same insert in InnoDb executed in 0.032792091369629 seconds. I must say this is a bummer for anyone using InnoDb in a high-insert environment. I’ll have to see what the differences are on updates, selects, deletes, and joins.

In SQL (, , )

Faster Mass SQL Inserts With MySQL and PHP

Posted by chris on May 31st, 2010 Comments(0)

I was programming some code that needed to do a lot of inserts a few months back. I hypothesized that creating a big SQL insert statement would be faster than executing a bunch of small insert statements. I didn’t have time to create a test, but I wanted to go back and test this theory at some point. The idea was that even though there is a penalty for looping through an array and accessing a variable to build the SQL statement that it would still be faster than asking PHP to send a bunch of tiny querries.

My test server is my local machine running Ubuntu 10.04, dual 1.80Ghz core Intel processors with 2 GB of memory. Its running the latest stable release of PHP5, Apache2, and MySql5. I used the standard php mysql_connect function, not mysqli. The database engine used in this test was MyIsam. I performed 1000 inserts in the first test, I then altered the code to build one giant insert. I restarted the apache and mysql server after the first scenario.

Read the rest of this entry »

In SQL, php ()

MySQL engines, InnoDb versus MyISAM for web developers

Posted by chris on February 23rd, 2010 Comments (3)

Let me start out by saying I think MyISAM sucks. I hate it. It’s the default MySQL database engine, but its non-relational so a lot of people just start using it without exploring the other options. Hey thats okay, I did the same thing until some smart guys over at my last job introduced be to InnoDb. MyISAM is probably the best way to go for newer web developers just trying to cut their teeth on web application development. At some point its time to a pick up a book and learn how InnoDb can save you time, save you headaches, reduce the amount of code you write, and make the world a better place (okay that last one is a reach).

The best part about InnoDB is that its relational. Its transaction-safe too, but I’ll just focus on the relation side of things for now. What is a relation? A relation joins two tables together on a common value. Typically this is a parent-child relationship known as a one-to-many, but it can be a one-to-one relation too. Lets look at three tables.

tbl_profile
—————
profile_id
profile_name

tbl_profile_setting
———————–
profile_setting_id
profile_setting_name

tbl_profile_has_setting
—————————
profile_id
profile_setting_id

We have a profile table for storing whatever, then a profile can have settings. It doesn’t really matter what these settings are for the purpose of this article, but a profile can have multiple of these settings. You could have this same structure in MyISAM, but you would have to store the relations in your code. Your code is prone to errors. It happens, in fact is happens enough that I try to write as little code as possible. My goal is to leverage as much pre-written code as possible, because its been reviewed by more people and if I’m using that code I likely trust the source. InnoDB is an awesome example of this. Its widely deployed and written by people with more skills than me. No inferiority-complex here, thanks InnoDB.

For creating a relation. We’ll use phpMyAdmin. MySQL Administrator works great too and if your nutty enough you can look up the SQL for doing it in the mysql command line console. Go into the tbl_profile_has_setting table. In the structure tab you will see a link called Relation View. Click on this. You’ll notice a drop down next profile_id and profile_setting_id (these will only appear if you made these primary keys). You’ll need to create indexes on these two columns in the tbl_profile_has_setting table as well. Select the tbl_profile.profile_id and tbl_profile_setting.profile_setting_id for their respective columns. For the On Delete drop down select cascade.

What you’ve just done is create relations that have the following rules enforced by the database engine.

  1. When you delete a profile, its corresponding record(s) in tbl_profile_has_settings is deleted automajically
  2. When you delete a profile setting, its corresponding record(s) in tbl_profile_has_settings is deleted automajically
  3. When you add a record to tbl_profile_has_setting the profile_id and profile_setting_id must exist in their respective tables

Guess what, you don’t have to verify that the setting exists anymore when inserting into tbl_profile_has_setting and you don’t even need to worry about the profile existing. MySQL will return an error if these rules are violated. You now have referential integrity, clean data, and happy reports. You made all this possible just by creating the relation. So what did the cascade option do? That created rule 1 and 2 above. The auto-delete. Cascade should be used wisely as it can have devastating consequences (you records are automatically deleted), but when you implement a cascade this is normally what you want.

So why doesn’t everyone use InnoDb over MyISAM. There are several reasons:

  1. You need to be more knowledgeable to use it. This isn’t just throw data in grab it out anymore. It takes more thought and for bigger projects you’ll want to create ER diagrams to flowchart out your database.
  2. Performance penalty. Since you’ve offloaded the work to the database engine your database now runs slower. I scoff when people use this as an argument against InnoDb. If your application has gotten to be so successful that InnoDb is the sole reason of your slow down then congratulations, not many people are as successful as you. Plus InnoDb operates in a lower level language that is faster than the PHP code you are writing. Also most of your slowdowns in PHP web applications can be attributed to poorly written queries, bad database design to begin with, and lack of innovation to come up with solutions to improve speed.
  3. Harded to backup. Yes you can still use the mysqldump to backup your data, but you can’t copy the actaul database file like you could with MyIsam. This is a crappy form of database backup anyways. If you’re big enough to wear the mysqldump is no longer a sane method of doing backups then just stop being cheap and go buy the enterprise software to manage your data. Your data is important to you right?

Hope this helps some people and I hope it offends some people as well. This is one of those things that I cannot find common ground on, its debated often between me and co-workers. On a side note, don’t let your domain expire while on vacation. You’ll lose your SERPs fast.

In SQL, php (, , )

mysql get total rows in query with sql_calc_found_rows

Posted by chris on August 13th, 2009 Comments(0)

The MySQL SQL_CALC_FOUND_ROWS function is a nice way to return how many rows were returned in the query. There has been a lot of discussion in the PHP.net entry on mysql_num_rows regarding this function. The debate centers around whether its more effecient to use MySQLs built in functionality or whether its more effecient to run the same query again using the COUNT() function.

For me, its hard to determine which way is better. Usually its better to leverage your database engine than code. There is not an easy way to tell how database cache plays into this either. I feel using SQL_CALC_FOUND_ROWS is the better option, it eliminates a few extra lines of code, and prevents you from having to update multiple queries. Whether there is a performance penalty in either case is debatable, if its even noticeable…

1
SELECT SQL_CALC_FOUND_ROWS * FROM tbl_customer WHERE entry_date > '2009-01-01';

In a separate query run this (I believe this is connection dependent, so they must be run in conjunction with each other within the life of the same connection).

1
SELECT FOUND_ROWS() AS totalRows;
In SQL (, , )

mysql with rollup for a easy grouped total columns in result set

Posted by chris on June 4th, 2009 Comments(0)

Using the WITH ROLLUP modifier in queries using GROUP BY will add an additional row to the result set which sums all columns. This prevents you from having you to write code which adds each column in your programming language.

Description from the Mysql Reference Manual

The GROUP BY clause allows a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus allows you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

Example

1
2
3
4
5
6
7
8
SELECT 
	DATE(dateTime) AS order_date, count(*) AS shipments, sum(shipping_total) AS shipping_total, sum(hasShipAmt) AS hasShipAmt, sum(shipping_total)-sum(hasShipAmt) AS revenue  
FROM 
	tbl_my_orders 
WHERE 
	dateTime BETWEEN '$startDate' AND '$endDate 23:59:59' 
GROUP BY 
	order_date WITH ROLLUP
In SQL (, , , )

preventing sql injection attacks on mysql, php, lamp servers

Posted by chris on February 3rd, 2009 Comments(0)

I was recently auditing a clients website and was astounded when I was able to access any customers account by disabling JavaScript, using a wild card ‘%’ operator as the username, and guessing passwords. Unfortunately the developer of the customer login used a query that looked like this and did not clean any of the data once received by the server:

1
SELECT * FROM the_customer_table WHERE the_username LIKE "'.$_POST['the_username_field'].'" AND password="'.$_POST['the_password_field'].'"

Using the wild card operator it would return any username where the password matched whatever I passed in such as “password”, “happiness”, and some dirty words that I threw in for fun. The exploit could have been very bad actually as certain accounts have gift certificates associated with them. A hacker could have written a simple brute force script in any language and went to town compiling a list of valid accounts and then ordered hundreds to thousands of dollars worth of product. The exploit was patched by altering the query and additional adding an add_slashes() to all POST data for the login.

A proper login query should look like this:

1
2
3
4
5
6
7
8
$valid=false;
$sql = "SELECT the_username,the_password FROM the_customer_table WHERE the_username ="'.mysql_real_escape_string($_POST['the_username_field']).'" LIMIT 1";
$result = mysql_query($sql);
$row = mysql_fetch_assoc($result);
if($row['the password']==$_POST['the_password_field']){
$valid=true;
}
return $valid;

This will verify first that a valid customer account exists, then it will test the actual password against what was supplied by the end-user. This of course should occur after the POST data has been cleansed against SQL injection hacks and JavaScript hacks. Never use LIKE when you are searching for an exact match either and always use a LIMIT 1 when you are only expecting 1 record to be returned or 1 record to be affected.

In SQL ()

mysql removing characters from end of value

Posted by chris on December 4th, 2008 Comments(0)

Here’s a quick MySQL trick a co-worker helped me figure out today. This roughly equates to PHPs substr($var,0,-2) functionality.

1
SUBSTRING(field, 1, (LENGTH(field) - 2))

So if the fields value was 12345xx this would change the returned value to 12345.

In SQL ()