More Books
PHP 5 Unleashed
PHP 5 Unleashed
Table of Contents
Copyright
Lead Author
Contributing Authors
Acknowledgments
We Want to Hear from You!
Reader Services
Introduction
Organization of the Book
Part I. Working with PHP for General Web Development
Chapter 1. Basic PHP Development
How PHP Scripts Work
Basic PHP Syntax
Basic PHP Data Types
Variable Manipulation
Control Structures
User-Defined Functions
Dynamic Variables and Functions
Multiple File PHP Scripts
References
Strings in PHP
Comparing Strings
Advanced String Comparison
Search and Replacement
Formatting Strings
Strings and Locales
Formatting Date and Time Values
Summary
Chapter 2. Arrays
Basic Arrays
Implementing Arrays
More Array Materials
Chapter 3. Regular Expressions
The Basics of Regular Expressions
Limitations of the Basic Syntax
POSIX Regular Expressions
Perl-Compatible Regular Expressions (PCRE)
PCRE Modifiers
A Few Final Words
Chapter 4. Working with Forms in PHP
HTML Forms 101
Working with Form Submissions in PHP
Summary
Chapter 5. Advanced Form Techniques
Data Manipulation and Conversion
Form Data Integrity
Form Processing
Summary
Chapter 6. Persistent Data Using Sessions and Cookies
HTTP Cookies
PHP Sessions
Advanced Sessions
Summary
Chapter 7. Using Templates
The What and Why of Templates
The Smarty Template Engine
Summary
Part II. Advanced Web Development
Chapter 8. PEAR
What Is PEAR?
Getting and Installing PEAR
Using the PEAR Package Manager
Using the PEAR Website
Using PEAR Packages in Applications
Summary
Reference
Chapter 9. XSLT and Other XML Concerns
Relating XML to HTML
Using XSLT to Describe HTML Output Using XML Input
PHP4 and XSLT Using the DOM XML Module
PHP4 and XSLT Using the XSLT Module
PHP5 and XSLT
Accessing XML Data Using SimpleXML
Generating XML Documents Using PHP
Summary
References
Chapter 10. Debugging and Optimizations
Debugging Your PHP Scripts
Optimizing Your PHP Scripts
Summary
Chapter 11. User Authentication
Authenticating Users in PHP
Securing PHP Code
Summary
Chapter 12. Data Encryption
Shared Secret Versus Public Key
Shared Secret Algorithms
Public Key Cryptography
Using Public Keys in PHP
Summary
Chapter 13. Object-Oriented Programming in PHP
Why Objects?
Creating Basic Classes
Advanced Classes
Special Methods
Class Autoloading
Object Serialization
Exceptions
Iterators
Summary
Chapter 14. Error Handling
The PHP Error-Handling Model
What to Do About Errors
The Default Error Handler
Error Suppression
Custom Error Handlers
Causing Errors
Putting It All Together
Summary
Chapter 15. Working with HTML/XHTML Using Tidy
Introduction
Basic Tidy Usage
Tidy Configuration Options
Using the Tidy Parser
Applications of Tidy
Summary
Chapter 16. Writing Email in PHP
The MIME Protocol
Implementing MIME Email in PHP
Summary
Part III. Building Applications in PHP
Chapter 17. Using PHP for Console Scripting
Core CLI Differences
Working with PHP CLI
CLI Tools and Extensions
Summary
Chapter 18. SOAP and PHP
What Are Web Services?
Installation
Creating Web Services
Consuming Web Services
Looking for Web Services
Summary
Chapter 19. Building WAP-Enabled Websites
What Is WAP?
System Requirements
Introduction to WML
Serving WAP Content
Sample Applications
Summary
Part IV. I/O, System Calls, and PHP
Chapter 20. Working with the File System
Working with Files in PHP
File Permissions
File Access Support Functions
Summary
Chapter 21. Network I/O
DNS/Reverse DNS Lookups
Socket Programming
Network Helper Functions
Summary
Chapter 22. Accessing the Underlying OS from PHP
Introduction
Unix-Specific OS Functionality
Platform-Independent System Functions
A Brief Note About Security
Summary
Part V. Working with Data in PHP
Chapter 23. Introduction to Databases
Using the MySQL Client
Basic MySQL Usage
Summary
Chapter 24. Using MySQL with PHP
Performing Queries from PHP
A MySQLi Session Handler
What Is a Custom Session Handler?
Summary
Chapter 25. Using SQLite with PHP
What Makes SQLite Unique?
Basic SQLite Functionality
Working with PHP UDFs in SQLite
Odds and Ends
Summary
Chapter 26. PHP's dba Functions
Preparations and Settings
Creating a File-Based Database
Writing Data
Reading Data
Sample Application
Conclusion
Part VI. Graphical Output with PHP
Chapter 27. Working with Images
Basic Image Creation Using GD
Using the PHP/GD Drawing Functions
Working with Colors and Brushes
Using Fonts and Printing Strings
General Image Manipulation
Other Graphics Functions
Summary
Chapter 28. Printable Document Generation
A Note Regarding the Examples in This Chapter
Generating Dynamic RTF Documents
Generating Dynamic PDF Documents
Related Resources
Part VII. Appendixes
Appendix A. Installing PHP5 and MySQL
Installing PHP5
Installing MySQL and PHP Modules
Installing PEAR
Appendix B. HTTP Reference
What Is HTTP?
PHP Programming Libraries for HTTP Work
Understanding an HTTP Transaction
HTTP Client Methods
What Comes Back: Server Response Codes
HTTP Headers
Encoding
Identifying Clients and Servers
The "Referer"
Fetching Content from an HTTP Source
Media Types
Cookies: Preserving State and a Tasty Treat
Security and Authorization
Client-Side Caching of HTTP Content
Appendix C. Migrating Applications from PHP4 to PHP5
Configuration
Object-Oriented Programming (OOP)
New Behavior of Functions
Further Reading
Appendix D. Good Programming Techniques and Performance Issues
Common Style Mistakes
Common Security Concerns
Style and SecurityLogging
Summary
Appendix E. Resources and Mailing Lists
Relevant Websites
Mailing Lists and Newsgroups
Index
SYMBOL
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z

Public Key Cryptography

As previously illustrated, shared key encryption relies on each party having some prior association or trust with the other. To decrypt the message, Party B must already know the secret password used by Party A. If these two parties have never communicated before, the password must be exchanged using some other means. This exchange may take place in the form of a phone call, a letter, a separate email, or a face-to-face meeting. Unfortunately, each of these means has one or more significant drawbacks.

The most significant drawback to all of these, apart from speed, cost, location, and identity confirmation, is the simple requirement of human interaction. People would have to establish a new relationship with every other person they want to communicate with and maintain a library of shared secret passwords for each of those parties. If 10 people each had nine unique relationships with every other party, that's 90 distinct shared secrets. At 100 people, we're talking 9,900 unique shared secrets. Now imagine a website dealing with one million customers, or even more. There has to be a better way.

Recall from earlier that public key encryption is any form of cryptography where the process and/or key used to encrypt data is fundamentally different from the process and/or key used to decrypt data. At the heart of this kind of encryption is a mathematical property known as asymmetry. Following is a simple equation:

X + 7 = 10

We can quickly determine that X must be equal to 3. This equation is fully reversible and is therefore symmetric. However, looking at this next equation:

X^2 = 4

We cannot be certain what the value of X actually is. Is it 2, or 2? It's impossible to tell for certain, but at least the number of possible answers is few. This equation is partially reversible and therefore partially symmetric. Now consider the modulus operation:

X mod 7 = 3

Given this equation there are literally an infinite number of possible values for X: 3, 10, 17, 24, 31, 38, and so on. Each one of these numbers, when divided by 7, yields a remainder of 3. Because it's impossible to make even a probable guess at the value of X, this equation is considered nonreversible and fully asymmetric.

The RSA Algorithm

By far the most popular algorithm to take advantage of this property of asymmetric equations is the RSA algorithm. RSA was named for the first letter in the last names of its inventors: Ron Rivest, Adi Shamir, and Leonard Adleman. The RSA algorithm amplifies the indeterminacy of asymmetric equations by pairing them with very large numbers and provides a "backdoor" to decode this nonreversible equation through a fortunate side-effect of the very thing that makes the equation nonreversible in the first place.

At the core of RSA is a deceptively simple equation:

Figure 12.1. RSA Identity.


This states that for a given key (E), there exists a matching key (D) which, when one is raised to the power of the other is equivalent to 1 with respect to base N. To put that in PHP terms:

pow($E, $D) % $N == 1

What this means for cryptography is that for any given value of $T that is less than $N, the following two equations are simultaneously true:

$C == pow($T, $E) % $N;
$T == pow($C, $D) % $N;

In other words, an unencrypted (plain text) value of $T, raised to $E (encryption key) and reduced by the base $N, becomes an encrypted value (ciphertext) of $C. That value can then be decrypted by applying the same equation, this time with the decryption key $D returning to the unencrypted value of $T.

An important side note here is that it doesn't matter which of the two paired values we use for $E during encryption, so long as we use the opposite value during decryption for $D. But wait, where do all these values come from?

First we start with two very large prime numbers. How large is going to depend on how secure you want your data to be. What's effectively unbreakable by today's computers may take only a week to decrypt on cutting-edge hardware five years from now. However large you choose them, they must be unique. Using the same prime number twice is completely ineffective.

We'll call these numbers P and Q. The first number in our (E, D, N) set we can determine immediately:

N = P * Q

This is going to make N an extremely large number (as many digits as P and Q have put together). Next we're going to calculate a temporary variable F.

F = (P-1)(Q-1)

At this point we can pick any one of several values for E so long as it is greater than 1, less than N, and relatively prime with F. Relatively prime means that E and F share no prime factors in common. E is allowed to be prime, but it's not required by the algorithm.

Now that we've collected our encryption key and our modulus, we need only to come up with a suitable decryption key. As it happens, this is very easy to do so long as we also know the value of F. All we have to do is find an integer value for D less than N, such that

DE mod F = 1

Ordinarily, this equation alone would be asymmetric and we'd have an infinite number of values for D that satisfy this equation; fortunately, we've also stipulated that we're only interested in the value that is less than N and there will be only one of those. A programmatic approach to determining this value for D might be

<?php
function find_D($E, $N, $phi) {
  for($x = 1; $x < $N; $x++) {
    if (0 == ((($x * $phi) + 1) % $E)) {
      return (($x * $phi) + 1)/$E;
    }
  }
}
?>

This loop could easily be sped up to require far fewer iterations, but it should cover the basics. Now that we have our values for E, D, and N, we can toss out P, Q, and F, confident that we'll never need them again. In fact, we need to be sure to get rid of them because any one of those values, paired with our public key pair (E, N), would allow someone to determine our value for D using the exact methods we used to generate it in the first place.

Because E and N by themselves are not enough to calculate D, we can give out a copy of our public key to anyone who wants it. In fact, we can publish it on the Internet, write it on our business card, and rattle it off on our home answering machine. Given a public key, all anyone can really do with it is encrypt new data that can be decrypted only with our private key or decrypt data that happens to have been encrypted using our private key. As we'll cover in the next section, these are both activities that are not only acceptable, they're actually exactly what we want to happen.

NOTE

Publishing your public key is where the importance of a large keysize comes into play. Because P and Q are the only factors of N, an attacker could potentially derive these from your public key and use them to deduce the value of D, given sufficient time and computing resources. 1,024 bits is currently considered "secure enough," although many new implementations are favoring 2,048-, 4,096-, or even 8,192-bit keysizes.


Signing Versus Safeguarding

So let's say that everyone has our public key at this point. The first impact of this is that they can now encrypt messages to send to us using our public key and be reasonably assured that we, and only we, can decrypt it. If we've gotten their public key, we can send encrypted messages back with the assurance that only they can decrypt it.

The key difference between this approach and shared secret cryptography is that for a given group of, for example, 100 people, there are only 200 keys in total (each person's public and private keys). Furthermore, rather than each person having to know a unique shared secret for each of the individuals they communicate with, they need to know or remember only their own key pair. When Joe wants to send a message to Bob, he simply asks Bob for his public key, uses that to encrypt the message, and sends it. At this point, Joe can file Bob's public key away for future reference or just toss it out and ask again when the time comes.

All of the preceding information covers safeguarding data, but if anyone can get hold of our public key, how do we know that a given message actually came from the person who claimed to have sent it? The Internet is fundamentally an anonymous place, and just because we receive a message from someone claiming to be John Smith, it doesn't mean that John Smith actually sent the message. If only there were some way we could be sure that only John Smith could have possibly written the message.

Recall that public/private key pairs can be used in either direction. Not only can someone use a public key to encrypt data and a private key to decrypt it, they can also use their own private key to encrypt data and allow anyone else to decrypt it using their public key.

Right about now you may be asking, "What's the point of encrypting something when anyone in the world can ask for my public key to decrypt it?" The answer is that this time we're not interested in safeguarding the information; we just want to prove that we're the only ones who could have generated the information because we're the only ones with our private key. Consider the following email:

From: phb@example.com

To: bob@example.com

Date: Sat 24 Sep 2004 15:13:00 -0700 PST

Subject: Have a nice weekend bob


Bob,
 I authorize you to take the company jet to Maui this weekend.

Sincerely

P.H. Boss

---BEGIN SIGNATURE---

H2309uf2jbkb3bd3d93bhdb23b32@HFLJ#nj3fn23FBFLj32r23ERG@K3d

---END SIGNATURE---

When Bob receives this email, he can't believe his eyes so he tells his email program to check the signature against his boss's public key. Sure enough it decodes to

MD5-Hash: 19cdba92bef9d71e0a7b3f78d91dfe7

Which is the exact value computed from the text of the message. If someone had simply copied a legitimate signature from another one of the boss's emails, the hash values would not match. And because only the boss has the private key needed to generate a new signature from a hash value, he must have been the one to do it.

Man in the Middle

Public Key cryptography shares one critical flaw with Shared Secret cryptography: the first introduction. As mentioned earlier, if Bob intends to send Joe an encrypted message, he'll ask Joe for his public key. Joe will provide it, and Bob will use this public key to encrypt the message so that only Joe can read it.

Figure 12.2. Introductory key exchange.


Now, what if Hacker Hal has broken into one of the routers between Joe and Bob? From this position, Hal could see and intercept all the packets traded by Joe and Bob. If all Hal was doing was looking at the packets, he'd see nothing useful because the data was encrypted. However, as long as Hal had the right software in place when Bob asked Joe for his public key, Hal could intercept that request and send his own public key back instead. Then when Bob uses that public key to encrypt his message and send it, Hal could intercept this message again and be able to decrypt it with his own private key because it was originally encoded with his public key instead of Joe's.

Figure 12.3. Man in the Middle attack.


If the message from Bob to Joe required a replyfor example, an interactive chat sessionHacker Hal could simply ask Joe for his real public key and reencrypt the message himself using Joe's actual public key and reverse the process for sending replies back to Bob. This lets Hal spy on the conversation almost as though it had never been encrypted.

There are two workarounds for this problem. The first requires that parties who expect to communicate with each other introduce themselves in advance and keep a copy of the other party's key on file. Enthusiasts of secure email often hold events known as "key signing parties" for just this purpose. Although this prevents a man in the middle from offering a fake key, it has the crippling drawback of becoming unmanageable on a large scale or with unknown audiences.

The second approach involves both parties knowing and trusting a third party, which is sort of like being introduced by a common friend. Joe and Bob both know Alice. Joe trusts Alice when she says that the person you're talking to is Bob and this is his public key. Similarly, Bob trusts Alice when she introduces him to Joe.

The way this is implemented in the public key world is that Joe generates his public/private key pair, then takes his public key to Alice and says, "Would you sign this for me?" (We'll refer to this later as his Certificate Signing Request, or CSR). Alice then uses her private key to electronically sign Joe's public key the way P. H. Boss signed his own letter to Bob. Anyone receiving Joe's public key now knows that the ever trustworthy Alice has vouched for Joe's identity and has proclaimed that the Joe who made this public key and the real live Joe are one and the same.

In the land of Web browsers and servers using the secure HTTPs protocol, the role of Alice is played by any one of several major corporations, including VeriSign, Thawte, Equifax, and others.