PHP 5 Unleashed

Limitations of the Basic Syntax

Even though the original rules make regular expressions quite powerful, they also have inherent limitations that make them impractical to use. For example, the basic syntax offers no concise way to express the concept of "any character." In addition, if you need to match a literal parenthesis or star, rather than use it as a special character, you're pretty much out of luck.

As a result of these limitations, the practical implementations of regular expressions have grown to include a number of other rules:

  • The special character "^" is used to identify the beginning of the string.

  • The special character "$" is used to identify the end of the string.

  • The special character "." is used to identify the expression "any character."

  • Any nonnumeric character following the character "\" is interpreted literally (instead of being interpreted according to its regex meaning). Note that this escaping technique is relative to the regex compiler, and not to PHP itself. This means that you must ensure that an actual backslash character reaches the regex functions, escaping it as needed (for example, by typing \\ when the pattern is written inside a double-quoted string).

  • Any regular expression followed by a "+" character is a regular expression composed of one or more instances of that regular expression.

  • Any regular expression followed by a "?" character is a regular expression composed of zero or one instance of that regular expression.

  • Any regular expression followed by a length specifier of the form {min}, {min,} or {min,max} is a regular expression composed of a variable number of instances of that regular expression. The min parameter indicates the minimum acceptable number of instances, whereas the max parameter, if present, indicates the maximum acceptable number of instances. If the comma is present but max is omitted, there is no upper limit to the number of instances that can be found in the string. Finally, if only min is given (with no comma), it indicates the only acceptable number of instances.

  • Square brackets can be used to identify groups of characters acceptable for a given character position.
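To make the escaping rule concrete, here is a minimal sketch that uses PHP's preg_match() function. That function belongs to the Perl-compatible (PCRE) family rather than the POSIX one, and it requires the pattern to be wrapped in delimiters (here, slashes); the patterns and test strings are illustrative assumptions. The point it shows is that PHP processes its own string escapes first, and only what survives reaches the regex engine.

```php
<?php
// Sketch: how PHP string escaping interacts with regex escaping.
// The '/' characters are PCRE delimiters, not part of the regex itself.

// In single quotes, '\.' reaches the regex engine unchanged: a literal dot.
$literalDot = '/a\.b/';
var_dump(preg_match($literalDot, 'a.b')); // int(1)
var_dump(preg_match($literalDot, 'axb')); // int(0): "." is no longer a wildcard

// Writing the backslash as \\ yields the same pattern inside double quotes.
var_dump("/a\\.b/" === $literalDot); // bool(true)

// Matching one literal backslash needs \\ in the regex, which takes four
// backslashes in PHP source, because PHP turns each \\ into a single backslash.
$oneBackslash = '/\\\\/';
var_dump(strlen($oneBackslash));                 // int(4): '/', '\', '\', '/'
var_dump(preg_match($oneBackslash, 'C:\\temp')); // int(1)
```

Typing the doubled backslash even where PHP would leave a lone one alone is a cautious habit: it keeps the pattern correct if it is later moved between single and double quotes.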

Let's start from the beginning. It's sometimes useful to be able to recognize whether a portion of a regular expression should appear at the beginning or at the end of a string. For example, suppose you're trying to determine whether a string represents a valid HTTP URL. The regex http:// would match both http://www.phparch.com, which is a valid URL, and nhttp://www.phparch.com, which is not (and could easily represent a typo on the user's part).

By using the "^" special character, you can indicate that the following regular expression should be matched only at the beginning of the string. Thus, the regex ^http:// will create a match only with the first of the two strings.

The same concept, although in reverse, applies to the end-of-string marker "$", which indicates that the regular expression preceding it must end exactly at the end of the string. For example, com$ will match "sams.com" but not "communication."
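Both anchors can be checked with a short sketch based on PHP's preg_match(), a PCRE function rather than a POSIX one. It uses "#" as the pattern delimiter so that the slashes in http:// need no escaping; the table layout and output format are illustrative choices.

```php
<?php
// Each entry is [pattern, subject]. '#' delimits the http:// patterns so the
// slashes inside them don't need escaping.
$cases = [
    ['#http://#',  'nhttp://www.phparch.com'], // unanchored: matches mid-string
    ['#^http://#', 'http://www.phparch.com'],  // "^": must start with http://
    ['#^http://#', 'nhttp://www.phparch.com'], // rejected: stray "n" comes first
    ['/com$/',     'sams.com'],                // "$": must end with "com"
    ['/com$/',     'communication'],           // rejected: "com" is not at the end
];

foreach ($cases as list($pattern, $subject)) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf("%-12s %-26s %d\n", $pattern, $subject, preg_match($pattern, $subject));
}
```

The printed flags are 1, 1, 0, 1, 0, in the order of the cases.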

The special characters "+" and "?" work similarly to the Kleene Star, except that they represent "at least one instance" and "zero or one instance" of the regex they are attached to, respectively.
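A quick way to see the difference between "+", "?" and the Kleene Star is to apply each to the same kind of pattern. The patterns and words below are made-up illustrations, run through PHP's preg_match() and anchored with ^ and $ so that the whole string has to match.

```php
<?php
// "+" = one or more, "?" = zero or one, "*" (Kleene Star) = zero or more.
// ^ and $ force the whole subject to match.
var_dump(preg_match('/^ab+c$/', 'ac'));         // int(0): needs at least one "b"
var_dump(preg_match('/^ab+c$/', 'abbbc'));      // int(1)
var_dump(preg_match('/^ab*c$/', 'ac'));         // int(1): zero "b"s are fine with *
var_dump(preg_match('/^colou?r$/', 'color'));   // int(1): the "u" is optional
var_dump(preg_match('/^colou?r$/', 'colour'));  // int(1)
var_dump(preg_match('/^colou?r$/', 'colouur')); // int(0): at most one "u"
```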

As I briefly mentioned earlier, having a "wildcard" that can be used to match any character is extremely useful in a wide range of scenarios, particularly because the "." character is a regular expression in its own right and can therefore be combined with the Kleene Star and any of the other modifiers. For example, the expression

.+@.+\..+

can be used to indicate:

  • At least one instance of any character, followed by

  • The "@" character, followed by

  • At least one instance of any character, followed by

  • The "." character, followed by

  • At least one instance of any character.

As you might have guessed, this expression is a very rough form of email address validation. Note how I have used the backslash character (\) to force the regex compiler to interpret the penultimate "." as a literal character, rather than as another instance of the "any character" regular expression.
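Here is that rough check run against a few sample strings with PHP's preg_match(), next to a copy that forgets to escape the dot. The sample addresses are invented for illustration; the last one shows just how rough the check is.

```php
<?php
$rough     = '/.+@.+\..+/'; // "\." matches a literal dot
$unescaped = '/.+@.+..+/';  // bug: the middle "." is just another wildcard

$candidates = ['user@example.com', 'user@localhost', '@example.com', 'not an email@x y.z!'];
foreach ($candidates as $candidate) {
    printf(
        "%-20s rough=%d unescaped=%d\n",
        $candidate,
        preg_match($rough, $candidate),
        preg_match($unescaped, $candidate)
    );
}
```

The escaped version rejects user@localhost because there is no literal dot after the "@", while the unescaped one accepts it. Neither rejects "not an email@x y.z!", since "." happily matches spaces and punctuation.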

However, that is a rather primitive way of checking for the validity of an email address. To keep things simple, let's assume that only letters of the alphabet, the underscore character (_), the minus character (-), and digits are allowed in the name, domain, and extension portions of an email address. This is where range specifications come into play.

As mentioned previously, anything within nonescaped square brackets represents a set of alternatives for a particular character position. For example, [abc] indicates either an "a", a "b", or a "c". However, representing something like "any letter of the alphabet" by including every possible symbol in the square brackets would give birth to some ridiculously long regular expressions, and regexes are complex enough as it is.

Luckily, it's possible to specify a "range" of characters by separating its first and last characters with a dash. For example, [a-z] means "any lowercase letter." You can also specify more than one range and combine them with individual characters by placing them side by side. For example, the expression [A-Za-z0-9_] covers our email validation requirements except for the minus character, which could be added as the last character inside the brackets ([A-Za-z0-9_-]) so that it is not mistaken for a range separator. Sticking with the simpler class, the overall regex becomes

[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]+
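The character-class version can be tried the same way. In this sketch (sample addresses invented, PHP's preg_match() assumed), a second copy of the pattern is anchored with ^ and $. Without the anchors, preg_match() is satisfied by any substring that fits, so an address with a dot in the name still passes. Because the class contains no minus sign, a hyphenated domain is rejected either way.

```php
<?php
$classEmail    = '/[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]+/';
$anchoredEmail = '/^[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]+$/';

foreach (['john_doe@example.com', 'john.doe@example.com', 'jane@my-site.com'] as $address) {
    printf(
        "%-22s unanchored=%d anchored=%d\n",
        $address,
        preg_match($classEmail, $address),   // may match just part of the string
        preg_match($anchoredEmail, $address) // must match the whole string
    );
}
```

For john.doe@example.com, the unanchored pattern quietly matches only "doe@example.com", which is why validation patterns are usually anchored at both ends.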

The range specifications that we have seen so far are all inclusive; that is, they tell the regex compiler which characters can be in the string. Sometimes, it's more convenient to use exclusive specifications, dictating that any character except the ones you specify is valid. This can be done by placing a caret character (^) at the start of the character specification inside the square brackets. For example, [^A-Z] means "any character except an uppercase letter of the alphabet." Note that the caret has this meaning only as the first character inside the brackets; outside them, it still marks the beginning of the string.
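A negated class is handy both for validation and for stripping unwanted characters. The sketch below uses PHP's preg_match() and preg_replace(); the test strings, including the phone number, are invented examples.

```php
<?php
// [^A-Z] matches any single character that is NOT an uppercase letter.
// Here the leading ^ inside the brackets negates; the ^ before "[" anchors.
var_dump(preg_match('/^[^A-Z]+$/', 'hello, world 42')); // int(1): no uppercase letters
var_dump(preg_match('/^[^A-Z]+$/', 'Hello'));           // int(0)

// Replacing every non-digit with '' keeps only the digits.
echo preg_replace('/[^0-9]/', '', '(555) 123-4567'), "\n"; // 5551234567
```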

Going back to the email validation regex, it's still not as good as it could be. For example, common domain extensions such as .ca, .com, and .info are between two and four characters long, so we might require a minimum of two characters (as in .ca) and a maximum of four (as in .info). We can use the minimum-maximum length specifier that I introduced earlier to express this additional requirement:

[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{2,4}
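The sketch below runs the {2,4} pattern through PHP's preg_match() with and without ^ and $ anchors; the sample addresses are hypothetical. It shows that the maximum only takes effect when the pattern is forced to end where the string ends.

```php
<?php
$loose  = '/[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{2,4}/';
$strict = '/^[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{2,4}$/';

foreach (['info@example.ca', 'info@example.info', 'info@example.museum', 'info@example.c'] as $address) {
    printf(
        "%-20s loose=%d strict=%d\n",
        $address,
        preg_match($loose, $address), // {2,4} can match the first chars of "museum"
        preg_match($strict, $address) // $ forces the extension to end the string
    );
}
```

Only info@example.museum behaves differently: the loose pattern accepts it by matching "muse", while the anchored one rejects it. The minimum of two is enforced either way, so info@example.c fails both.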

Alternatively, you may want to allow only email addresses that have a three-letter domain extension (such as .com). This can be accomplished by omitting the comma and the max parameter from the length specifier:

[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{3}
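As a sketch of the exact-count form, the pattern below adds ^ and $ anchors to the regex above so that "exactly three" applies to the whole extension; the addresses are invented and PHP's preg_match() is assumed.

```php
<?php
// {3} with no comma means exactly three characters; the ^ and $ anchors make
// sure the extension cannot be longer than that.
$threeLetter = '/^[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{3}$/';

var_dump(preg_match($threeLetter, 'sales@example.com'));  // int(1)
var_dump(preg_match($threeLetter, 'sales@example.ca'));   // int(0): too short
var_dump(preg_match($threeLetter, 'sales@example.info')); // int(0): too long
```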

If, on the other hand, you would like to leave the maximum number of characters open in anticipation of the fact that longer domain extensions may be introduced in the future, you could use the following regex:

[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{3,}

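This indicates that the last regex in the expression should be repeated at least three times, with no fixed upper limit. Keep in mind that the upper limits in the {2,4} and {3} versions are enforced only when the whole regex is anchored with "^" and "$"; otherwise, the pattern can simply match the first few characters of a longer extension, such as .museum.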
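A minimal sketch of the open-ended form, again anchored and run with PHP's preg_match(); the domains and the webmaster mailbox are hypothetical. Because this version starts from the three-letter requirement, two-letter extensions such as .ca are rejected.

```php
<?php
// {3,} = at least three characters, no upper limit.
$openEnded = '/^[A-Za-z0-9_]+@[A-Za-z0-9_]+\.[A-Za-z0-9_]{3,}$/';

foreach (['example.ca', 'example.com', 'example.info', 'example.museum'] as $domain) {
    $address = "webmaster@$domain";
    echo $address, ': ', preg_match($openEnded, $address) ? 'accepted' : 'rejected', "\n";
}
```

If two-letter extensions should also pass, {2,} would be the corresponding specifier.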