Mod Rewrite, .htaccess, and permalinks for gpEasy

Mod rewrite rules, permalinks, and the .htaccess file are not basic topics in any sense. However, permalinks have been a base feature of gpEasy since 1.7, so this is a topic you need to understand. That's why I've included this tutorial in the basics section, even though the information itself isn't basic. I've structured it to be simple and straightforward. I'm not going to give you every possible option for mod rewrite rules, but rather a set you can apply to get "pretty permalinks" that follow standard conventions. Doing it this way heads off problems down the road and makes life easier when you deal with links on your site.

All you really need to do is get a grasp of the topics discussed on this page and then, at the end, use the code on your own site. If it works for you then you are good to go, but you should be familiar with the what and the why.

First, let's get some definitions out of the way.

Mod Rewrite

Wikipedia defines a rewrite engine, which is what mod rewrite is, in the following way:

A rewrite engine is software that modifies a web URL's appearance (URL rewriting). Rewritten URLs (sometimes known as short, fancy URLs, or search engine friendly - SEF) are used to provide shorter and more relevant-looking links to web pages. The technique adds a degree of separation between the files used to generate a web page and the URL that is presented to the world.

Mod rewrite lets you define, through what are usually called mod rewrite rules, how a URL is presented to the end user. You can do things like remove www. or index.php from the URL. These two components are often removed from a URL, and gpEasy tries its best to remove index.php by default; however, that may not always work.

.htaccess file

Wikipedia gives us a fairly descriptive idea of what an .htaccess file is,

In several web servers (most commonly Apache), .htaccess (hypertext access) is the default name of a directory-level configuration file that allows for decentralized management of web server configuration. The .htaccess file is placed inside the web tree, and is able to override a subset of the server's global configuration; the extent of this subset is defined by the web server administrator. The original purpose of .htaccess was to allow per-directory access control (e.g. requiring a password to access the content), hence the name. Nowadays .htaccess can override many other configuration settings, mostly related to content control, e.g. content type and character set, CGI handlers, etc.

In the Apache web server, the format of .htaccess is the same as the server's global configuration file; other web servers (such as Sun Java System Web Server and Zeus Web Server) implement the same syntax, even though their configuration files are very different. Directives in the .htaccess file apply to the current directory, and to all sub-directories (unless explicitly disabled in the server configuration), but for reasons of performance and security, cannot affect their parent directories.

The file name begins with a dot because dot-files are by convention hidden files on Unix-like operating systems.

During the tutorials you should have seen the .htaccess file; however, we didn't really get into its details. We are going to now, but not too in-depth. As you can see from Wikipedia's description, this is a very important file that can do a lot, and one tutorial can't cover all its abilities. Therefore, we will stick to the basics that apply to your use of gpEasy.

Another thing to notice in the description is that .htaccess rules are most commonly used on Apache web servers. This is another reason why I recommend people get a host provider that uses cPanel and Apache. They are rather standard and therefore offer a more consistent experience for users. Therefore, this tutorial can really only be applied to those servers that support the .htaccess file and mod rewrite rules. Some servers, even Apache servers, can restrict your access to these functions; therefore, it is very important when looking for a host provider to ask them about your ability to use mod rewrite rules, user permissions, and the .htaccess file.

Permalinks

In essence, a permalink is a permanent link that points to a web page. It has become a rather standard term within content management systems. There's no need to add anything to that definition, because that really is the essence of it: the link to your page.

gpEasy allows you to set a permalink structure that removes index.php from the URL. To do this, go to the "Admin" menu in gpEasy and click the "permalinks" link. You will see two options: use index.php and hide index.php. By default gpEasy checks whether your server has the proper read/write permissions and whether the mod rewrite engine is on. If both are true, gpEasy will automatically create a .htaccess file that removes index.php from your URL.

The .htaccess file will be found in the root folder of your gpEasy installation. Many servers and CMS systems put a .htaccess file in every folder within your domain to set specific rules for accessing those directories/folders. In this tutorial, however, I'm going to teach you how to block directory listings for all folders/directories from this root level .htaccess file.

The following code is what you will see inside the .htaccess file when gpEasy's permalinks are set to use index.php.

# BEGIN gpEasy
<IfModule mod_rewrite.c>
	
</IfModule>
# END gpEasy

As you can tell there isn't a lot to it. The lines with the # at the beginning are comments. The other two lines are the opening and closing tags and there are no mod rewrite rules between them.

The code inside the .htaccess file for a gpEasy installation at the root directory level with permalinks set to hide index.php will look like this,

# BEGIN gpEasy
<IfModule mod_rewrite.c>
	<IfModule mod_env.c>
	SetEnv gp_rewrite On
	</IfModule>
	RewriteEngine On
	RewriteBase "/"
	RewriteRule ^index\.php$ - [L]
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule . "/index.php" [L]
</IfModule>
# END gpEasy

And if we put our gpEasy installation in a sub-folder (sub-directory) within our domain's folder such as a folder named "asubdirectory" then we would get the following code when hide index.php is set in the permalinks settings.

# BEGIN gpEasy
<IfModule mod_rewrite.c>
	<IfModule mod_env.c>
	SetEnv gp_rewrite On
	</IfModule>
	RewriteEngine On
	RewriteBase "/asubdirectory/"
	RewriteRule ^index\.php$ - [L]
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule . "/asubdirectory/index.php" [L]
</IfModule>
# END gpEasy

The main things to notice in this last set of code are the RewriteBase and RewriteRule lines. Since the gpEasy installation is in the subdirectory (sub-folder within our domain) named "asubdirectory", gpEasy needs to write the mod rewrite rules with that directory specified, because your URLs will have that directory in them. It's a little confusing, but just know that if gpEasy is installed in any folder other than the root folder of your domain, that folder will appear in these two lines.

What does this code mean?
Yes, this code looks like gibberish, and nearly all mod rewrite rules do. However, there are a few things we can glean.

Lines starting with a # are comment lines. They don't do anything. They are just there to tell you about the code itself.
Lines enclosed in <> are opening and closing tags. In this case gpEasy is using "If" statements in the opening and closing tags.
RewriteEngine On: tells the server to turn the mod rewrite engine on. What usually comes after this statement are mod rewrite rules and conditions.
RewriteBase: this directive defines the base URL path of the installation. Notice that in the first example the base is "/" and in the second it's "/asubdirectory/". That's because the first installation is in the root of the domain and the second is in the subdirectory named "asubdirectory".
RewriteRule: obviously this is a mod rewrite rule, and it looks confusing. Thankfully we don't need to know exactly what it means, because we have all the mod rewrite rules we need. If there is one you want that isn't in this tutorial, there are lots of brilliant programmers putting them on the net. You just need to find one that works with your host provider's server.
RewriteCond: this is a mod rewrite rule condition. In other words, it is a conditional statement that defines when the rewrite rule that follows it should be applied. Once again, we don't need to know the particulars.
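
For the curious, here is the same root-level block again with a comment above each line describing what it does. This is only an annotated sketch of the code shown earlier. What gpEasy does internally with the gp_rewrite variable is an assumption and isn't covered in this tutorial. Note that Apache comments have to sit on a line of their own.

```apache
# BEGIN gpEasy
# Only use the lines inside if the mod_rewrite module is loaded.
<IfModule mod_rewrite.c>
	# If mod_env is loaded, set an environment variable named gp_rewrite.
	# (Assumption: gpEasy reads it to learn that rewriting is active.)
	<IfModule mod_env.c>
	SetEnv gp_rewrite On
	</IfModule>
	# Turn the rewrite engine on for this directory.
	RewriteEngine On
	# The URL path that corresponds to this directory.
	RewriteBase "/"
	# A request for index.php itself is left unchanged ("-"); [L] stops here.
	RewriteRule ^index\.php$ - [L]
	# Apply the next rule only if the request is NOT an existing file ...
	RewriteCond %{REQUEST_FILENAME} !-f
	# ... and NOT an existing directory.
	RewriteCond %{REQUEST_FILENAME} !-d
	# Everything else is handed to index.php.
	RewriteRule . "/index.php" [L]
</IfModule>
# END gpEasy
```

Real files and folders (images, CSS, and so on) are still served directly because of the two conditions. Only requests that don't match anything on disk are passed to index.php.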

That's the basics of what you will see in your standard gpEasy .htaccess file when using permalinks. Before we go onto other mod rewrite rules we need to talk about gpEasy's index.php file.

The index.php file

At the root level of your gpEasy installation you will find a file named "index.php." You should be familiar with this file from the gpEasy folder structure tutorial. The code in this file is rather simple but there is one important line that pertains to permalinks. Here's the code:

<?php
define('gpdebug',true);
//define('gp_indexphp',false);
//define('gptesting',true);
require_once('./include/main.php');

The lines with // at the beginning of them are lines that have been commented out. This means that they are currently NOT active. To make them active you would remove the // at the beginning of the line. Let's go through each line of code.

define('gpdebug',true);
This line turns gpEasy's debugging mode on or off. Because this index.php is from an alpha install I'm working on, the debug line is not commented out and debugging is on (true).

//define('gp_indexphp',false);
This line is currently commented out, and it is the one we are interested in because it controls whether index.php appears in the URL. If you set permalinks in gpEasy to "hide index.php" and it isn't working (index.php still shows up when you view pages), the first thing to check is the .htaccess file. If everything in that file looks like what you've seen above, the index.php file is next, and this is the line to look for. You may need to remove the comment marks (//) from the beginning of the line to activate it. The line tells the system that index.php should not be displayed in URLs. In general this line is commented out and doesn't need to be touched; however, on some servers it may need to be un-commented. If changing this line doesn't do anything, double check the .htaccess file to make sure the proper directory is specified. If you still have no luck, speak with your host provider to make sure mod rewrite is enabled and that you have the proper permissions to use mod rewrite rules.

//define('gptesting',true);
This is another line that is usually commented out. It's for testing gpEasy and something you don't need to worry about.

require_once('./include/main.php');
This little line simply calls the main.php file to be accessed. Again, a line you should never worry about.

Canonicalization and Duplicate Content

If you do any search engine optimization (SEO) reading on the internet one of the first things SEO gurus will bring up is duplicate content. What they are saying is that search engines can and will index the same page with multiple URLs.

For Example:
http://trueacu.com/
http://www.trueacu.com/
are the same page, but search engines can index it as two separate pages, or rather under two different URLs. Furthermore,
http://trueacu.com/gpEasy
http://trueacu.com/gpEasy/
http://www.trueacu.com/gpEasy
http://www.trueacu.com/gpEasy/
are all the same page as well, and they can and often will be indexed as different pages. They are then seen as "duplicate content" even though they really aren't. At one time this was a major problem with search engines. Whether search engines are now smarter and deal with this type of "duplicate content" better is debatable, and that's why SEO "experts" often suggest that you canonicalize your URLs. What does that mean? It means converting a URL that has more than one possible representation into a single, standard URL, and we do this with mod rewrites.

Currently, by default, gpEasy allows any one of the above URLs to be used. It all depends on how the person arrives at your site. If they arrive with www. in the URL then the links will be presented with www. in them, and so on.

I can't say whether canonicalizing your URLs helps in search engine results, but I do like the consistency of one URL for one page, so I'm going to give you my mod rewrite rules to make your URLs look "pretty." There are two basic mod rewrites. The first removes www. from the URL, and the second ensures that non-directory pages do not have a trailing slash at the end of the URL. As a result, all of the above URLs pointing to the /gpEasy page end up at one single URL, http://trueacu.com/gpEasy, no matter how the person arrived at the site. You can try it with trueacu.com. Add the www. or a trailing slash to the URL and you will see that it is removed, thus one page = one URL.

Remove www. and trailing slash from the URL

Finally, we can move on to some mod rewrite rules to customize our URLs. For years people loved to include the www. in URLs when printing or saying them. That has gone by the wayside, and people usually just give the domain name. For example, we usually just say go to trueacu.com. We imply that it's a web address, and www. isn't needed as long as the server answers for the bare domain. So why even have it in your URL at all? You may read some SEO "experts" who claim that having www. in the URL increases search engine rankings. I've seen no proof of that whatsoever. So I'm going to give you the mod rewrite rules to remove the www. from your site's URLs.

Removing the trailing slash from the end of a page/file fits with naming conventions. A directory, by naming convention, ends with a trailing slash but a file does not thus,
http://trueacu.com/gpEasy = a file
http://trueacu.com/gpEasy/ = a directory
Because content management systems now handle much of the URL responsibility and mod rewrite rules are very flexible you often see files that end with a trailing slash. Again, this is one of those things where you may read some SEO "expert's" advice that having it results in better search engine rankings. . . seriously?!?!

For our purposes we are going to stick with proper naming conventions and remove the trailing slash from all non-directory files.

Here's the .htaccess file for my gpEasy installation here at trueacu.com, which is installed at the root level. Wherever you see "trueacu.com" you will need to change it to your own domain name and/or subdirectory.
The Code:


Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine on
# Redirect to remove trailing slashes unless requested URL resolves to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://trueacu.com/$1 [R=301,L]
# Redirect to remove www from url
RewriteCond %{HTTP_HOST} ^www\.trueacu\.com$ [NC]
RewriteRule ^(.*) http://trueacu.com/$1 [R=301,L]
# BEGIN gpEasy
<IfModule mod_rewrite.c>
	<IfModule mod_env.c>
	SetEnv gp_rewrite On
	</IfModule>
	RewriteEngine On
	RewriteBase "/"
	RewriteRule ^index\.php$ - [L]
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule . "/index.php" [L]
</IfModule>
# END gpEasy

Let's go over the code starting from the top.

Options +FollowSymLinks -MultiViews -Indexes
The first line of code sets some specific options for all directories within the installation. The +FollowSymLinks tells the server to follow symlinks. Don't worry if you don't know what they are. You most likely won't have them, but if you use the multisite plugin you will, so we want to make sure they are followed. Apache also requires FollowSymLinks (or SymLinksIfOwnerMatch) to be enabled before rewrite rules in a .htaccess file will work. The -MultiViews turns off content negotiation, a feature that has the server search for variants of a requested file, such as versions in other languages. That's something we don't need the server doing, because our site is a CMS and the pages are generated on the fly, so we turn it off with the - before MultiViews. Finally, -Indexes tells the server how we want directories handled in our installation: whether to display the files in a directory or not. Putting the - before Indexes tells the server NOT to list the files within a directory, so if you attempt to look into one of my directories here you will get a forbidden message. This doesn't stop a link from displaying a file within a directory, say an image. It only stops the full contents of the directory from being listed.

There is one caveat to the Indexes setting: if a directory contains a file that the server treats as a directory index, such as index.html (or home.html on servers configured to use it), that file will be displayed regardless of this setting. For that reason I remove all index.html and home.html files from the directories within my gpEasy installation. gpEasy by default puts a blank index.html file in each and every directory. The developers do this to protect you from having the contents of a directory displayed by accident, and that's one way to do it. The other is to use -Indexes in the .htaccess file. If you go the -Indexes route, you can remove all the index.html files: do a search within your gpEasy folder and delete them, either before installation or in your local installation. You can set -Indexes for each directory by putting a .htaccess file in each one or by setting it via cPanel. Personally, I like to do it with the root level .htaccess file. This whole bit is a personal preference: leave the index.html files or not, use -Indexes or not. Don't worry if you use -Indexes and leave the index.html files in the directories; it won't cause any problems. The only difference is what will be displayed, a forbidden message versus a blank HTML page.
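
If you would rather set -Indexes per directory than at the root level, a .htaccess file placed in that one folder could look like the sketch below. The folder path in the comment is only a hypothetical example, and this assumes your host allows the Options directive to be overridden in .htaccess files. If it doesn't, the server will most likely respond with an error and you will need to ask your host provider.

```apache
# Hypothetical location: a .htaccess file inside one sub-folder,
# for example /include/.htaccess
# Turn off directory listings for this folder and everything below it.
Options -Indexes
```

Because directives in a .htaccess file apply to sub-directories as well, one file in a parent folder covers all the folders beneath it, which is why a single root level file is enough.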

RewriteEngine on
This turns the mod rewrite engine on. Notice it's in the code twice. Not a problem.

# Redirect to remove trailing slashes unless requested URL resolves to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://trueacu.com/$1 [R=301,L]
These lines of code do exactly what the commented line (#) says they will do. It's important that the trailing slash removal only affect files and not directories. On my server a directory will display with or without a trailing slash but with the forbidden access due to -Indexes. 

# Redirect to remove www from url
RewriteCond %{HTTP_HOST} ^www\.trueacu\.com$ [NC]
RewriteRule ^(.*) http://trueacu.com/$1 [R=301,L]
These lines of code do what the comment line suggests as well: they remove the www. from all URLs for the site. Notice the "trueacu.com" domain names? You need to change those to your domain name and/or directory, depending on how you installed gpEasy.
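
To show what changing the domain and directory might look like, here is a sketch of the same two rule sets for an installation in a sub-folder. Both "example.com" and the folder name "asubdirectory" are placeholders, and the sketch assumes this .htaccess file lives inside that sub-folder. It hasn't been tested on every server, so treat it as a starting point rather than a guaranteed solution.

```apache
Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On
# Remove the trailing slash unless the request is an existing directory.
# In a .htaccess file the pattern is matched against the path relative
# to this folder, so the folder name is added back in the target URL.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://example.com/asubdirectory/$1 [R=301,L]
# Remove www. from the URL (the dots in the host name are escaped).
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/asubdirectory/$1 [R=301,L]
```

The gpEasy block with RewriteBase "/asubdirectory/" would follow these lines, just as the root-level block follows them in my own file.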

So, what I've added to the gpEasy .htaccess mod rewrite rules are some options and two sets of rules and conditions. Not a lot but it does exactly what I want it to do, resolve every page to one single URL.

Potential Problems

A host of problems can arise when dealing with mod rewrite rules; however, most of them have to do with your host provider's server settings. Therefore, it's not possible to guarantee that a certain set of mod rewrite rules will work on every server. While I've tested and use the above rules on an Apache server, your host may have different settings for their Apache server or may use completely different server software altogether, and that's where the problems come in. All you can do is test my rewrite rules on your server, and if they work, great! If they don't work, you can always hunt for some other rules and apply them, or speak directly to your host provider.
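
One way to make testing a little safer is sketched below. It wraps the two custom rule sets in the same IfModule check gpEasy uses, so the rules are simply skipped if mod rewrite isn't loaded, and it uses temporary (302) redirects while you experiment, since browsers tend to remember permanent (301) redirects. The domain "example.com" is a placeholder, and the whole block is a suggestion made under those assumptions, not something taken from gpEasy itself.

```apache
# Skip these rules entirely if mod_rewrite is not available.
<IfModule mod_rewrite.c>
	RewriteEngine On
	# Remove the trailing slash unless the request is an existing directory.
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule ^(.+)/$ http://example.com/$1 [R=302,L]
	# Remove www. from the URL.
	RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
	RewriteRule ^(.*)$ http://example.com/$1 [R=302,L]
</IfModule>
```

Once the redirects behave the way you want, change R=302 back to R=301 so the redirects are permanent.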

Other Considerations

One of the reasons to remove the trailing slash is to make it a little easier on yourself when doing internal linking of your pages. However, you need to be consistent with your internal links. Always make sure file names don't end with a trailing slash. It's best to always use relative links, because it's just easier, especially if you ever change domains.

The main thing, regardless of the rewrite rules you choose or whether you use any at all, is to be consistent with your linking within your site, as well as with external links coming into your site, although you can't always control the latter.

That's a long and complicated tutorial; however, all you really need to do is apply the above code to your .htaccess file and see if it works or not.

 
