Archive for the ‘Development’ Category

Rails Sessions Across Multiple Subdomains

May 26th, 2010

Okay, so I’m working on a new Rails project. Things are coming along great. Then we hit a snag where our SSL is not working as expected. We want it to work on Staging and Production only, and only for the actions that need it. So, the SslRequirement Gem did the trick.

However, we have many sub-subdomains (with many more to come), which caused another dilemma. We have a wildcard SSL certificate, and although we could get one that also handles sub-subdomains, it’s not necessarily supported by the user’s browser. So, our other option was to put all the public stuff on the subdomains and have all the private stuff on a “private”.domain.com address which would adequately be handled by SSL at the application and certificate levels. After some finagling, I managed to dynamically change the subdomain based on whether or not the action requested should be SSL’d.

Everything seemed to be humming along, but this new code snippet was relying on something that we hadn’t previously tested thoroughly… sessions. Sessions are just supposed to work, right? However, evidently they don’t work by default across subdomains. So, after some hunting around, this little snippet put into my “/config/[environment].rb” file did the trick.

config.action_controller.session = { :domain => ".[domain].com" }

Evidently, this tells the session to share across anything within the main domain. You can also restrict it further by using “.[subdomain].[domain].com”.
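
To see why the leading-dot domain matters, here is a minimal plain-Ruby sketch of how a browser decides whether to send a cookie back to a given host. The method name `cookie_sent_to?` and the example.com hosts are my own illustration and not part of Rails, and real browsers apply more rules than this (paths, the secure flag, public suffix lists).

```ruby
# Simplified cookie domain matching (illustrative only; not a Rails API).
# A cookie set with a domain such as ".example.com" is sent to example.com
# and to every host beneath it. A cookie set without a domain is "host-only"
# and goes back only to the exact host that set it.
def cookie_sent_to?(request_host, cookie_domain, setting_host)
  request_host = request_host.downcase
  if cookie_domain.nil?
    # Host-only cookie: what you get when no :domain option is configured.
    return request_host == setting_host.downcase
  end
  domain = cookie_domain.downcase.sub(/\A\./, "")
  request_host == domain || request_host.end_with?(".#{domain}")
end

# Session cookie set by www.example.com, requested from private.example.com:
cookie_sent_to?("private.example.com", nil, "www.example.com")            # no :domain
cookie_sent_to?("private.example.com", ".example.com", "www.example.com") # with :domain
```

Without a domain the session cookie stays with the host that set it, which would explain why the session appeared to vanish when the app switched subdomains.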

Works like a charm.

fudnik Ruby

Understanding Click Fraud

November 13th, 2009

What is click fraud?

Let’s start out by defining some key terms that are important in understanding click fraud.

  • Advertiser – The entity that pays money to get traffic to their site in the way of bidding on keywords or topical categories (bid auctions).
  • Publisher – Any entity which displays advertiser ads on their web site or in some other publicly viewable medium.
  • Visitor – A legitimate user who clicked on something to get to the appropriate target web site.
  • Click – A visitor to the advertiser’s site that came by route of one or more publishers.
  • PPC (Pay Per Click) – An internet advertising model where the amount the advertiser pays is dictated on a per click basis for the terms (keyword or categories) being bid on.
  • CPC (Cost Per Click) – The amount (bid price) paid by the advertiser to receive one visitor for a particular term. The amount is paid only if the visit occurs.
  • CPA (Cost Per Action) – An advertising cost associated with a particular desired visitor action, e.g. purchasing a product or service, filling out a survey, or signing up for a newsletter.
  • Conversion – A completion of the advertiser’s desired action under a CPA advertising model.
  • Click Stream – The route the click traffic takes from the time the click is made through the time the web user arrives on the advertiser’s targeted URL. There can often be URL redirects and several publishers (usually tracked by cookies or ID’s in the URL) that receive information for each click, completely transparent to the user.
  • Rev Share – A single publisher’s fraction of the revenue generated by specific click and conversion sources. For example, a smaller publisher might arrange to send click traffic into a larger publisher’s click stream, providing the larger publisher with more traffic and retaining a 5% rev share of the total per click amount for the smaller publisher. Rev Share can be seen as a multi-tier sales commission.
  • Ad Feed – Ad listings/data provided on request by an n-tier publisher for display to users on another publisher’s web site or application.

Click fraud, generally speaking, occurs when something (a person, a web bot, etc.) posing as a legitimate internet user follows (or clicks) a paid advertisement URL to the advertiser’s web site, generating money for some entity other than the advertiser.

[Image: Valid User]

An advertiser pays good money for advertising, expecting that a portion of the traffic received in return will generate revenue in some fashion. Non-legitimate visitors produce bad clicks which in effect spend advertiser dollars with no hope of a return for the advertiser. This expense is instead divvied up between the layers (rev share) of publishers that are likely to be present in the click stream. Publishers, especially the ones on the end of the chain, often have the most to gain from this practice and will devise all sorts of innovative ways to game the system. Larger publishers in the click stream will often ignore or downplay this activity, knowing that it lines their pockets in the process.

In short, advertisers are being robbed of their advertising dollars in inflated term bidding marketplaces by traffic that is posing as real, live, interested web site visitors. It is theft akin to diverting fractions of a penny from financial transactions to a private account.

What is a valid visitor?

A non-valid visitor is essentially anything that can pose as a user but has no ability or intention of producing revenue for the advertiser. There are endless ways this can happen. A fraudulent click bot (any automated application used for the purpose of methodically sifting through sites in order to perform a specific operation) can target specific sites, following advertiser links and sending traffic header information which falsely identifies the bot as a user from a real browser. These bots can get very creative by spoofing IP’s, using proxies and randomizing the browser agent and other key data elements.

This can get somewhat fuzzy, as there are also legitimate scripts and bots that, although not valid users, are not considered fraudulent. Their purpose is to scan the internet, spidering through sites to map them out, extract data for search engines, and many other reasons. While no fraud is intended, they create traffic that will create “clicks” by following a URL and ultimately charge the advertiser for the traffic, unless some kind of safeguard is put in place to filter this traffic.

A valid visitor is a real person using an actual web browser (or application where ads may appear) that clicks on the ad URLs of their own decision and has an interest in what the advertiser is providing, therefore potentially resulting in a conversion for the advertiser. A valid user is always a person, but a person isn’t always a valid user. Even a real person can be considered a fraudulent user (more on this later).

Location, location, location

Publishers provide a means to create clicks that lead to advertisers’ web sites. There are many ways this can happen.

A publisher can have an existing site which provides something of value to a demographic and therefore gets a lot of “organic” search traffic. The publisher would like to further monetize their website, so they tap into an ad feed, parse the data and place the ads on their website. When an interested user clicks on one of the ad links, a click is registered through one or more publishers resulting in a visitor to the advertiser’s site. The web site publisher then gets paid for the click. This same process can be done for applications which display ads. Banners function in a similar manner.

Many publishers are not simply successful web site owners trying to maximize online real estate. Many will funnel traffic through their system without involving any user-facing applications or web sites. A publisher can get ads from another publisher’s ad feed, determine their own pricing model and associated key words, then take those ads along with others and insert them into Google’s or Yahoo’s ad listing via an API. This is one way in which a publisher becomes an advertiser. They play a publishing/distribution role with their ad providers. However, they play an advertising role with their ad distributors, as they are purchasing traffic at newly negotiated rates, often based on quantity or key word variations.

Publishers will often create a combination between monetizing online real estate and utilizing ad feeds. They can do this by creating dynamic, topic-based web sites with a search engine or topic directory facade. Sometimes they are hand created with relatively decent designs and basic content. Often they are purely dynamic, totally created on-the-fly based on the user’s search term. The goal of these sites is to do nothing more than gain interest from the user in order to get them to click on an ad link which will bring money to the site owner. These sites are often never ending webs of topic pages where the user never really finds what they are looking for. Fraud is common where the link text does not actually represent the target advertiser URL. This is fraudulent because a user may click on a link that says “pet supplies” when the actual URL takes the user to an advertiser who sells “college text books”. Assuming that “college text books” is a more costly search term, when the advertiser pays for the click, the publisher gets paid the higher price. The user, who still hasn’t found what they are looking for, will usually continue to click on links hoping to find what they are looking for, costing many advertisers money while making the publisher wealthy.

Arbitrage has been a recent problem in the industry. The idea is to sell something back to the ad provider at a higher price than what was originally paid for it. Using this method, an entity can buy click traffic (AdWords) for really inexpensive keywords in a particular category, then direct the traffic to a simple web site that shows nothing but ads (Google’s AdSense) that are in the same or similar category but cost more per click. Nothing of value is provided to the user, except more ads. When a user does click on one of the ads, the entity may get paid $0.25 for an ad that only cost them $0.15. This entity essentially plays the role of both advertiser and publisher and provides virtually no technology or service other than automatically reselling the click traffic. This creates a superfluous, middle-man tier that directly drains money from the original ad provider (Google).
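
The arithmetic behind that $0.15 in, $0.25 out example can be sketched in a few lines of Ruby. The method names are mine, and the click-through rate (the fraction of purchased visitors who go on to click one of the ads) is an assumption the example above does not give, since not every visitor bought for $0.15 will click again.

```ruby
# Arbitrage margin in cents (integer cents avoid float rounding surprises).
# cost_cents:         what the arbitrageur pays for each incoming click
# payout_cents:       what they earn when a visitor clicks an ad on their page
# click_through_rate: assumed fraction of incoming visitors who click an ad
def arbitrage_profit_cents(visitors, cost_cents, payout_cents, click_through_rate)
  spend   = visitors * cost_cents
  revenue = (visitors * click_through_rate * payout_cents).round
  revenue - spend
end

# Fraction of purchased visitors that must click onward just to break even.
def break_even_click_through_rate(cost_cents, payout_cents)
  Rational(cost_cents, payout_cents)
end

arbitrage_profit_cents(1000, 15, 25, 0.8)  # profit in cents on 1,000 bought clicks
break_even_click_through_rate(15, 25)      # share of visitors who must click again
```

Under these assumptions the margin only exists if well over half of the purchased visitors click a second ad, which is why these pages tend to show nothing but ads.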

A web of deceit

While a good many publishers are running a perfectly clean business, this picture gets muddied up by the web of relationships and interaction involved in the online advertising sphere. There are often complex relationships involved in bringing a user from their click to the final destination. As previously described, you can see how advertisers can also be publishers and vice versa, creating an interesting matrix of profitability and fuzzy responsibility. For all intents and purposes, most players in the PPC industry can be considered both, including most of the innocent small website owners who not only buy traffic but also show ads on their site. So, for general purposes, consider the relationship diagram below.

[Diagram: Advertiser Publisher Relationship]

If a fraudulent click occurs, and there are 4 publishers paid in the click stream, where do you go for answers? Much of the click-related data has a possibility of being faked and the sheer quantity of it turns making anything useful out of it into a daunting task.
Perhaps you start with the publisher at the start of the click stream where the click happened. Perhaps they are the most likely to falsify the click. Do you collect the money back from all the publishers even though 3 out of 4 were operating legitimately? What if the second publisher in the click stream knows that 80% of their traffic comes from publisher 1, so they set up a click bot that hits specific sites provided by publisher 1? This would add fraudulent clicks to the first publisher’s traffic that they weren’t even aware of.

The art of war

There are several different methods being used to address the click fraud problem. Some of the larger ad providers (such as Yahoo) provide publishers with a traffic quality score (based on conversions and internal statistics) which determines how much traffic, and of what quality, they continue to receive. The greater the effort the publisher makes to clean up their traffic, the more likely it is they will stay profitable.

Those that are trying to adequately deal with click fraud will often combine their own internal custom methods with a 3rd party service to help determine which traffic is fraudulent. The methods used will often look at the same data in multiple ways and work in conjunction to derive something useful. Internet traffic is highly irregular and an unreliable basis for accurate conclusions, which makes the challenge a moving target. With this in mind, a great portion of potentially fraudulent traffic cannot be deemed 100% fraudulent, so many methods rely on some sort of fraud probability scale to determine whether or not the affected clicks generate any income for the publishers.

Server-side and client-side validation

Both the web browser (or other ad display client) and the web server can provide a lot of information about a user and their activities. However, anything useful needs to be derived from a series of time stamps, IP addresses, generic browser information and any ad/click related data that can be gathered. There are many arguments regarding using server-side and client-side data for this type of validation. For the most part, server-side data (database records, web server logs, error logs, headers, etc.) can be considered more reliable than client-side data. Server data can provide much regarding the click, the ads, user location and environment, but can only be gathered in spurts AFTER actions have been taken, leaving lots of room for assumptions regarding the overall user experience. Client-side code, while more likely to be tampered with, can provide a plethora of data regarding user events, intentions, order of events and environment that is not available to the server. Properly obfuscated and used within a limited scope of reliability, client-side validation can filter out a large percentage of fraudulent traffic, leading a handful of click protection companies to rely on it heavily in their products.

Forensic and symptomatic analyzation

Many click fraud detection systems attempt to focus more on factual information and on tracking/gathering more data from the involved symptoms. While this forensic approach can provide a good idea of the concrete data available, falsified or lacking data still leaves a large margin of error in any analysis. As in any adequate security measure, the more the behavior in question is understood, the better it can be targeted and stopped. With click traffic, several valid behavioral assumptions can be made if certain data points exist. So, while the data may not always be completely reliable or correct, using a more symptomatic approach to analyzation can provide many clear behavioral probabilities which, tracked over short periods of time, can isolate bad traffic sources.

Realtime and post analyzation

Server-side analyzation is currently the most prevalent among custom and third party detection solutions. This is due in part to the consistency and availability of server logs that already track many of the needed data points. Pure server-side analyzation also does not require front end integration, making 3rd party integration relatively simple. A major downside to this method is that all relevant data is analyzed long after the events have happened. This only makes it possible to avoid certain traffic sources in the future, and it requires a feedback loop which tells the analyzed system to undo charges for clicks hours or days after the events have happened. This not only becomes an accounting nightmare, but also misses the opportunity to catch fraudulent behavior as it’s happening. Client-side data coupled with server-side data in a realtime system can identify patterns and known behavior models as they are happening, making it possible to stop fraud in its tracks before large sums of money are wasted on fraudulent clicks and before upstream publishers and advertisers are affected by the same bad traffic.

In-stream and out-of-stream validation

If the click fraud detection system is realtime and it is done internally or is fully integrated with a third party system, this is considered in-stream detection. Many publishers, especially the small ones with little or no technical resources, are unable to meet the requirements necessary for a full integration, so an out-of-stream option may be provided by a third party. This requires very little integration and allows the provider to capture data (using some kind of tracker pixel or JavaScript ping back) sometime before the click and again sometime after. This methodology is a hybrid between realtime client-side analyzation and pre and post server-side analyzation but does not offer all the data points and reliability of a full, in-stream integration.
In a model where clicks are rated on a fraud probability scale, any combination of the above methods can be used, within adequate constraints, to provide insight and restriction on a click’s validity.
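
As a rough illustration of a fraud probability scale, the sketch below combines whichever signals fired for a click into a single score and maps that score to a billing decision. Everything in it is an assumption on my part: the signal names, the weights, the thresholds, and the choice of a "noisy-OR" combination. Real detection systems tune these against their own traffic and keep the details private.

```ruby
# Hypothetical signal weights: each is the assumed probability that a click
# is fraudulent when that signal alone fires. Illustrative values only.
SIGNAL_WEIGHTS = {
  :known_bot_agent   => 0.95,
  :ip_over_limit     => 0.70,
  :no_javascript     => 0.60,
  :no_mouse_activity => 0.50,
  :missing_referrer  => 0.20
}

# Noisy-OR combination: the click is treated as clean only if every fired
# signal is, independently, a false alarm.
def fraud_probability(fired_signals)
  clean = fired_signals.inject(1.0) do |acc, signal|
    acc * (1.0 - SIGNAL_WEIGHTS.fetch(signal))
  end
  1.0 - clean
end

# Map the probability to a billing decision using assumed thresholds.
def billing_decision(probability, reject_at = 0.8, review_at = 0.4)
  return :reject if probability >= reject_at
  return :review if probability >= review_at
  :bill
end

score = fraud_probability([:no_javascript, :missing_referrer])
billing_decision(score)
```

The middle `:review` band reflects the point above that much suspicious traffic cannot be called 100% fraudulent.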

Smoke and mirrors

One of the greatest challenges facing click fraud security is the rate of change. Browsers are constantly changing. New plugins and their capabilities are constantly making waves in website development. Developers are constantly finding new ways to leverage these tools to their advantage while fraudulent parties are doing the same. Even applications such as email clients are posing great threats to valid click traffic in ways that were not expected.

Key data points

There is a lot of data available for use in click validation, and it is combined and used in many, many ways. This article cannot cover all of the ways this data can be used and analyzed. Below are some of the more common data points.

Server-side
  • IP address. Known bad IP blocks can be ignored and patterns can be determined from recurring IP’s. Geographical location can also be roughly determined.
  • Proxy, if used. Proxies can be blocked or at least treated with more caution.
  • X-Forwarded-For. This is a header value that is often present when a proxy is used to show the “actual” IP address.
  • Browser Agent. This includes the browser name, version, operating system, etc…
  • Referrer, if available, which includes the referring domain. This is the web URL that the click came from. This may be non-existent or easily faked.
  • Session data. This identifies a particular user’s set of interactions, if the session data isn’t being purged from the user’s end.
  • Cookie data.
  • Time of initial impression (page view).
  • Time of click.
  • Other data specific to the click traffic, e.g. click ID’s, advertiser ID’s, publisher ID’s, impression data, etc…
  • Conversion. Did a conversion occur after the click?
Client-side
  • JavaScript enabled? Sure, there are potentially valid users out there that don’t allow JavaScript. However, if the user’s agent cannot process JavaScript, then the likelihood of them being able to complete the click process and follow through to a conversion becomes negligible.
  • Cookie data. Can it be set and read?
  • Mouse interactivity. Helps to validate actual presence of a user, versus a bot. Mouse rollovers can also be tracked and sent to the server in real time to help determine mouse movement patterns (see PTR below). Was there actual mouseover activity on the ad link that registered a click?
  • Parent window domain. Useful when ad links may be present inside of an iframe.
  • Window size. If the window is too small, it is likely that the publisher is trying to mask links so that the user doesn’t know what they are actually clicking on.
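
As one small example of turning the server-side items above into usable data points, here is a hedged Ruby sketch that reads the connecting IP together with the proxy header (X-Forwarded-For). The method names and the returned hash keys are hypothetical. Since the header is supplied by the client or proxy and is easily faked, the sketch records it as a claim rather than trusting it.

```ruby
# Split an X-Forwarded-For header value into the chain of addresses it claims.
def forwarded_chain(header_value)
  return [] if header_value.nil?
  header_value.split(",").map { |part| part.strip }.reject { |part| part.empty? }
end

# Returns IP-related data points a validator might record for one request.
def ip_data_points(remote_addr, x_forwarded_for)
  chain = forwarded_chain(x_forwarded_for)
  {
    :connecting_ip  => remote_addr,   # address that actually opened the connection
    :claimed_origin => chain.first,   # leftmost entry: the claimed original client
    :via_proxy      => !chain.empty?  # presence of the header suggests a proxy
  }
end

# Documentation-range addresses used purely as sample input.
ip_data_points("198.51.100.2", "203.0.113.7, 198.51.100.2")
```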

The key to using this data and filtering out bad click traffic is to understand what the fraudulent party is trying to accomplish, identify patterns or oddities, then create methods of validating and deflecting the behavior. While the landscape constantly looks different, there are some methods that fraudulent parties will use that seem pretty consistent.

Methods

Standard bots

Problem: Recognized bots such as the ones run by Google, Yahoo! (and hundreds of other search engines) will create lots of non-user traffic which will often inadvertently create click traffic in the process. This can account for a relatively high percentage of actual site traffic, causing a publisher to make a lot more money than they are entitled to. It is not uncommon for a publisher to implement a click fraud filtering system, only to find out that more than half of their traffic was non-user traffic, effectively slashing their profits. While this traffic isn’t considered fraudulent, it should be filtered out so that advertisers don’t get charged for invalid clicks.

Solution: As a primary filter, don’t allow any known bots that are correctly sending their agent string to register a click. Then put other limits in place regarding the number of clicks allowed by certain IP’s within a time period. Could a cookie be set? Can the requesting agent handle JavaScript? This should catch the bulk of the harmless ones.
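
A minimal sketch of that primary filter in Ruby might look like the following. The pattern list is deliberately tiny and illustrative, the method names are mine, and treating a blank agent string as a bot is my own assumption. A production list would be far longer and regularly updated.

```ruby
# Illustrative, incomplete list of patterns found in well-known crawler
# user agent strings.
KNOWN_BOT_PATTERNS = [/googlebot/i, /yahoo! slurp/i, /bingbot/i, /msnbot/i]

def known_bot?(user_agent)
  # Assumption: a missing or blank agent string is never a billable user.
  return true if user_agent.nil? || user_agent.strip.empty?
  KNOWN_BOT_PATTERNS.any? { |pattern| user_agent =~ pattern }
end

# A click registers only if it passes the primary checks: not a known bot,
# a cookie could be set, and the agent handled JavaScript.
def billable_click?(user_agent, cookie_set, javascript_ok)
  !known_bot?(user_agent) && cookie_set && javascript_ok
end

billable_click?("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", false, false)
```

This only catches bots that identify themselves honestly, which is why the IP limits and client-side checks are still needed.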

Click bots

Problem: One of the primary ways fraudulent parties attempt click fraud is through automated bots. These bots will target certain sites where the target ads are known to be and very efficiently simulate clicks at a very fast rate. New bots are consistently smarter and attempt to simulate user behavior, environment and even run JavaScript.

Solution: Additionally, do some consistency checks for valid browser agents and filter out the blatantly obvious ones. Put in place click restriction limits from a single IP. Even IP’s with large offices behind them should not generate hundreds of clicks on the same ad within a few minutes.
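
The per-IP restriction could be sketched as a sliding-window counter like the one below. This is an in-memory illustration only, with a class name and limits I made up. A real deployment would need shared storage across servers and limits chosen from actual traffic.

```ruby
# Sliding-window click limiter keyed by IP and ad (in-memory sketch).
class ClickLimiter
  def initialize(max_clicks, window_seconds)
    @max_clicks = max_clicks
    @window_seconds = window_seconds
    @clicks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records the click (even when it is over the limit) and returns true if
  # this IP is still within max_clicks on this ad inside the window.
  def allow?(ip, ad_id, now = Time.now)
    times = @clicks[[ip, ad_id]]
    times.reject! { |t| now - t >= @window_seconds } # drop clicks outside the window
    times << now
    times.size <= @max_clicks
  end
end

# Example: at most 100 clicks on the same ad from one IP in five minutes.
limiter = ClickLimiter.new(100, 300)
limiter.allow?("203.0.113.7", "ad-42")
```

Counting rejected clicks as well means a bot that keeps hammering stays blocked until it actually slows down.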

Spoofing IP’s and user agents

Problem: Many click bots and similar systems will take the next obvious step and attempt to make each click look like a unique and valid user by randomizing fake IP’s and incorrect User Agents.

Solution: This is where client-side checking can come into play. If JavaScript is enabled, assume the agent is valid and proceed with mouse event and page/browser property checks. Does a cookie check reveal that the user already clicked this ad recently?
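
The cookie check for a recent click on the same ad could be sketched as follows. The cookie format ("ad_id:epoch" pairs separated by commas), the method names and the one-hour window are all hypothetical. Because a cookie lives on the client, a missing cookie proves nothing; this only catches agents that keep their cookies.

```ruby
# Parse a hypothetical "recent clicks" cookie such as "ad42:1000,ad7:1300"
# into a hash of ad id => epoch seconds of the last click.
def parse_recent_clicks(cookie_value)
  return {} if cookie_value.nil? || cookie_value.empty?
  pairs = cookie_value.split(",").map { |entry| entry.split(":", 2) }
  pairs.each_with_object({}) do |(ad_id, epoch), clicks|
    clicks[ad_id] = epoch.to_i if ad_id && epoch # skip malformed entries
  end
end

# True when the cookie shows a click on this ad inside the window.
def repeat_click?(cookie_value, ad_id, now_epoch, window_seconds = 3600)
  last = parse_recent_clicks(cookie_value)[ad_id]
  !last.nil? && (now_epoch - last) < window_seconds
end

repeat_click?("ad42:1000,ad7:1300", "ad42", 1500)
```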

Falsified ad text and diversion

Problem: Ads are displayed with text that is more likely to get the user to click on the link instead of text that accurately describes the ad target. This fools users into clicking on multiple ads that they are not interested in while they look for one that is legit. Often they will be offered links to similar topics that attempt to keep them looking for what they want until they click on an ad. Sometimes, this textual falsification may simply be to show a more common topic in the ad text but have the ad link go to a much higher paid PPC ad. For example, the ad may be for a $0.20 PPC “ring tone download” ad but will go to a $20 “mesothelioma” ad.

Solution: Other than random manual validation of publisher websites, some log analysis can be done on the higher payout PPC traffic to see if any particular referrers have unexpected amounts of traffic to particular keywords and if the referring site content matches the keyword.
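
The log analysis part could start with something as simple as the sketch below, which looks for referrers that supply an unexpectedly large share of the clicks on a keyword. The log field names (`:referrer`, `:keyword`), the method name and the thresholds are assumptions for illustration. Checking whether the referring site's content matches the keyword still needs a crawl or a human.

```ruby
# clicks: array of hashes with :referrer and :keyword (hypothetical log fields).
# Returns [referrer, keyword, share] rows where one referrer supplies more than
# max_share of the clicks on a keyword that has at least min_clicks clicks.
def concentrated_referrers(clicks, max_share = 0.5, min_clicks = 20)
  flagged = []
  clicks.group_by { |click| click[:keyword] }.each do |keyword, keyword_clicks|
    next if keyword_clicks.size < min_clicks # too little data to judge
    keyword_clicks.group_by { |click| click[:referrer] }.each do |referrer, referrer_clicks|
      share = referrer_clicks.size.to_f / keyword_clicks.size
      flagged << [referrer, keyword, share] if share > max_share
    end
  end
  flagged
end

sample = [{ :referrer => "a.example", :keyword => "mesothelioma" }] * 18 +
         [{ :referrer => "b.example", :keyword => "mesothelioma" }] * 6
concentrated_referrers(sample)
```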

Hidden browser layer

Problem: The publishing web developer will blatantly put a transparent layer over the page so that regardless of where a user clicks, they click on a link that takes them to an ad. It is possible to get away with this on a small scale for some time before being caught.

Solution: Because this is a real user, it is up to the ad provider to notice the problem (which can be flagged by lack of conversions from the site referrer) and then go to the site to manually verify the problem.
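
Flagging on a lack of conversions can be sketched as a comparison of each referrer's conversion rate against the overall rate. The input shape, the method name and the thresholds here are my own assumptions. A flagged referrer is only a candidate for the manual check described above, not proof of fraud.

```ruby
# stats: { referrer => { :clicks => Integer, :conversions => Integer } }
# (hypothetical shape). Returns referrers with enough clicks to judge whose
# conversion rate is below a fraction of the overall conversion rate.
def low_conversion_referrers(stats, min_clicks = 100, fraction_of_overall = 0.25)
  total_clicks = stats.values.inject(0) { |sum, s| sum + s[:clicks] }
  total_conversions = stats.values.inject(0) { |sum, s| sum + s[:conversions] }
  return [] if total_clicks.zero?

  overall_rate = total_conversions.to_f / total_clicks
  flagged = stats.select do |_referrer, s|
    s[:clicks] >= min_clicks &&
      (s[:conversions].to_f / s[:clicks]) < overall_rate * fraction_of_overall
  end
  flagged.keys.sort
end

low_conversion_referrers(
  "good.example"    => { :clicks => 1000, :conversions => 30 },
  "overlay.example" => { :clicks => 500,  :conversions => 0 }
)
```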

Hidden clickable iframe

Problem: This is similar to the hidden layer problem, although the web developer puts a transparent 1 pixel by 1 pixel iframe under the mouse pointer which follows the mouse everywhere it goes on the page. Wherever the user clicks, they generate an ad click without knowing it. This can also be tricky to notice and track down.

Solution: This may require some manual checking, but a check for mouseover events on the other ads in the feed may reveal that the user is not able to view the other ads.

Small-scale manual clicking

Problem: Some people looking to make a few quick bucks may simply manually click on the ad links. They may get their friends and associates to do the same, especially for higher priced PPC terms. This is very unsophisticated, but it is easy to stay under the radar until the PPC costs, the number of total clicks or the conversion ratios become noticeable.

Solution: Check for recurring traffic patterns and IP similarities. Also check for conversion ratios on curious traffic.

Paid to read (PTR)

Problem: This is a more organized version of the small-scale manual clicking method and often starts as such. In order to keep from being detected, a fraudulent party needs to find ways to make the traffic look like valid and interested users. The users need to have varying IP addresses and need to click on a variety of ads to keep from generating any obvious patterns. Publishers will pay users, many foreign, to do nothing else but click on ads in return for a fraction of the rev share. These users will often click on links from their email clients which won’t send a referrer.

Solution: In this situation the user will show a mouse presence, but may continually click on the same link without “browsing” and rolling over any other links. If the rollover to click ratio is near 1:1, this can be a flag of PTR traffic.
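
The rollover to click ratio check is easy to sketch. The thresholds and method name below are guesses for illustration, and the rollover and click counts are assumed to come from client-side tracking for a single session.

```ruby
# Flags a session as possible PTR traffic when the number of link rollovers
# barely exceeds the number of clicks (a ratio near 1:1).
def possible_ptr?(rollovers, clicks, min_clicks = 5, max_ratio = 1.2)
  return false if clicks < min_clicks # too little data to judge
  (rollovers.to_f / clicks) <= max_ratio
end

possible_ptr?(10, 10) # clicked everything they touched
possible_ptr?(60, 10) # browsed around between clicks
```

A session with clicks but no rollovers at all also passes this test, though that pattern points more toward a bot than a paid reader.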

Improper Traffic Purchasing

Problem: Many web site owners will attempt to purchase traffic that is outside the contextual topic of the website intent in order to increase traffic. For example, they might purchase traffic for the term “hot rod photos” but the site topic may be “ring tone downloads”. This unqualified user traffic will often click on some of the displayed ads since they didn’t find what they were looking for.

Solution: The big ad providers will have someone manually verify that a company’s website content matches the terms they are purchasing, which stifles much of the problem. Google’s AdSense automatically generates the appropriate ads based off the site content. However, if an ad feed is being given to other publishers, there is no chance for interaction with something like AdSense. Conversion ratios and overall traffic quality can be good measures to raise flags when something looks unusual.

How big is the problem?

Estimations as to how many dollars are wasted through click fraud are all over the place. Realistically, no one really knows for sure. Who gets blamed in the tightly woven web of PPC advertising?

When I was working with a click fraud detection company, the publishers and advertisers we deployed our system for would often become very disillusioned with the quality of their traffic. Most of them would go through an initial phase of denial, swearing that our system was broken. We would then go through a phase of traffic validation to substantiate the click fraud claims. After a few weeks of traffic adjustments and analysis, some clients would realize that 60% to 90% of their traffic was completely bogus. Even though they had some incentive to clean up their traffic, they would often drop the click fraud detection service because cleaning it up would mean steep cuts to their revenue. At up to 90% fraudulent traffic, this shows that entire companies are thriving on a bubble of almost pure fraud.

Continuing problems…

Click fraud inflates the PPC market, causing continuous challenges for advertisers and the market as a whole. Inflated bidding competition drives term prices up and the likelihood of a conversion down, making PPC advertising a questionable long term strategy. But, like the demand for oil, rising PPC advertising costs are unlikely to shrink the PPC market because of advertisers’ dependence on online advertising.

Can click fraud be stopped? A higher standard must be set, but this is a moving target. Click fraud methods are constantly changing and becoming more complex, and those players that are deeply embedded in click fraud are constantly ahead of the game.

Perhaps the question that really needs to be asked is whether or not click fraud can be controlled within reasonable limits so that the PPC market can continue to thrive. Like any other type of security or validation, keeping up with and adjusting for the dynamics of common methods and best practices is a good place to start. Ad providers like Google and Yahoo! need to be more stringent on traffic source quality and create tighter restrictions for those whose traffic is questionable. Safeguarding against click fraud needs to become a standard expectation for operating as a publisher in the PPC market. Publishers need to consider taking a pessimistic approach to their own direct traffic and their applications should reflect this, stopping bad traffic at the root of the problem. Advertisers and upstream publishers should consider taking an optimistic approach to the traffic that passes through them (since they don’t have direct access to the original click context and environment), but still develop the tools necessary to analyze and reject blatantly fraudulent sources.

Do your part!

The best way to get started with validating clicks is to take the first step of filtering out known web bots. Since this accounts for much of the non-convertible traffic which advertisers pay for, it can make a huge dent in the problem. Second, start putting the more advanced validations in place as described above. Third, if necessary, start working with a 3rd party traffic validation company like Click Forensics, Anchor Intelligence or ValidClick.

fudnik Development

PHP That Just Works

September 18th, 2009

I’m not one of those developers that likes to waste time setting up my dev environment. If I have a project to complete, I should be coding, not messing with config files, compiling Apache or messing with PHP to load that extension I just found out I needed. I like messing with my machine but not with an impending deadline.

With all this in mind, I’ve tried to simplify my entire dev environment over the years… not from a perspective of using simple tools and sticking to basics, but instead from a perspective of optimizing my workflow and keeping my development moving. In the middle of a project, I feel that systems admin focus should be on tweaking the production machines, rather than screwing around with my local dev box.

[Image: Zend Server CE Control Panel]

Always looking for ways to make development easier, I decided to give the Zend Server CE (Community Edition) a try. The idea is that it installs your PHP, Apache, MySQL (with phpMyAdmin) and a great management console that allows you to install extensions with just a click. You can still customize your Apache conf and other things, but it works well out of the box (I’m on a Mac). While you can run it alongside another Apache installation, I tweaked it to run on port 80 and handle multiple virtual hosts. While this may not be ideal for all teams, it can allow everyone to have the same environment without having to mess with poorly updated all-in-one dev environments.

One of the reasons this excites me, being on a Mac, is that every time in the past that I’ve updated my Mac OS, the install kills something on my system causing my development environment to go all wonky. Then I have to spend precious work time to fix it. The Zend Server CE install keeps everything nice and tidy and, to my knowledge, doesn’t rely on other stuff outside of the install to function (unless you are setting host entries in your hosts file, /etc/hosts on a Mac).

This, oh so sweet, environment gets a little better. While I still have a variety of development tools at my disposal, my main IDE has become Zend Studio (Eclipse). I know, I know… there are a lot of purists out there that say it’s too heavy, or sluggish, or isn’t simple enough. There are occasional bugs or things that annoy me, but at the end of the day it is integrated enough that it lets me get my work done. That’s the whole point of an IDE. It also integrates with the Flex Builder plugin, which is a plus for me.

fudnik Development, PHP, Reviews

Agility Futility

April 30th, 2009

Agile development methods have really taken the IT world by storm. In the last few years Agile has become THE way to manage and develop software, especially among young, emerging companies. It brings to the table a flexible model for communication and progress as well as a sense of anti-corporatism which is heavily embraced in many IT workplace cultures.

While this almost hippie-ish movement of peace, love and agileness has really relaxed a lot of work cultures and has been a boon for productivity and customer interaction, there are some often ignored pitfalls which eventually leave a work culture devastated and disillusioned.

Do it for the right reasons
It’s not enough to adopt Agile just because it works well for some or because you read about it on a trade blog. Agile, or any other hastily adopted process or methodology, cannot solve all your problems. It will simply make you more of what you already are. Your weaknesses, if not already apparent, will eventually surface and you must be ready and willing to acknowledge and address them.

Successfully adopting any methodology like this requires that you have an adequate paradigm about people, business and clients which instills respect and integrity and is in sync with the methodology. If your efforts are only surface-level rhetoric, and no paradigm shift occurs, the process will fail and you’ll be looking for the next “great thing” to fix your woes.

Use best practices
While Agile lends itself to a more rapid pace of development, it can be easy to leave crucial parts of the SDLC out of the equation in the interest of time. Adequate quality assurance and testing are often the first to go. Test-driven development, which utilizes a testing process as part of your development, is a great way to minimize QA overhead while maintaining work quality. Building test code as, or before, you develop may add a little to your initial timeline but will result in fewer deployment panics and provide built-in specifications for your code to adhere to.

Don’t sacrifice quality
Cutting corners is a big no-no. Decide what are features and what are bugs. Determine which of them are in your critical path and develop them properly. If you can’t do them right, choose not to do them or arrange for more time to complete the project. NO ONE benefits from poorly thought out, shoddy work. Management only seems happy until they realize the problem they rushed you to fix ends up worse than before. It is the developer’s job to speak up and communicate risks and issues which then translate into proper timeline and feature negotiation.

Don’t ignore problems. Moderately plan for the future and proactively address problems and improvements through iterations. Ignored problems build up over time and eventually result in a complete rewrite. Iterative development can be your friend. Keep track of issues and slip some into each iteration so you can keep up with the change.

Be realistic
There is an old project management adage that explains how with every project, three factors are desired: speed, low cost, and great quality. You can pick two. Having all three is a fantasy propagated by poor sales teams. This is because any improvement in any one or two of the factors will negatively affect the third. For example, if speed is crucial, it will likely affect quality and cost. If very low cost is required, completion times will often be longer and quality will suffer. The only way to realistically improve one of the factors is to improve your effectiveness in all three of them. Attempting to use Agile development concepts to short-sightedly manipulate any of these factors is counter-productive.

Avoid burn-out
Finally, keep in mind that overworking your developers is counterproductive in an Agile model. With a more top-down, waterfall approach, you may get away with piling on extra hours, shoving more into a deadline and driving with a whip. Burn-out doesn’t make for solid code, good morale, communication and low turnover, all of which are factors behind a well-functioning Agile machine. Utilize iterations to drive realistic deadlines and continually reassess based on top priorities to keep everyone focused on the same goal.

fudnik Development, Technology

GE Brings Minority Report to Life?

March 9th, 2009

Okay, well not quite, but I thought this was pretty amazing. GE “brings good things to life”, almost literally.

Tom Cruise doing virtual computing

For those of you that have seen Minority Report, you know that people have been trying to recreate that type of computing model in the real world since the movie came out. Well, it doesn’t quite exist yet, but GE may be headed in the right direction.

As a way to draw interest to their Smart Grid energy technology, GE has created an interactive 3D experience that is pretty startling at first. It almost seems unreal… until you realize that it is actually interacting with you. Check out a video of the guys at doppelagent.de experiencing this first hand, although you will want to try it out for yourself.

Okay, so it’s not quite Minority Report level computing. However, with the live human 3D interaction inside of a virtual, yet real, space, all done in the comfort of your browser using Adobe Flash… this is quite amazing, nonetheless. I would love to see this kind of technology take off and be available in a browser… maybe we’re not far off.

fudnik Flex, Technology

Variable Conflicts in JavaScript

March 8th, 2009

It is quite common to find yourself with a heinous JavaScript error on a page that until recently seemed to work flawlessly. Perhaps you changed your JavaScript. Maybe you included a 3rd party script or a script from another domain onto your page. Now, everything that was once peachy has turned to sour grapes!

More than likely the problem is that with all the varying scripts on the page, variables from other functions will conflict with variables in the existing code, causing failures and errors, and even worse, overwrite variable values without any notification. It can take hours to track down variables that conflict between scripts before it finally works. Some developers figure that these kinds of issues are probably just inherent in client-side web development and use that as another “reason” as to why JavaScript is inferior.

I don’t think client-side development should be looked at as inherently quirky. Sure there are some browser nuances and environment issues that you can’t control, but you can develop very robust code that works well and adequately serves its purpose.

There are a few key things that you can do to make sure your code is clean and runs in its own scope.

First, anytime you create a variable in a function that should not be available outside of the function, use the “var” keyword to declare the variable and restrict it to the local scope.

var item_count = 20;

Second, I would recommend putting much of your code into JavaScript prototype objects. The prototype method of creating objects is JavaScript’s way of creating a class-like object (although prototypes are quite different from actual classes; read up on JavaScript prototypes for more info). In short, they allow you to create a group of related functions that can share assets between prototype functions (method equivalents).

// start by creating your initial prototype function, like a constructor
function Calendar(month, year)
{
    this.month = month;
    this.year = year;
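    // "var" keeps date local to the constructor instead of leaking a global
    var date = new Date();
    // getFullYear() returns the four-digit year; getYear() is deprecated
    this.current_year = date.getFullYear();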
}

// create functions that inherit characteristics of the prototype Calendar
Calendar.prototype.display_Month = function()
{
    var max_days_in_month = 31;
   
    // ... continue body of function
}

// reference variables from the prototype using "this"
Calendar.prototype.display_Week = function()
{
    if(this.year < this.current_year)
    {
        // return error of some kind
    }
    // ... continue body of function
}

Later in your code you can instantiate one or more of the prototype classes, each having their own scope and assets. This will keep them from conflicting with any other code.

var cal_1 = new Calendar(5, 2005);

var cal_2 = new Calendar(3, 2006);

If you abstract your code well enough, using the power of JavaScript in this manner allows you to create very reusable code that can be used in any application with any combination of JavaScript without problems. There are other things you can do to abstract your JavaScript and make it more functional, but these examples serve the purpose of resolving scope issues and get you on the road to cleaner, reusable code.

fudnik Development, JavaScript

PHP Optimization

February 12th, 2009

Many developers don’t really put much thought into code optimization. Frankly, their applications don’t see enough traffic for optimization to be much of an issue. However, regardless of your application’s actual needs and whether or not you are having speed issues, there are some good habits that you can develop that will either help you in a bind or just ensure that all of your applications are finely tuned machines.

Optimization can be frustrating at times. I am quite familiar with server systems, however, I do not consider myself a full-fledged systems admin. The more I learn, the more I realize I don’t know. I do know that finding the right balance between server and code optimization takes skill. Too much customization in either direction can make the code or server difficult to manage. Sometimes, throwing more hardware at the problem can do the trick, but this is usually just temporary as the problem usually multiplies itself by the number of servers your application is running on. Inversely, good optimization improves performance across all the servers the application is running on.

Typically, I figure that if a particular change makes something overly difficult to manage, then it probably is not worth doing, because there are usually other people involved and there is too much room for mistakes. I will sometimes break this rule for my own personal stuff since I am the only one involved. You need to decide at what point it becomes too difficult. Good documentation goes a long way.

Here are some of the things (using PHP as the example) that I will often do to optimize things at the software level, without going too far.

  • It’s good practice to use single-quoted (literal) strings wherever possible. Using double quotes tells the parser to expect potential interpretable values in the string, slowing down processing just a bit. This can add up with HTML intensive applications.
  • Keep code files from getting too large. If two chunks of code are rarely used together split them into separate files so that PHP doesn’t have to load more code than necessary.
  • Keep file inclusions to a minimum. Inclusions require additional disk reads and add more time to processing. Don’t go overboard and sacrifice code organization in the process… includes can be your friend. If the include makes sense, do it. MVC development often ignores this because of the nature of the methodology, but it is still a good practice to keep in mind whether using MVC or not.
  • Try to convert any uses of division to multiplication. Division eats up processing, especially if iterated several times. Example: instead of $var / 10 do $var * .1
  • Utilize break or continue to control code flow in loops, etc… If you review enough of your code, there are probably some areas where you are using a for() or foreach() that are running beyond their necessary iterations. In other words, you run the loop to accomplish a certain task or value, but the loop continues to run through all possible iterations, even after the task or value is complete. Either find ways to use while() or use break or continue where possible.
  • Sometimes a caching application can help ease unnecessary processing. An opcode cache such as Turck MMCache or APC can dramatically improve PHP processing speeds by keeping realtime code compiling to a minimum.
  • Other types of caching can be done using Memcached or other similar code. This type of caching can cache files, database results, large data objects, etc… If there is any data that requires processing that doesn’t change much between pages or between users, this type of caching can drastically speed up response times. This is not only helpful for bypassing unnecessary processing, but it can also limit your application’s need to hit the database. Memcached can be used on many levels of the application to reduce processing and web service request overhead. For example, a list of users may not change very often, so there isn’t a need to retrieve a fresh list upon every request. You could cache the list for 2 hours, for instance, and your application would only have to query the database for the list once every 2 hours, instead of every page load.
  • Only use SSL where necessary. SSL encryption slows the response of your application.
  • A browser will hold on to the connection with the webserver as long as it is waiting to receive data, hence requiring the webserver, in this case Apache, to stand by until everything is processed and ready to send. In some cases, your data may not require the user to review anything afterward, so it may be a good thing to consider handing the remaining work off to a separate process. This can be tricky, but if done properly, Apache will respond to the user faster, leaving connections open for other users and allowing the code to finish in its own time. There are also functions in PHP that allow you to check if the user’s browser is still responding to the connection and close the connection if needed.
  • When a database is involved, use good SQL and data handling processes. I won’t go into a lot of SQL specifics here, but the following are things to consider:
    • Only request the data you need from the database. The more data that is requested, the more that has to be sent across the wire and get parsed by the application.
    • When your application is done processing a large data set, it should release the result set to free up memory for other operations. If only parts of the data are needed for processing later in the request, those parts of the data can be copied to another data object while the original object is cleared. Most applications don’t work with large enough data sets to have this concern, but when you do, you’ll find that automatic garbage collection won’t be enough.
    • Most modern databases will allow you to combine multiple operations into a single SQL request. Taking advantage of this can GREATLY minimize your application’s overhead of going back and forth to the database. For example, you can combine multiple SELECT queries using UNIONs. You can do conditional INSERTs when you might normally do a SELECT, check the logic in your code, then do an INSERT. Take advantage of subqueries. Also, in some cases it can be beneficial to do a few large queries early in your application and avoid the several smaller ones that might be required later.
    • Take advantage of query caching and preparing if your database and code support this functionality.
    • When possible, have the database be on the same machine as the application. This introduces some scalability challenges (to be discussed at another time), but can be worth the effort, even with large, distributed applications. This will keep your code-to-database processing times VERY fast.
  • Keep what you store in sessions to a minimum. You can also use Memcached as a custom session handler between servers to replace database sessions across multiple servers, for improved performance.
  • When possible, “minify” your JavaScript, CSS and HTML. Keep the output to a minimum and write your code so that the browser will take optimal advantage of cached CSS, JavaScript and HTML.
  • PHP is an interpreted scripting language. When it comes down to it, it can only move so fast. For larger applications that do a lot of processing, it can greatly decrease the system load to offload major processing to a program written using a compiled language such as C, C++, or yes, even Java. You can build an RPC interface to the compiled app using JSON, SOAP or XMLRPC. Service communication may take some overhead, but the compiled app will more than make up for it. Using this method allows you to keep your interface code flexible using PHP while gradually putting any labor-intensive operations on to something more suitable.

Doing any or all of these things will benefit your applications greatly and still keep your code manageable. This doesn’t address all of the innovative things you can do on the UI and client-side of an application to improve performance. Of course there is always more you can do, but this should get you started.
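
As a rough illustration of the caching idea from the list above, here is a minimal sketch of keeping a user list for two hours. Everything named here is an assumption for the example: Simple_Cache is a hypothetical in-process stand-in for Memcached (an array, so nothing survives between requests), and load_Users_From_Database() is a hypothetical placeholder for the real query. With the actual Memcached extension you would call its own get() and set() methods instead.

```php
<?php
// Hypothetical stand-in for Memcached: an in-process array with expiry times.
class Simple_Cache
{
    private $items = array();

    public function set($key, $value, $ttl_seconds)
    {
        $this->items[$key] = array(
            'value'   => $value,
            'expires' => time() + $ttl_seconds,
        );
    }

    // returns the cached value, or null if it is missing or expired
    public function get($key)
    {
        if(!isset($this->items[$key]) || $this->items[$key]['expires'] <= time())
        {
            return null;
        }

        return $this->items[$key]['value'];
    }
}

// counts how many times the "database" is actually queried
class Query_Counter
{
    public static $count = 0;
}

// Hypothetical expensive query.
function load_Users_From_Database()
{
    Query_Counter::$count++;

    return array('alice', 'bob');
}

function get_Users(Simple_Cache $cache)
{
    $users = $cache->get('user_list');

    if($users === null)
    {
        // cache miss: hit the database once, then keep the list for 2 hours
        $users = load_Users_From_Database();
        $cache->set('user_list', $users, 7200);
    }

    return $users;
}

$cache = new Simple_Cache();

get_Users($cache);
get_Users($cache);

echo Query_Counter::$count; // prints 1
```

The second call is served from the cache, so the query only runs once. The same get-then-set pattern applies to anything that is expensive to build and changes rarely.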

fudnik Development, PHP

PHP Error Handling

January 5th, 2009

For many developers, error handling is somewhat of a myth. They’ve heard about it but with pressing deadlines and management ignorance they are often not given ample opportunity to learn about, let alone implement, proper error handling and debugging techniques.

So, like many things, there are many ways to skin this cat. When working with debugging and error handling, I usually keep the following things in mind:

  • How to trap errors and debug messages
  • How to collect messages
  • How to output messages without disrupting the display
  • How to integrate throughout entire application

One of the most important things to keep in mind with debugging is that OOP is your friend. I won’t get into the pros and cons of OOP here, but just mention that keeping things in classes allows you to abstract your debugging, handle it cleanly and keep it specific to the needs of the class. With this in mind, my following recommendations will be based on this premise.

How to Trap Errors and Debug Messages

Trapping code problems is relatively straightforward. You want to do some kind of test or comparison to determine if there is a problem. Then you determine how to get the message that adequately explains the problem and make it available to the rest of your program. Sometimes a simple if/else block will do the trick to test values that would not otherwise error out, but may need to be set for certain logic to function properly. PHP 5+ provides exception handling using try/catch which is very useful for catching errors that would normally kill your script, allowing you to then handle it accordingly and capture the error message. For example:

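try
{
    // file() only raises a warning on failure, so throw an exception explicitly
    $lines = @file('my_file.txt');

    if($lines === false)
    {
        throw new Exception('Could not read my_file.txt');
    }
}
catch(Exception $e)
{
    $error = $e->getMessage();

    // do something here to compensate for the error
}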

You can also create your own exception class that allows you to automate specific actions when certain errors occur, like logging to a file, emailing the admin, etc. You can find more information about this at http://www.php.net/manual/en/language.exceptions.php
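
To make the custom exception idea a bit more concrete, here is a minimal sketch of an exception that writes itself to a log file when it is created. The class name Logged_Exception, the log file location and the error message are all assumptions made up for this example; only Exception, error_log() and sys_get_temp_dir() are standard PHP.

```php
<?php
// Hypothetical exception class that logs itself when it is created.
class Logged_Exception extends Exception
{
    // assumed log location; point this at whatever log file you use
    public static $log_file = '';

    public function __construct($message, $code = 0)
    {
        parent::__construct($message, $code);

        $line = date('Y-m-d H:i:s') . ' [' . $code . '] ' . $message . PHP_EOL;

        // message type 3 appends the line to the given file
        error_log($line, 3, self::$log_file);
    }
}

Logged_Exception::$log_file = sys_get_temp_dir() . '/app_errors.log';

try
{
    throw new Logged_Exception('Could not connect to the payment gateway.', 42);
}
catch(Logged_Exception $e)
{
    // the details are already in the log; keep a friendly message for the user
    $error = 'Something went wrong. Please try again later.';
}
```

Emailing the admin or any other automated action could go in the same constructor, though in practice you may prefer to keep heavy work out of it.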

How to Collect Messages

The first thing you’ll want to do is determine a standard method of collecting messages that you can use in all your applications and classes. You can either write a separate class to do this and extend it in your other classes, or you can create methods within each class that handle things specific to the class’s needs. In my example, I collect the messages local to the class.

You can do the following:

class My_Class
{
    private $last_message = '';
    private $message_array = array();
    private $error_array = array();
    private $debug = false;

    function __construct($debug)
    {
        $this->debug = $debug;
    }

    private function add_Message($message)
    {
        $this->last_message = $message;
        $this->message_array[] = $message;

        return true;
    }

    private function add_Error($error)
    {
        if($this->debug)
        {
            $this->error_array[] = $error;
        }

        return true;
    }

    public function get_Last_Message()
    {
        return $this->last_message;
    }

    public function get_Message_Array()
    {
        return $this->message_array;
    }

    public function get_Error_Array()
    {
        return $this->error_array;
    }

    public function another_Method()
    {
        // some code goes here and sets $problem_found
       
        if($problem_found)
        {
            $this->add_Message('This did not work.');
            $this->add_Error('Very complex error for only me to see.');
        }

        // continue method
    }

    public function yet_Another_Method()
    {
        // some code goes here

        try
        {
            // some code that could fail
        }
        catch(Exception $e)
        {
            $this->add_Error('Another very complex error message. ' . $e->getMessage());
        }
    }
}

This allows you to add a set of methods to a class that help to control messaging as needed. The methods add_Message() and add_Error() are always called if there is a problem, however, error messages are only collected if $debug is passed in as ‘true’ into the constructor. This allows your user to see friendly messages and react as needed, but keeps horrid errors from displaying all over the screen.

Implementation of the class would look like this:

$debug = true;

$my_instance = new My_Class($debug);
$my_instance->another_Method();
$my_instance->yet_Another_Method();

// important message for user to see
echo $my_instance->get_Last_Message();

// if you need to show the user the entire list of user friendly messages
echo implode('<br />', $my_instance->get_Message_Array());

// output debug and error info and set to string to be handled later
if($debug)
{
    $msg_output = implode('<br />', $my_instance->get_Error_Array());
}

This example, however, is somewhat limiting as it only collects debug data within the class itself. This may be useful for a really low-level class like a database abstraction layer or some other core application class where you cannot guarantee that assets like a database connection or session will be available to do more advanced debug logging or output. In this scenario, you can abstract your debug collection methods to their own class or have them present in each class and build methods to pass the debug around so it will “bubble to the top” of your application where you can display it and make other logical decisions.

In cases where you are dealing with more advanced classes or you know that a database connection and/or session is available, you have many other options. I use the method above for my core classes, otherwise I use a session and database dependent static class that allows me to make all kinds of debug calls throughout my application. Because they can be stored in the session, and optionally the database, I can access them for display when needed but also track recurring issues across sessions. Combined with custom exception handlers, I am also able to respond to different levels of errors appropriately and send notifications as needed.
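
A static debug class along these lines might look like the following minimal sketch. The class name Debug, its method names and the sample messages are hypothetical, and the session and database storage described above is left out, so this version only keeps messages in memory for the current request.

```php
<?php
// Hypothetical static collector; a fuller version could also write to
// $_SESSION or a database table, which is left out here.
class Debug
{
    private static $enabled = false;
    private static $entries = array();

    public static function enable($enabled)
    {
        self::$enabled = (bool) $enabled;
    }

    // record a message with a level such as 'notice' or 'error'
    public static function log($level, $message)
    {
        if(self::$enabled)
        {
            self::$entries[] = array('level' => $level, 'message' => $message);
        }
    }

    // return the messages recorded so far, optionally for a single level
    public static function get_Messages($level = null)
    {
        $messages = array();

        foreach(self::$entries as $entry)
        {
            if($level === null || $entry['level'] === $level)
            {
                $messages[] = $entry['message'];
            }
        }

        return $messages;
    }
}

Debug::enable(true);
Debug::log('notice', 'Cart loaded from session.');
Debug::log('error', 'Price lookup failed for product 342.');

$errors = Debug::get_Messages('error');
```

Because the methods are static, any class in the application can call Debug::log() without having the collector passed in, which is the main convenience over the per-class arrays shown earlier.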

How to Output Messages Without Disrupting the Display

Now that you can trap the errors and collect them, you need to properly output them so that they work well with your display elements. Perhaps you only want to show the user the user friendly messages and output actual errors elsewhere if you are in debug mode. I usually do two things to accomplish this properly:

  • Never echo or print ANYTHING unless it is a method or class that outputs something for display.
  • Collect the display content until all necessary logic has been processed and it is safe to determine whether or not it should be displayed.

Doing these two things alone can make your applications operate much more smoothly. Here are two examples: a good and a bad way to display page content.

// GOOD WAY
$html = '';

$html .= '
    This is some page HTML and content.
';

$html .= '
    This is more page content.
';

// processing is done, echo content
echo $html;



// BAD WAY
echo 'This is some page HTML and content.';

echo 'This is more page content.';

In the BAD WAY, if an error happens between the two echos, it’s too late to properly respond. Using the GOOD WAY, if an error happens, the user display can be changed or even redirected to a new page before anything is displayed to the user. Something else to keep in mind is that if you are coding a display method that is used somewhere deep in your program, you may want to consider returning a string rather than echoing out at the end of the method. This way you can still give control to the top level display methods and echo it out when you are ready.
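
Here is a minimal sketch of that last point, assuming two hypothetical functions: render_Product_List() is a low-level display method that returns its markup, and build_Page() is the top-level code that decides what finally gets echoed.

```php
<?php
// Hypothetical display method: builds markup and returns it instead of echoing.
function render_Product_List(array $products)
{
    if(count($products) == 0)
    {
        // signal the problem to the caller instead of printing half a page
        throw new Exception('No products available.');
    }

    $html = '<ul>';

    foreach($products as $name)
    {
        $html .= '<li>' . htmlspecialchars($name) . '</li>';
    }

    return $html . '</ul>';
}

// top-level display code: nothing is sent until all logic has run
function build_Page(array $products)
{
    try
    {
        return '<h1>Products</h1>' . render_Product_List($products);
    }
    catch(Exception $e)
    {
        // nothing has been echoed yet, so the whole page can still be swapped
        return '<h1>Sorry</h1><p>' . htmlspecialchars($e->getMessage()) . '</p>';
    }
}

echo build_Page(array('Widget', 'Gadget'));
```

Since the only echo is at the very end, the error branch could just as easily send a redirect header instead of returning an error page.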

How to Integrate Throughout the Entire Application

Now you are capturing debug and error messages, you have them integrated into your class, and you are outputting the info at the right time, however, your application is probably more than one class, so how do you integrate this into a large application?

If all of your classes are handling errors and messages like the examples above, you can pass the arrays of messages up through the class hierarchy and merge arrays of messages, so that the final display class can handle them properly. It’s ideal to also pass the $debug parameter down into all your classes so that as they extend or instantiate each other, everything has the same debug mode, either on or off. Example:

class New_Class
{
    private $last_message = '';
    private $message_array = array();
    private $error_array = array();
    private $debug = false;

    function __construct($debug)
    {
        $this->debug = $debug;
    }

    public function some_Method()
    {
        // some code goes here
       
        // instantiate needed class with debug mode from this class    
        $m_class = new My_Class($this->debug);
        $m_class->another_Method();

        // collect messages and errors from $m_class
        $this->message_array = array_merge($this->message_array, $m_class->get_Message_Array());   
        $this->error_array = array_merge($this->error_array, $m_class->get_Error_Array());

        // continue method
    }

    private function add_Message($message)
    {
        $this->last_message = $message;
        $this->message_array[] = $message;

        return true;
    }

    private function add_Error($error)
    {
        if($this->debug)
        {
            $this->error_array[] = $error;
        }

        return true;
    }

    public function get_Last_Message()
    {
        return $this->last_message;
    }

    public function get_Message_Array()
    {
        return $this->message_array;
    }

    public function get_Error_Array()
    {
        return $this->error_array;
    }

}

Obviously, there are several ways to handle these kinds of things. As long as you are consistent with all your code and provide a simple way for you to capture errors, you’ll have a great start… and your process will grow and mature with your application requirements. These concepts only scratch the surface, as they can be abstracted into a very dynamic and robust debugging framework with a lot of flexibility… but that is a topic for another time. Regardless of how you do it, if you put some good thought into it, you’ll save yourself a lot of time and headache down the road.

fudnik Development, PHP

Handling Foreign HTTP Variables

December 19th, 2008

I have been asked several times about various ways to secure e-commerce applications and other systems from unexpected or badly formatted POST/GET variables. This is a common issue as many developers only develop for the expected and test their application accordingly. Developing for the unexpected can be a bit tricky.

I’ll try to address this issue generically enough that it can be useful to anyone who has a similar issue. I’ll show examples using PHP, however, this process can also be implemented in a similar way using any similar language.

Scenario: You have a small shopping cart that you built for a client and have recently found out that one of the client’s customers had found a way to submit incorrect data to the shopping cart by saving the HTML form page to his computer, then modifying some of the values, and sending the form from his computer. This allowed him to add more options to his product and also lower the price at the same time. Your system didn’t check for anything like that, so you didn’t find it until last week, a whole month later, when you noticed the purchase price in the database was drastically different.

There are a few things you can do to safeguard your application against this kind of problem.

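First, you need to make sure that the request comes from a page that is on the same server as the application receiving the request by comparing the referring domain, or you can even go to the level of checking the page it came from as well. Keep in mind that the referrer is supplied by the browser and can be forged or missing, so treat this check as a first hurdle rather than proof.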

Example of domain validation in PHP with Apache:

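// the referrer is sent by the browser and lives in $_SERVER, not $_REQUEST
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

// regex match to find the referrer host, e.g. "www.domain.com"
preg_match('#^https?://([a-zA-Z0-9\-\.]+)#', $referer, $matches);

// check if referrer domain is the same as local
if(isset($matches[1]) && $matches[1] == $_SERVER['SERVER_NAME'])
{
    // continue processing
}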

The previous example verifies that the referring domain IS actually the application domain. You can’t just do a check to see if the server name exists in the referrer, as that can be fooled as well. If the referrer is http://www.mydomain.com/yourdomain.com/hack_you.php, the following would allow the referrer to pass and send invalid data.

// weak check: passes whenever the server name appears anywhere in the referrer
if(strpos($_SERVER['HTTP_REFERER'], $_SERVER['SERVER_NAME']) !== false)
{
    // continue processing
}

Second, it’s bad practice to pass anything as a form field value if you can reference it using a database id instead. In other words, if a price is associated with a product in a database table, don’t pass the price of $10.00, pass the product id (say 342) and reference the price in your code using that id to extract the information you need from the database.
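
As a minimal sketch of this idea, the following assumes a hypothetical find_Product() function backed by a plain array in place of a real products table, with made-up product names and prices. The point is only that the price comes from the server-side lookup and never from the submitted form.

```php
<?php
// Hypothetical stand-in for a products table; in a real cart this would be
// a database query keyed on the product id.
function find_Product($product_id)
{
    $products = array(
        342 => array('name' => 'Blue Widget', 'price' => 10.00),
        549 => array('name' => 'Red Widget',  'price' => 12.50),
    );

    return isset($products[$product_id]) ? $products[$product_id] : null;
}

// Build an order line from submitted form data; any submitted price is ignored.
function build_Order_Line(array $post)
{
    $product_id = isset($post['product_id']) ? (int) $post['product_id'] : 0;
    $product = find_Product($product_id);

    if($product === null)
    {
        return null; // unknown id: reject the request
    }

    return array(
        'product_id' => $product_id,
        'name'       => $product['name'],
        'price'      => $product['price'], // always the server-side price
    );
}

// a tampered form that tries to lower the price
$line = build_Order_Line(array('product_id' => '342', 'price' => '0.01'));
```

Here $line carries the stored price of 10.00 even though the form claimed 0.01, and an id that does not exist is rejected outright.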

It’s good practice in general to do whatever you can to never expose data that could be harmful if modified. Along with the previous price scenario, you want to keep user data, database config data, product information, etc… out of sessions, cookies, form fields, etc… Use ids and other types of identifiers to reference the data in your code.

Now, it’s important to understand that anything is hackable if the attacker has enough time and resources, so there is always the possibility that someone could not only spoof your domain, but with enough attention to detail, they could learn ids and other values that you are passing them and find ways to send them or modify them in transit or while doing a domain spoof. If your database table structure is set up right and you have guarded against SQL injection (which is a whole separate topic), then when id 342 is being passed in your program and they change the id to 549, your program queries 549 in the database and still finds the appropriate price and associated information for that product. In contrast, if you pass multiple variables in association with one product, changing one of the values could completely change what you expect to see in your program.

Third, consider creating some tight restrictions on the incoming variables. The first way to do this is to not use the $_REQUEST array, as this allows either incoming POST or GET values. If your form uses POST or GET, you should use the corresponding $_POST or $_GET arrays to access the data. This will limit the ways a hacker can send values into your application as well as lessen the amount of validation you need to do.

In addition to using the proper incoming array, you can create a variable register that limits the values you expect on that particular page. For example, if you are expecting to see “product_number” and “product_id” in $_POST, but your application also receives a “price”, you can be sure that someone is sending values to your application that are unexpected. You can either set these values to null, or set flags or logs in your system to notify you that you may have someone trying to manipulate your application.

Checking for unwanted variables in PHP:

$allowed_vars = array('product_number', 'product_id');

foreach($_POST as $key => $value)
{
    // if the key is not in the list of allowed names, delete it
    if(!in_array($key, $allowed_vars, true))
    {
        // be careful not to delete vars that may be automatically sent by PHP or Apache
        unset($_POST[$key]);
    }
}
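If you would rather flag unexpected fields than silently drop them, something along these lines could work. This is only a sketch: unexpected_fields() is a hypothetical helper, and the simulated array stands in for $_POST.

```php
<?php
// Hypothetical helper: returns the names of submitted fields that are not on the allowed list.
function unexpected_fields(array $input, array $allowed_vars)
{
    return array_values(array_diff(array_keys($input), $allowed_vars));
}

$allowed_vars   = array('product_number', 'product_id');
$simulated_post = array('product_number' => 'A-17', 'product_id' => '342', 'price' => '0.01');

$extra = unexpected_fields($simulated_post, $allowed_vars);
if (count($extra) > 0) {
    // error_log() sends the message to PHP's configured error log.
    error_log('Unexpected form fields: ' . implode(', ', $extra));
}
```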

The last thing you’ll want to do with your most important incoming values is verify that limits and values are not out of range and are acceptable. If someone is registering for something and they get free points or bonuses or something, you want to make sure that any variables being sent that contain these values are checked for minimums and maximums, etc… The same can be applied to various other types of values. Just be smart and validate things that you know are crucial to the functionality of your application.
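One way to do a minimum/maximum check is with PHP’s built-in filter_var() and FILTER_VALIDATE_INT. The int_in_range() wrapper and the 0 to 100 bonus limits below are assumptions made up for this example.

```php
<?php
// Hypothetical helper: returns the value as an int if it is a whole number
// between $min and $max inclusive, or null otherwise.
function int_in_range($value, $min, $max)
{
    $result = filter_var($value, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => $min, 'max_range' => $max),
    ));
    // filter_var() returns false on failure; compare strictly so a valid 0 is kept.
    return ($result === false) ? null : $result;
}

// Assumed limits for a sign-up bonus: 0 to 100 points.
$bonus      = int_in_range('50', 0, 100);   // inside the range
$too_many   = int_in_range('5000', 0, 100); // above the maximum
$not_number = int_in_range('abc', 0, 100);  // not an integer at all
```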

fudnik Development, PHP, Ruby

Smarty For Dummies

September 5th, 2008

Are Smarty Templates really that useful? Well, I have my opinions, but decide for yourself. I hope you’ll do a double take after reading this.

For a long time now, I’ve avoided using Smarty Templates as much as possible. Perhaps it’s the cheesy assuming name, or perhaps it’s that I like to keep my applications simple to deploy and free of unneeded dependencies. More than anything, I think it’s because templating has been a solved problem for me for some time now and I didn’t have the desire to fix what was not broken. 

A while back, I had the opportunity (tongue in cheek) to use Smarty Templates for an existing project I was working on, and I was able to evaluate its functionality, its proposed NEW paradigm and how it integrates into the development process. Now, by no means am I a Smarty expert and I don’t know everything about its benefits, but for better or for worse, I have formed an opinion that seems to adequately describe the “reality” of Smarty Templates. They pretty much work about how I expected. No surprises there, but here are some things to think about.

  • Smarty Templates are simply one way of implementing a template-based system in a website. They add another layer of processing to your development. They have the benefit of caching parts of the design so that processing is lessened.
  • The main idea behind any templating system or CMS is to provide separation between the various layers of development: design, site structure, basic logic, core libraries, etc… Some systems provide some basic separation and others provide many layers of separation. Smarty Templates aim to resolve a lot of the problems that come about with having multiple roles working on a site: designer, interactive developer, programmer, etc… Separation of these roles can often be difficult. I think much of this comes down to a core problem with the PHP community as a whole. PHP has become a very popular language but has also been adopted by a lot of people that are not programmers by trade. Thus, its standards and the experience of its developers are not at the same level that you might find in other web or application language communities. Smarty Templates, although seen by many as a godsend, seem to be a temporary solution for a community that for the most part lacks the structure and standardization to solve the problem in a more appropriate way.
  • Smarty Templates allow you to build backend code, allow designers to build templates, and then allow the developers to hodge-podge them all together with a series of inclusions. This, like much of the PHP code I see on the web, lends itself to disastrous application structures and does not enforce a paradigm. I’m all for flexibility, but all Smarty Templates have done is add another arbitrary level of confusion on top of what is usually already messy code.
  • I understand that Smarty Templates is supposed to shelter core logic from the designer, allowing them to use a “templating language” to create the display. However, the Smarty language itself uses programming methodologies to display data. So, not only do you have programming on top of programming, but the designer still has a way to destroy the interface from lack of knowledge, understanding and perspective. If the interface looks wrong or breaks in some fashion, where do you go to fix the problem?
  • Smarty Templates were obviously created from a developer’s perspective. I am a developer with a design background, and I have to say that, with a good understanding of both perspectives, the lines between design and development are blurred with Smarty Templates. A developer can simply develop less logic and assume that the designer will take care of the rest with the variables he has access to, or a developer could limit the designer by only making available the core necessary values. The point is that this line is blurry and just adds yet another level of abstraction with which internal standards must be enforced. Not so ironically, although developers have told me that they use Smarty Templates to allow the site to scale and to allow the designers to maintain the templates, I have never seen a situation where the roles were separated. The developer creates the PHP, then the same developer has to go in and modify the templates, bypassing the whole proposed benefit of using Smarty Templates in the first place.
  • With some more forethought, developers can create very structured and modular code that, with the proper APIs and CSS integration, can give the designer all the control necessary while still protecting data. Ideally, you would have a robust CMS that allows your designers to have control over certain components, while still being able to deploy your core logic and modules. Then, for areas of the site that need more flexibility, you could use Smarty Templates within the CMS for those designers with a bit more technical skill and allow them to control sections of the page with Smarty Templates, not the entire site structure.
  • Although Smarty Templates can be useful if standards and structure are enforced, it seems as though they’re just another bandwagon thing that gets adopted out of pure social acceptance before the needs and resources of the project are adequately assessed.

I know that some of you may LOVE Smarty Templates and I understand that it may play a crucial role in your development, but please keep in mind that I am always ready and willing to use technologies that TRULY benefit my development (I’ve used Smarty on blogs for instance). This isn’t the place for flaming, but perhaps you can share something that I have missed. 

If I think really hard about what my needs are as a developer, Smarty kinda makes sense. The problem is that Smarty Templates seem to be the result of a desperate developer who’s fed up with designers messing up his apps, and/or who simply hates HTML (or all of the presentation layer for that matter) and doesn’t want to do anything but pump out logic. I don’t see how Smarty Templates benefit the designer or the project management process. I know, I know… there are all the arguments about separation of presentation, from logic, from data, etc… blah, blah, blah. Smarty is no more a clean separation of these layers than adding more frivolous layers of management to an organization is a way to make it more effective. Smarty attempts to make the separation, but it’s not clean. The designer’s job should be completely separate from any kind of backend logic, allowing them to focus on HTML, graphics, CSS, etc… Smarty Templates don’t enforce this separation, as HTML can be done by the designer, or a Smarty variable can contain HTML. So where is this separation they speak of? The second a developer includes HTML in any Smarty variable, the notion that the designer has full control over the UI goes right out the window and all you’re left with is another blurry layer of logic… logic that cannot adequately communicate errors or debug info back to the backend logic.

Ideally, if you want a cleaner separation of code and design, consider using or creating an MVC style framework which more strictly enforces this separation and allows for constrained usage of Smarty Templates or similar templating code. Zend Framework, CodeIgniter and CakePHP are a few worth looking at.

I’m not trying to deter people from using Smarty Templates. I want to make it clear that I am simply urging you to use them as a tool to aid, not replace, proper development. I see the value that Smarty Templates have the potential to bring to the table and will probably continue to use them for some things in my development. However, in the meantime, I am jumping off the wagon until they become something that I cannot live without.

fudnik PHP




