Friday, July 30, 2004

Great Hacking vs Great Writing

The Java blog world is up in arms about the anti-Java tone of Paul Graham's recent controversial post.

See here and here for instance.

If you are a passionate programmer, Paul's writing can be very inspirational. His ongoing anti-Java bias is, however, very off-putting. Regardless of your preferred language, at the end of the day, we're programming finite state automata. Different languages give us more or fewer levels of abstraction, possibly allowing certain concepts to be expressed more succinctly, but nothing can change the fact that all we are doing is manipulating bytes, comparing bytes and branching to other memory locations based on those comparisons.

Update:

Here is a well-written response; the last sentence in particular is very telling.
Here, on the other hand, is a cutting response, and the comments are even better. Starsky McFlirt's comment still has me chuckling: "Oh sorry, I forgot, he also applied a basic statistical concept to email filtering as well that worked for like at least one year. Somebody give the man a f**kin Nobel prize tout suite."

Tuesday, July 27, 2004

Open Source Databases Compared

MySQL, PostgreSQL and Firebird: well-known databases to be sure. But which one to choose? This site provides an excellent comparison and checklist of which features each database supports.

Thursday, July 22, 2004

MySQL Gotchas

This site contains a very detailed list of gotchas with MySQL (i.e. unusual behaviour not consistent with the majority of other databases out there). They are not necessarily bugs. It's quite hardcore and very detailed, but could be useful if you're ever scratching your head wondering why something odd is going on with your MySQL queries.
 
Topics include
  • Strange null behaviour
  • Varchar fields are case-insensitive by default
  • Peculiar comment behaviour
  • Division by zero gives null rather than an error

and dozens more

Tuesday, July 20, 2004

Genetic Algorithms vs Natural Selection

A recent post by quinton got me thinking about genetic algorithms, and more specifically, about the fact that the algorithm is essentially based on Darwin's theory of natural selection. It's a great technique for solving certain classes of problems, but it troubles me somewhat that the process of evolution (upon which the algorithm is based) is not properly established.
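For readers who have not met the technique, here is a minimal sketch of a genetic algorithm in Python. It is not taken from quinton's post. The toy problem (maximise the number of 1 bits in a string), the function name `evolve` and all parameter values are assumptions chosen purely for illustration. It shows the three borrowed ideas: selection, crossover and mutation.

```python
import random


def evolve(bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=1):
    """Toy genetic algorithm; returns the best fitness seen in each generation."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    fitness = sum                      # fitness = number of 1 bits in the individual
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # selection: only the fitter half breeds
        children = [pop[0][:]]          # elitism: carry the best over unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, bits)          # single-point crossover
            child = mum[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
    history.append(max(fitness(ind) for ind in pop))
    return history
```

Because the best individual is always carried over, the best fitness can never decrease from one generation to the next. Note that the "organisms" here start out already able to replicate, which is exactly the step the first objection below says natural selection leaves unexplained.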

In general, I think the evidence for evolution is very strong. However, the mechanics of how it works still need proper explanation - at present, natural selection seems unable to explain things properly. Let me outline a few objections to the theory of natural selection:

1) How the first replicator could arise is not at all explained. (The first replicator is the first entity able to reproduce itself). There are many suggested theories, none satisfying. Currently, the first replicator mechanism is not explained, not proved and not reproducible.

2) The fossil record is very poor. Only a tiny fraction of species show intermediate forms, and no species show smooth transitions. Darwin himself thought that the lack of fossil evidence was the biggest threat to his theory, but was confident that in the fullness of time, sufficient fossil data would come to light. Good hominid fossils have been found in recent times (presumably because human evolution is more interesting), but that aside, it's arguable that over 100 years after his death, the fossil record is even poorer (as some promising fossils from his time were proved to be incorrect).

3) Irreducible complexity. This is a popular weapon of creationists, who typically refer to complex components such as the eye. More compelling are some of the metabolic pathways present in organisms: complex molecules are synthesized in 12 or more steps, with no useful by-products. Hence the whole process would have to have been discovered "at once".

4) Staggeringly variable rates of evolution. Certain species seem to have "forgotten" to evolve despite being subject to the same evolutionary stresses.

5) Problems with DNA itself. Despite the genome mapping project, it is becoming increasingly difficult to see how DNA could contain enough information to define a complex phenotype.

6) The timescales present major difficulties, and are perhaps even the most significant objection. The timescales to evolve from hominoid to hominid seem too short by many orders of magnitude.

Now, the bulk of these objections can be overcome by either (a) substantially increasing the time available or (b) coming up with a better mechanism than natural selection.

Let me stress again that I have no time for creationists, who for me fall into the same category as astrologers, homeopaths and psychics (i.e. people who believe in things despite the absence of scientific evidence). However, based on the points above, I think the theory of natural selection is currently inadequate to explain evolution properly. And as such, its use as a basis for a computer algorithm is suspect.

Thanks to Alex for his input in structuring this article.


Monday, July 19, 2004

Struts Tutorial

I've found what seems to be a well put together HTML-based tutorial for Struts. Part 1 is here and Part 2 is here

Thursday, July 15, 2004

How Can Challenge Response Deal with Spoofing?

The original premise of C-R is that when spam is sent, you will effectively double the resulting internet traffic, because every spam will result in a challenge (which presumably will be ignored, or at least that's the premise upon which the C-R solution is based). However, if the spammer spoofs a legitimate email address, which I believe is very common, then the situation is much worse, as I illustrate below.

Let's say Spammer Sam sends a spam email to Alice (who uses a C-R system), but spoofs Bob's email address. There are two cases to consider:

1. Bob does not use a C-R system

- Sam sends email to Alice
- Alice's C-R system sends challenge to Bob
- Bob gets puzzling challenge mail and sends angry mail to Alice
- Alice receives this angry mail as well as the original spam, as Bob's reply counts as a response to the challenge

Two extra emails have been generated (effectively tripling the email load of the original spam), and confusion for Alice and Bob is the result. Alice has also received the spam.

2. Bob also uses a C-R system
- Sam sends email to Alice
- Alice's C-R system sends challenge to Bob
- Bob's C-R system sends challenge to Alice
- this may carry on indefinitely, but one would hope the C-R systems would be smart enough to recognize duplicates and ignore subsequent emails

In this case, neither Alice nor Bob receives any confusing mail, and Alice does not receive her spam, but again the original spam has been tripled.
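The two cases above can be replayed with a small Python sketch that simply counts the messages put on the wire. Everything in it is hypothetical: the function name `simulate_spoofed_spam`, the message labels, and in particular the assumption that a C-R system challenges each sender only once and silently ignores anything further from them.

```python
from collections import deque


def simulate_spoofed_spam(bob_uses_cr, max_messages=20):
    """Replay Sam's spoofed spam; return every message as (sender, recipient, kind)."""
    users = {
        "alice": {"cr": True, "challenged": set()},
        "bob": {"cr": bob_uses_cr, "challenged": set()},
    }
    traffic = []
    # Sam's spam arrives at Alice with Bob's address forged as the sender.
    queue = deque([("bob", "alice", "spam")])
    while queue and len(traffic) < max_messages:
        sender, recipient, kind = queue.popleft()
        traffic.append((sender, recipient, kind))
        user = users[recipient]
        if user["cr"]:
            # Assumption: challenge an unknown sender once, then generate nothing more.
            if sender not in user["challenged"]:
                user["challenged"].add(sender)
                queue.append((recipient, sender, "challenge"))
        elif kind == "challenge":
            # No C-R system: a human reads the puzzling challenge and complains.
            queue.append((recipient, sender, "angry reply"))
    return traffic
```

Calling `simulate_spoofed_spam(False)` gives the spam, Alice's challenge and Bob's angry reply. Calling `simulate_spoofed_spam(True)` gives the spam and one challenge in each direction. Either way, three messages result from one spam. Without the challenge-once assumption, the second case would loop until `max_messages` cuts it off.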

Challenge-Response Dashed Upon the Rocks

Well, wasn't that a short-lived moment?

This article makes the important point that a critical part of the C-R system is that the challenge e-mail (that gets sent from the recipient to the sender) MUST reach the sender. If the sender's anti-spam system blocks it, then the whole process falls over. Therefore, all spammers need to do is to construct their drivel in the same way as a challenge e-mail. Game over.

My good friend rant came up with an even more damning problem: spammers use real email addresses which belong to others.

So the spammer sends you an email, you send a challenge back...that challenge goes to a spoofed email address...that spoofed user gets annoyed and writes you an angry email, or more likely (and much worse), gets flooded by thousands of challenge emails.

Dealing with Spam - Greylists

Yet another anti-spam methodology is gathering momentum: so-called greylists. This is a server-based method for dealing with spam that sounds intriguing. The basic idea is to block unrecognized mail for a short period by returning a temporary failure. The premise is that typical spam tools don't deal with timeouts and temporary failures gracefully; they just blitz the internet with their evil content and terminate (fire-and-forget). Legitimate (but unrecognized) mail will still get through, as mail servers are designed to deal gracefully with temporary failures.

One obvious concern is that you'd be increasing the traffic on the internet by some degree, as you're asking all unrecognized mail to be sent twice. On the other hand, the majority of your mail (I hope) is from people you know already (and would thus be on your white list - these would not be considered for grey listing).
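The greylisting decision described above can be sketched in a few lines of Python. This is only an illustration, not the code of any real mail server: the class name `Greylist`, the five-minute delay and the choice to key on (client IP, sender, recipient) are assumptions. In SMTP terms, "tempfail" would be a 4xx reply, which tells a well-behaved sending server to try again later.

```python
class Greylist:
    """Hypothetical greylisting check: defer unknown mail, accept it on a later retry."""

    def __init__(self, delay_seconds=300, whitelist=()):
        self.delay = delay_seconds
        self.whitelist = set(whitelist)   # senders we already know and trust
        self.first_seen = {}              # (client_ip, sender, recipient) -> first attempt time

    def check(self, sender, recipient, client_ip, now):
        """Return "accept" or "tempfail" for one delivery attempt at time `now` (seconds)."""
        if sender in self.whitelist:
            return "accept"               # whitelisted mail is never greylisted
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"               # a retry after the delay: behaves like a real mail server
        return "tempfail"                 # first attempt, or a retry that came too soon
```

A fire-and-forget spam tool makes one attempt, gets "tempfail" and never comes back. A legitimate server retries after a while and gets through, at the cost of the extra traffic mentioned above.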

Further articles can be found here and here.

Further to my recent post regarding simplified Challenge-Response systems in the ongoing battle against spam, one concern that has cropped up is when you register with internet sites and mailing lists - you typically get a confirmation e-mail to which you are expected to reply. I'm not sure how these would get through the Challenge Response system proposed.

.htaccess Explained

Well, I'm not going to explain it, I'll leave it to this site. They provide a brief, clear but comprehensive explanation of what .htaccess can do for you and provide good examples in each case.

Topics include:

- Error Documents
- Password protection
- Enabling SSI via htaccess
- Deny users by IP
- Change your default directory page
- Redirects
- Prevent viewing of htaccess
- Adding MIME types
- Preventing hot linking of your images
- Preventing directory listing

Update: Here's another tutorial (please ignore the spelling errors)

Byte Code Enhancement

I recently posted some concerns about byte code enhancements (BCE). I've tried to do a little research on the matter, but there isn't much information about it on the internet. However, having thought about the issue a little further, I've realized that, conceptually at least, there is no difference between trusting byte code enhanced classes and trusting a 3rd party library. If a 3rd party has written poor code, you're going to get bugs either way, and both cases are out of your control. Just something to be aware of, though: if there are errors in your enhanced classes, your class is still going to appear in the exception call stack - so be ready with your explanations when irate users or developers moan at you!

Writing Your Own Firefox Extensions

Ever fancied writing your own extension? Seems it's easier than you may have thought. Mostly it's just some JavaScript that needs to be packaged up in the right format. Here is a clear and well-written introduction.

Wednesday, July 14, 2004

JDO Concerns

JDO is largely a very impressive specification. Some issues remain troubling, however.

Firstly, and most importantly, I'm concerned about adopting JDO wholeheartedly. I know from experience that any successful database is going to attract users who may not use Java. They may find it very difficult to plug their Excel-ODBC or PHP web pages into a JDO-created database.

Secondly, the issue of byte-code enhancement. In theory, there's nothing wrong with byte-code enhancement. But the grim reality is that sometimes you have to deploy your code on less-than-perfect JVMs (HPUX anyone?). When the code falls over, you want to be sure it's your code, not the enhanced code.

Answers anyone?

Here's a good article on using JDO with a legacy database.

Tuesday, July 13, 2004

A "new" Plan for Spam?

This article on a new plan for spam makes for an interesting read. The author asserts that Bayesian filters are becoming less and less effective, and that a radically simplified challenge-response system is the key to really killing off spam.

Personally I'm finding that Popfile (a Bayesian filter) combined with an anti-virus checker works very well. However, I accept that some of the anti-Bayesian devices he describes may become more common in the future. His simple challenge-response approach essentially revolves around asking the unknown sender (for it is only unknown senders that are an issue) to send a second mail. I think it has a lot of merit, with one important caveat that he doesn't mention - you will still need a virus checker, as many e-mail viruses rely on sending mails from trusted/known individuals.
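As I read it, the simplified scheme boils down to something like the Python sketch below. It is my own illustration rather than the article's code. The class name `SimpleChallengeResponse`, the wording of the challenge, and the choice to deliver the second mail along with the held one are all assumptions.

```python
class SimpleChallengeResponse:
    """Hypothetical inbox guard: an unknown sender must send a second mail to get through."""

    def __init__(self, known_senders=()):
        self.known = set(known_senders)
        self.held = {}                    # sender -> messages waiting for confirmation

    def receive(self, sender, message):
        """Return (messages delivered to the inbox, challenge text or None)."""
        if sender in self.known:
            return [message], None        # known senders are never challenged
        if sender in self.held:
            # The second mail is the "response": release the held mail, remember the sender.
            released = self.held.pop(sender) + [message]
            self.known.add(sender)
            return released, None
        self.held[sender] = [message]     # first contact: hold the mail and ask for a resend
        return [], "Please send your message again to confirm you are a real sender."
```

The known-sender branch is where the virus caveat bites: a virus mailing from a trusted individual's address lands in that branch and goes straight to the inbox.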

Monday, July 12, 2004

Web Services and CORBA

A sobering article about how web services are evolving to reach the same complexity as CORBA. As someone who has used CORBA for many years (and still does in fact), I can tell you it solves the multi-language RPC problem very very well, but it's a large and complex beast. Web Services should be very wary of going down this route. Note that CORBA spent many years evolving under the guidance of the OMG - are Web Services as well controlled?

Dynamic Proxies in Java

An interesting and well-written article on Java Dynamic Proxies is available here. Dynamic proxies are an easy way to add dynamic behaviour to your code, after the fact. The typical example is logging, but the article also discusses hiding EJB or RMI complexities from clients. I've seen some very heavyweight (non-EJB) architectures that could benefit greatly from an approach like this (rather than relying on the poor developers to write huge amounts of implementation code for every form). This is essentially the promise of Aspect-Oriented Programming, and this article provides a nice Java-based introduction to it.
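In Java the mechanism is `java.lang.reflect.Proxy` together with an `InvocationHandler`. To show the idea without the boilerplate, here is a rough analogue in Python, which is not the Java API the article covers. The names `LoggingProxy` and `Calculator` are made up for the example. The point is that logging is added around every method call without touching the wrapped class.

```python
class LoggingProxy:
    """Wraps any object and logs each method call before forwarding it to the target."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself, i.e. the target's members.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            self._log.append(f"call {name}{args}")
            result = attr(*args, **kwargs)          # forward to the real object
            self._log.append(f"return {name} -> {result!r}")
            return result

        return wrapper


class Calculator:
    """Stand-in for existing business code that knows nothing about logging."""

    def add(self, a, b):
        return a + b


log = []
calc = LoggingProxy(Calculator(), log)
total = calc.add(2, 3)
```

After this runs, `total` is 5 and `log` holds one entry for the call and one for the return. `Calculator` itself was never edited, which is the "after the fact" part.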

More Essential Firefox Extensions

Live Http Headers - view and replay HTTP Headers interactively. Very useful for script debugging.

Don Box on Java

Don Box (he of COM fame) writes some interesting perspectives on the Java community (and his recent migration to Java). Nothing too profound, but it's an interesting read from someone who "personifies" the Microsoft way (or used to).

Firefox Security Issue

In the interests of fairness, I will take a brief break from my Firefox evangelism to point out that a potential security flaw has been identified for users of Windows XP and Mozilla Firefox. It potentially allows sites to run arbitrary programs. A fix for this issue is available here.

Pagerank

I've recommended the Googlebar plugin before. One popular feature it is strangely missing is the Google PageRank. However, this page from the Googlebar site explains in more detail why it's not available, as well as providing some links to plugin sites that do provide a PageRank. Note that they do voice some privacy concerns, namely that you are in effect sending your IP address and your entire browsing history to every site you visit.

Friday, July 09, 2004

BitchX Tip

I've recently taken to using BitchX as my preferred Linux IRC client (OK, my only Linux IRC client). As I access my Linux server remotely via ssh/PuTTY, and typically only use IRC to DCC/CTCP files, I don't really want to keep a terminal open just for this. So after entering my CTCP request, I enter "/detach" at the BitchX prompt. This "closes" BitchX, allowing me to carry on with other stuff, but leaves BitchX running in the background (which is essential for ensuring the download completes). Of course, this is similar to using "CTRL-Z, bg", but even that causes BitchX to pause for a moment.

Removing Windows Messenger

Many people have asked me how to remove Windows Messenger, typically because it clashes with MSN Messenger (NOT the same thing). It's also quite a severe security hole. Detailed instructions on how to uninstall it are here.

Thursday, July 08, 2004

Yet More on PHP Scaling

Ian writes a scathing piece on PHP vs J2EE scaling here. His main conclusion is that the "Common Stupid Mistakes" that cause scaling problems are platform agnostic. Naturally he goes on to describe some of these mistakes:


1. Development shortcuts that are quick-n-dirty that also happen to introduce slow-n-dirty runtime characteristics (let's just call those "crappy hacks")
2. Insufficient use of caching or pre-generation of components that are static or have a low change rate (let's just call those "gratuitous dynamicism")
3. Inappropriate use of caching... does that logic need to be cached or is its invocation infrequent enough that maybe a plain old CGI is exactly how it should be implemented? (that'd be "gratuitous caching")
4. Excessive round-tripping to the database (well, that's just "excessive round-tripping")
5. Tight coupling of architectural pieces that have independent scaling and/or stability requirements... (score that: "tight coupling")
6. Nailing up resources i.e. does each child thread/process require its own database connection? (another potential effect of "tight coupling")

"Writing Code is Stupid"

The author of this article contends that the vast majority of code written today could and *should* be automated. Interesting read.

Avi Info Tip

Just a quick punt for a utility I wrote a little while ago. Avi Info Tip is a small Windows Explorer plugin that allows you to quickly view the vital properties of an AVI file, simply by moving your mouse over the file and letting the tooltip pop up.



Get it here.

Monday, July 05, 2004

More on PHP Scaling

My recent post on PHP's "enhanced" scalability needs some clarification. Although PHP does "start afresh" with each new request, it's worth bearing in mind that PHP now supports server-based session management. This naturally requires memory on the server, so if this feature is used by the developer, one should expect to see scaling issues more in line with other server-side technology.

Installing MySQL

Here's a clear and well-written discussion on installing and configuring MySQL on a Windows XP computer. In particular, I like the discussion on how to secure the installation, which by default is very open.

Not mentioned is that you may wish to close the MySQL port (3306) on your firewall - this will effectively prevent any unauthorised remote access. This way, the only way to remotely access the database would be through your web applications talking to the local database.

Friday, July 02, 2004

Some Essential Firefox Extensions

Firefox uses tabs, so let's use them! This extension will force all links that open in new windows to open in a new tab.

You may have grown to love the Google Toolbar. At this stage, Google do not provide one for Mozilla. You can get an excellent equivalent of it here. For those of you wondering why to use the Google Toolbar, the main benefit is that the search terms appear as clickable buttons, allowing you to click the buttons to find the search terms in the page. Very nifty.

Microsoft Responding to Firefox Success

Seems the excitement about Firefox has driven the IE development team to reform!

Convenient FTP

You need to do some FTP in a hurry, and you haven't got an FTP client installed? Try this web-based one; it works rather well. It seems that this is a demo of the product they are selling (namely an FTP applet for web sites).

Thursday, July 01, 2004

Internet Browsing with Firefox

Get Firefox. The short version is that there is no reason to stick with IE anymore. It's served its purpose, it's been superseded, end of story.

The longer version is that Mozilla have finally come up with a browser that is distinctly better than Internet Explorer, but is also largely compatible with most sites out there. Previous versions of Mozilla (and Opera) suffered terribly from sites that were customised for IE. The features contained within Mozilla are of course fantastic; tabbed browsing is wonderful (admittedly Opera has had this for ages).

What swung it for me, however, was the vibrant plugin community. There are already loads of genuinely useful plugins available that extend the browser - allowing you to customise the browser to your heart's content. Development ended on IE some years ago. Once they killed the then browser market, development unofficially terminated. (Conspiracy theorists suggest that this was because Microsoft feared an overly-functional browser killing off Windows, their cash cow, as a sufficiently rich and powerful browser would render Windows and client-side applications irrelevant).

Javascript Formatting

On and off I've been looking for a decent JavaScript formatter for some time now. The recent excitement about the JavaScript used by the Gmail (Google email) website brought this to a head. After some extensive searching, I found this fantastic online service. It did a great job on the Gmail JavaScript, and claims to support PHP, Java, C++, C and Perl as well!

"Lazy" PHP

I was interested to read about the extreme laziness of PHP today. In contrast with other frameworks like J2EE (and presumably ASP.NET), it throws out EVERYTHING after every request - variables, interpreted code, the lot. One would then presume that a framework like J2EE, which has the compiled bytecode all ready to go (and probably some request-relevant information cached), should serve an individual request quicker. However, the apparent benefit to PHP then is that it scales better, as there are simply no memory issues.

My experience is that PHP serves requests very quickly - even with the source code parsing, it's lightning-quick. However, this angle on PHP suggests obvious ways to improve performance further, such as avoiding the unnecessary interpretation of code (e.g. don't import utility libraries needlessly).

Update:
Although PHP does "start afresh" with each new request, it's worth bearing in mind that PHP now supports server-based session management. This naturally requires memory on the server, so if this feature is used by the developer, one should expect to see scaling issues more in line with other server-side technology.