Archive for the ‘Testing’ Category


Anonymization is the process of removing all sensitive data from production data before it is used in testing. The result is supposed to be test data that is anonymous and can be used without security or privacy risks. It has to be done very carefully, and the structure of the production data must be understood very well before it can succeed. In this blog entry I show how difficult it actually is and how it can fail.

The software under test is a Twitter-like website. Users can send public and private messages to each other. Our simplified forum includes the following three database tables:

User profile – numerical identifier for the user (UID), username, password hash and e-mail address
Message – ID for the message, UID of the submitter, content as text, timestamp
Private message – ID for the private message, UID of the submitter, UID of the receiver, content, timestamp

The site is already in production and open, so anyone can check what others have written in the public messages. The profiles are public as well, and the UID is used to identify them.

Let’s start to anonymize this data. If we start with the usernames and message contents, is changing those really enough to make the data anonymous? Definitely not. If the original forum is public, anyone can still work out private information, such as who has messaged whom. The numerical UID is still the same, so we have to change that as well. If we want to keep the statistics correct, we can’t just assign random numbers to messages and user profiles. E.g. if the account “Teme” had UID 1, then to maintain proper statistics we have to convert every occurrence of UID 1 to the same new value, e.g. 234.

It is still very easy to find out that UID 1 was changed to 234. The features that reveal this are the timestamps and the number of messages. So we also have to change all timestamps, and the new timestamps must change the order of the messages to keep things anonymous. We have to change the number of messages in each profile as well.
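
A minimal sketch of the consistent UID remapping described above, and of why it is not enough on its own. The sample rows, the function names and the range of replacement UIDs are assumptions made for illustration; they do not come from any real system.

```python
import random
from collections import Counter

def build_uid_map(uids, seed=0):
    """Map every original UID to one distinct replacement UID."""
    rng = random.Random(seed)
    originals = sorted(set(uids))
    # Draw distinct replacement values from an arbitrary (assumed) range.
    replacements = rng.sample(range(1000, 1000 + 10 * len(originals)), len(originals))
    return dict(zip(originals, replacements))

def remap_messages(messages, uid_map):
    """Replace the submitter UID and blank the content of (uid, content) rows."""
    return [(uid_map[uid], "REDACTED") for uid, _content in messages]

# Hypothetical production rows: (submitter UID, content).
production = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (3, "f")]
uid_map = build_uid_map(uid for uid, _ in production)
test_data = remap_messages(production, uid_map)

# The number of messages per user survives the remapping, so anyone who can
# count messages on the public site can match the busiest profile.
public_counts = Counter(uid for uid, _ in production)
test_counts = Counter(uid for uid, _ in test_data)
most_active_public = public_counts.most_common(1)[0][0]
most_active_test = test_counts.most_common(1)[0][0]
reidentified = uid_map[most_active_public] == most_active_test
```

In this toy data set `reidentified` ends up true: the most active public user and the most active "anonymous" user are the same person, which is exactly the leak the message counts cause.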

Even after these changes we can still, in some cases, find the real profile from the test data. A small piece of external information like “I know that this person has messaged that person” can help an attacker find the real profile.

Instead of anonymizing the production data, the test data should be based on a model of the production data. For example, the test data should have the same number of users and messages as the production data. On the other hand, it usually doesn’t matter whether the users have a realistic distribution of messages.

Instead of using production data like that, generate your own data and work out which properties are the most important for your testing. Good test data increases the test coverage and the chance of finding bugs.

See also Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization by Paul Ohm

Production data at testing

Posted: 30.7.2014 in Testing

Several times I’ve seen organizations where the test data is taken from production. I have to say that in that case the normal cases are covered quite well, but I hate that kind of approach. Why?

There are a couple of different risks. The first and probably the biggest one is the security and privacy risk. Production data usually contains security- or privacy-sensitive data, e.g. e-mail addresses, usernames, password hashes, birthdays and so on. Usually the test environment is not as well secured as the production environment. E.g. testers and developers can access the database for debugging, and the testing can be outsourced to somewhere outside the organization. During testing we can, accidentally or on purpose, see sensitive data which we definitely should not see.

Another reason is that often there is some data which can cause extra traffic to the user. The traffic can be e-mail or even snail mail. Imagine a situation where the tested application contains healthcare patient records. A tester creates a death certificate and presses the button “Send to the relatives.” Usually the test environment should stop there. But a misconfigured environment can be connected to printing and mailing services. This can cause heart attacks for the recipients.

Production data can have its own problems, too. There can be a “one in a million” case which hasn’t happened in production yet. As testers, our data should cover that as well. So even with production data we have to check whether the test data covers all required cases. With half a million user records that can be an enormous task.

Instead of using the production data, investigate what kind of data it contains and make artificial data based on that information. While doing this, consider which aspects are important for the testing. E.g. if the age distribution doesn’t matter for the testing, why spend time on it? And if it does matter, investigate what the distribution looks like and base your data on that.

In performance testing the test environment (or even the production environment) is often loaded with production data. Even in those cases I try to write my performance testing scripts so that they do not touch any sensitive data (e.g. user data). I always try to create artificial data which can be easily cleaned up from the database. E.g. if there are first and last names, I usually create them with the suffixes -CUSTOMER (Teemu-CUSTOMER) and -TESTING (Vesala-TESTING). This kind of naming distributes the data correctly in the database, and later it is easy to find such users and remove them. If you use Fake Name Generator (http://www.fakenamegenerator.com/) to create your data, it is very easy to modify the names with a short Perl script.
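
The suffix trick can be sketched in a few lines. This sketch uses Python rather than Perl, and the helper names and the second sample name are made up for illustration; only the two suffixes come from the text above.

```python
TEST_FIRST_SUFFIX = "-CUSTOMER"
TEST_LAST_SUFFIX = "-TESTING"

def tag_name(first, last):
    """Append the test suffixes so the generated user is recognizable later."""
    return first + TEST_FIRST_SUFFIX, last + TEST_LAST_SUFFIX

def is_test_user(first, last):
    """True only for names that carry both test suffixes."""
    return first.endswith(TEST_FIRST_SUFFIX) and last.endswith(TEST_LAST_SUFFIX)

# Names as they might come out of a name generator (hypothetical sample).
generated = [("Teemu", "Vesala"), ("Anna", "Virtanen")]
tagged = [tag_name(first, last) for first, last in generated]
```

Because the suffix sits at the end, the names still start with different letters and spread over the database the same way real names would, while cleanup only needs to look for rows where `is_test_user` holds.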


Last time I wrote about test data in manual testing. This blog entry says a few words about test data in test automation.

Now the computer takes care of submitting and checking the data, so we can use different kinds of data. We don’t have to keep the data simple, and we can use more data than in manual testing. E.g. you don’t want to type 100 megabytes of data, but test automation can do that. Test automation can also detect small changes which are difficult for a human to notice. For example, telling apart I (capital i), l (lowercase L) and 1 (number one) can be difficult for a human. But test automation usually compares data at the binary level, and each of those has a different ASCII code, so it can detect the difference.
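
A small sketch of that comparison. The helper name `first_difference` and the sample strings are assumptions; the character codes are the standard ASCII values.

```python
# Three characters that look alike in many fonts.
lookalikes = {"capital i": "I", "lowercase L": "l", "digit one": "1"}
codes = {name: ord(ch) for name, ch in lookalikes.items()}
# codes == {"capital i": 73, "lowercase L": 108, "digit one": 49}

def first_difference(expected, actual):
    """Return the index of the first differing character, or None if equal."""
    for index, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return index
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None

# "Illegal" written with a capital i versus a lowercase L as first letter.
position = first_difference("Illegal 1", "lllegal 1")
```

A human reviewer may well read both strings as identical, while the comparison reports a difference at index 0.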

In a discussion forum, checking that a thread of 100 messages is paged correctly would be nearly impossible for a manual tester. Or at least the result would be poor, because he would most likely just put a few characters into each message. But with test automation we can create longer and shorter messages with realistic-looking data. What is realistic in this case? And how do we know? The Internet is full of forums, so we can pick almost any of them for analysis. We could find out, e.g., how many words a message has on average and what the standard deviation is. Then the test data can be constructed based on that information.
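
As a rough sketch, message lengths could be drawn from a normal distribution with the measured mean and standard deviation. The values 40 and 15 below are placeholders, not measurements from any forum, and the word list and function name are invented for the example.

```python
import random

def make_message(rng, mean_words, stddev_words, vocabulary):
    """Build one message whose word count follows a normal distribution."""
    word_count = max(1, round(rng.gauss(mean_words, stddev_words)))
    return " ".join(rng.choice(vocabulary) for _ in range(word_count))

rng = random.Random(42)  # fixed seed so the thread can be regenerated
vocabulary = ["test", "data", "forum", "message", "thread", "paging", "reply"]

# A thread of 100 messages with varying, realistic-looking lengths.
thread = [
    make_message(rng, mean_words=40, stddev_words=15, vocabulary=vocabulary)
    for _ in range(100)
]
```

A normal distribution is only one possible model; if the analyzed forum shows many very short replies and a few very long posts, another distribution may fit better.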

Test automation can use random data, created at run time by the computer. There are some issues which we have to remember. The test must be reproducible. The random generator should not be cryptographically strong; it should be a deterministic one which always produces the same sequence when its seed is set to a specific value. The initial seed can be derived from the time, but every following value must be based on the previous one. Also, the logging should make sure that all steps and the initial state, including the seed, are logged.
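
A minimal sketch of reproducible random test data, using the seedable generator from Python's standard `random` module. The function name and the value range are assumptions.

```python
import random
import time

def generate_inputs(seed, count=5):
    """Generate pseudo-random test inputs that depend only on the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(0, 10**6) for _ in range(count)]

# Derive the initial seed from the clock, and log it so that a failing
# run can be repeated later with exactly the same data.
seed = int(time.time())
print(f"random seed: {seed}")

first_run = generate_inputs(seed)
replay = generate_inputs(seed)  # same seed, same sequence
```

When a test fails, the logged seed is all that is needed to regenerate the same inputs.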

One good way to cheat at randomness is to create a predefined list of data. You can reproduce it every time, and it is “random enough” if you trust that a different order doesn’t change the result. In one project I wanted to test how the application reacted to broken input files. Manually it would have been impossible to test, but with a computer it was quite simple. I just had to create the inputs, and I used Radamsa for that (http://code.google.com/p/ouspg/wiki/Radamsa). The seeds for it were a couple of working inputs. During testing I found one major crash which could also have been a security issue. Without test automation and the possibility to test a huge mass of data, we could have left a major security issue in the application.

It seems that I should have structured my series a bit differently. There are still plenty of important things which I should describe. They touch test automation and manual testing, but also security and performance testing. And then there are plenty of miscellaneous issues, like using production data.


When I see a number input, I have several patterns which I like to test. Here are a few of them:

  • 08, 0100 – the text-to-integer conversion might interpret these as octal numbers. In that case 08 is an illegal value, which can cause strange things.
  • 8-1 – sometimes the SQL query evaluates the expression
  • 0xa – the text-to-integer conversion might interpret this as a hex number.
  • 1e3 and 1e-3 – these might be interpreted as 1*10^3 and 1*10^-3 (= 1000 and 0.001)
  • 2147483646, 2147483647, 2147483648 – values around the maximum int in many cases (signed 32-bit)
  • -2147483647, -2147483648, -2147483649 – values around the minimum int in many cases (signed 32-bit)
  • 4294967294, 4294967295, 4294967296 – values around the maximum of an unsigned 32-bit integer
  • Some huge number which is far beyond the previous numbers

Let’s go into a bit more detail, with real-life situations, on some of these.

In one C++ project the logic was the following:

input number x
if (x + 2 < fixed number)
    loop from 0 to x

So if we input anything below 2147483646, we get the correct functionality. But if we input 2147483646, the result of x + 2 is suddenly -2147483648 and we enter the loop. This is far from the expected result, and in the worst case it even opens a security issue. That system didn’t crash. It just stalled for 15 minutes, which blocked some batch processing.
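
The wraparound can be simulated in Python, whose own integers never overflow. This is only a model of the typical two's complement behavior; in C++ itself, signed overflow is undefined behavior. The limit of 1000 is a hypothetical stand-in for the "fixed number".

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def wrap_int32(n):
    """Wrap an arbitrary integer into the signed 32-bit range."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN

def guard_passes(x, limit=1000):
    """Model of the check `x + 2 < fixed number` with 32-bit arithmetic."""
    return wrap_int32(x + 2) < limit

small_ok = guard_passes(10)              # 12 < 1000, loop is entered as intended
big_rejected = guard_passes(2147483645)  # 2147483647 < 1000 is false
overflowed = wrap_int32(2147483646 + 2)  # wraps to -2147483648
big_accepted = guard_passes(2147483646)  # the wrapped value passes the check
```

With the input 2147483646 the check passes and the loop would then run from 0 to more than two billion, which matches the long stall described above.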

Another issue is 8-1. I usually test that in web applications where I expect the number to be an index. If the result is the same with the inputs 7 and 8-1, there is most likely an SQL injection security issue. The code contains a query like: SELECT * FROM table WHERE id=$input$. If it calculates 8-1, then it can also parse any other expression. That can be e.g. 8+or+1=1, which might cause some really exciting results. Or it can even be a query which dumps out the user database.
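
The 7 versus 8-1 check can be demonstrated with an in-memory SQLite database. The table, its contents and the function names are invented for the example; the vulnerable function builds the query by string concatenation, the way the query above suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 11)],
)

def lookup_unsafe(user_input):
    """Vulnerable: the input is pasted into the SQL text and gets evaluated."""
    query = "SELECT id FROM items WHERE id=" + user_input
    return sorted(row[0] for row in conn.execute(query))

def lookup_safe(user_input):
    """Parameterized: the input is passed as a value, never parsed as SQL."""
    rows = conn.execute("SELECT id FROM items WHERE id=?", (user_input,))
    return sorted(row[0] for row in rows)

same_as_seven = lookup_unsafe("8-1") == lookup_unsafe("7")  # the tell-tale sign
# "8+or+1=1" in a URL decodes to "8 or 1=1", which matches every row.
everything = lookup_unsafe("8 or 1=1")
nothing = lookup_safe("8-1")  # treated as the text '8-1', matches no id
```

If 8-1 and 7 return the same row, the database is doing arithmetic on the input, so it will evaluate other injected SQL as well.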

08 is interesting. I’ve seen it only in a build number, as a compile-time error. But give it a shot. You never know what kind of number parser is in the engine room. It can lead to strange errors, or some other fancy effect which the user might dislike. In that case try also 0100, because if it is parsed as octal, the result is 64, which is clearly wrong. 0xa is the same kind of thing. If the parser parses it to 10, then you will be in trouble if users don’t know that 0x is the prefix for a hex number. 0x100 is not the same as 100; it is 256.
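
To see what such a parser does with these inputs, here is a sketch of a hypothetical text-to-integer function that honors C-style prefixes (leading 0 for octal, 0x for hex). It is not the parser of any particular system.

```python
def parse_with_prefixes(text):
    """Parse an integer, treating '0x' as hex and a leading '0' as octal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)  # raises ValueError for digits 8 and 9
    return int(text, 10)

octal_surprise = parse_with_prefixes("0100")  # 64, not 100
hex_small = parse_with_prefixes("0xa")        # 10
hex_surprise = parse_with_prefixes("0x100")   # 256, not 100

try:
    parse_with_prefixes("08")
    eight_rejected = False
except ValueError:
    eight_rejected = True  # 8 is not a valid octal digit
```

A user who types 0100 and expects one hundred gets 64 without any error, which is the worse outcome of the two; 08 at least fails loudly.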

1e3 is an exciting one. I’ve once met that kind of input parsed wrongly. The system was going through a document and catching all URLs. For some reason, if a URL contained that kind of string, it was parsed as a number and written out in normal format, e.g. 1e3 would have become 1000.
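
I don't know how that system was implemented, but the effect is easy to reproduce with an overly eager normalizer. Everything below, including the function names and the example URL, is a hypothetical reconstruction.

```python
def naive_normalize(token):
    """Rewrite anything that parses as a float into plain number format."""
    try:
        return str(float(token))
    except ValueError:
        return token  # not a number, leave it alone

def normalize_url(url):
    """Apply the naive normalizer to every path segment of a URL."""
    return "/".join(naive_normalize(part) for part in url.split("/"))

thousand = naive_normalize("1e3")     # "1000.0"
fraction = naive_normalize("1e-3")    # "0.001"
broken = normalize_url("http://example.com/1e3/index")
```

The segment 1e3 is a perfectly valid part of a URL, yet the normalizer silently turns it into a different address.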

Of course I also try normal boundary cases, classes, some random text etc. But these are the cases outside them. Do you have some specific patterns which you try? And why do you try them? Leave me a comment.


Many times I hear that exploratory testing (ET from now on) is purely manual testing. But that’s not true. You can use any possible tool to help ET. This is part one of a series of articles where I present tools which you can use to assist your ET and other manual testing.

What is the purpose of a tool? Its main purpose is to free the tester to do meaningful tasks. If initializing the test takes more than 1 second and needs to be repeated over and over again, it prevents good testing. Tools should be used to remove that kind of obstacle.

Unfortunately the tool itself often becomes the “purpose of testing”. I know that, because I am usually doing test automation. It seems that very often the tool itself becomes the obstacle because someone thinks that it is the silver bullet for all testing problems. At that point the tool turns into a testing problem.

After this short introduction let’s start with a very simple case. Let’s imagine we are testing wordpress.com. For a tester that is a really boring case, because every time he wants to do something, he has to log in. The easiest way to get around the problem is to get a tool to do the login. If the case is this simple, I’d take Selenium IDE. It is a Firefox plugin which records the test case. After recording, the test can be played over and over again to get to a specific point. The screenshot below shows the whole test for the WordPress.com login. (The credentials are not real ones…)

[Screenshot: Selenium IDE script for WordPress.com]

Selenium IDE is good for small tasks, but I would not recommend any recording tool for large-scale test automation or complex tasks. Its simplicity is what justifies its use.

Later I will write more about tools which can help exploratory and other manual testing styles.

How to become a tester?

Posted: 12.7.2011 in Testing

So you want to become a tester but don’t know how to do it? This is my opinion on how to become one. I’m not going to recommend any book, and I’m not going to give you a list of courses you should take. But go on, read the rest of the post and think about how you can learn the skills which are needed in testing.

The most important mindset in testing is curiosity. Be excited about and interested in everything you see around you. Computers and software are just a tiny part of it. You should go for a walk, do it slowly enough, open your eyes and look around like a child. If you look at the ground, you can see different kinds of insects crawling around. Look at them, be excited about how they look, what they do, how they react to you. Or look at a tree. In a forest every pine tree has its own look. Stare at them, touch them, even taste them, compare them, and learn that they are unique. That sounds childish, but that’s what curiosity is. To be a tester you have to be curious about every application you ever test.

Read everything you can. There is no such thing as “unnecessary knowledge”. I suppose I don’t have to know Russian history in my work, but reading it helps me to see the world from a wider perspective. I know how the scheduler of Windows 2000 works. It is outdated information, but it still gives me some idea of how machines work. If I find a new protocol, I want to find out how it works and who has defined it. I try to find implementations and even dig into the transmitted packets. These kinds of things are never unnecessary. They keep my brain awake. A large part of testing is reading, understanding, asking and digging.

Learn new technical skills by doing. You never know what you will need in your work. I know how to code in PHP, C, C++, Java, Perl and Python and how to build shell scripts, and I’ve tried Ruby, Smalltalk, Lisp... I don’t even know all the languages. Now that I know them, I also know their weaknesses. Quite often I develop small scripts which help me in testing. I know different operating systems. Some of them are dead or have vanished into history, and I don’t use most of them anymore. But every now and then I end up in a project where I have to use knowledge about different operating systems. BSD, Solaris, different Linux distros, QNX, different Windows versions, DOS, CP/M: I can’t even remember all I’ve used.

Take part in open source projects. They are a good field for learning new things, and if you have contributed to them, you can show from your bug reports that you have the skills to find problems and report them. If you have done something in an open source project, mention it in your CV. It looks good! Open source is my favorite way to learn new tricks and new ways to blow up systems. Just pick a random project at Freshmeat, start beating on it and contribute bug reports. If you don’t know what to do, go through their bug database and see if closed bugs still exist, or make sure that older reports are still valid. Reproducing and retesting teaches you what a good reproducible bug report looks like, but also how others have tested the application.

So that’s my short list. It’s far from complete, and it’s only from my perspective. The testing field has plenty of different kinds of people. Some are artists like me, some are engineers. We have a lot of different skills, and we can contribute to the testing community with our personalities. So welcome, and enjoy your way into the testing community. Become an active member of the testing world in different media.

I love the testing world, its exciting new ideas and the wide knowledge it needs.


Thanks to testedtested for good thoughts on Twitter. 🙂 I had to start thinking about testability and its relationship with customer happiness and usability. 140 characters is too little for well-formulated ideas and comments, so I had to write a blog entry.

If I can use an application, it doesn’t mean that I can test it. In many cases the testers must ask developers to add code and features to increase testability. In many applications good usability means better testability, but that’s not the case in many embedded systems.

A good example is Google’s self-driving car. From the user’s point of view it is a very simple system. Just enter where you are going, hit the button and the car takes you there. (This is a very simplified view.) Even if that process is made simple, simplicity is definitely not the same as testability.

I slice the simplified system to multiple parts:

  • Sensors
  • Motor
  • Wheel
  • User’s GUI which inputs the destination
  • Button which starts to drive

The software which controls all of these is the software under test (SUT) in my example. For it to be testable, I’d need the following things:

  • I have to be able to send artificial messages from the sensors to the SUT
  • I have to be able to inspect what kind of messages the motor gets
  • I have to be able to give artificial feedback from the motor to the SUT
  • I have to be able to control the (virtual) wheels the same way as a normal road affects the wheels

If any of these is missing, I would not be able to test the control system. They need plenty of simulators and stubs which the tester or test automation can inspect and control. But even without those simulators, stubs and extra interfaces, the user could use the system and even be happy (if it works as expected). Even a tester could use the system without them.
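
To make the idea of stubs concrete, here is a toy sketch. The `Controller` class is a made-up stand-in for the SUT and has nothing to do with any real car software; the point is only that the test can feed artificial sensor readings in and inspect what the motor is told to do.

```python
class FakeSensor:
    """Stub that replays artificial distance readings chosen by the tester."""
    def __init__(self, readings):
        self._readings = list(readings)

    def read_distance_m(self):
        return self._readings.pop(0)

class MotorSpy:
    """Stub that records every command the SUT sends to the motor."""
    def __init__(self):
        self.commands = []

    def set_speed(self, kmh):
        self.commands.append(kmh)

class Controller:
    """Hypothetical SUT: stop when an obstacle is closer than the safe distance."""
    def __init__(self, sensor, motor, safe_distance_m=10.0, cruise_kmh=50):
        self.sensor = sensor
        self.motor = motor
        self.safe_distance_m = safe_distance_m
        self.cruise_kmh = cruise_kmh

    def step(self):
        distance = self.sensor.read_distance_m()
        speed = 0 if distance < self.safe_distance_m else self.cruise_kmh
        self.motor.set_speed(speed)

sensor = FakeSensor([100.0, 5.0])  # free road first, then an obstacle at 5 m
motor = MotorSpy()
controller = Controller(sensor, motor)
controller.step()
controller.step()
# motor.commands is now [50, 0]
```

This only works because the controller accepts its sensor and motor from outside. That kind of extra interface is exactly what testers have to ask developers for.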

So testability is not always automatically part of usable software. Sometimes testability needs plenty of additional work from developers. In those cases it is usually the testers’ responsibility to give details on how to make the application more testable.