Tuesday, December 3, 2013

Profiling based on Social Media Behavior

The rise of social media and social networks is one of the hot topics of the decade. People have been flocking to social media sites like never before, and a large proportion of internet traffic now carries updates from social networks. These networks have become a platform for everything from showing off your lunch to protesting against barbaric governments. In essence, these digital platforms are putting down deeper roots in society.

If you look at them from an analyst's perspective, you will find a golden pot of data on social sentiment: a huge collection of user-generated opinions, likes, expressions and patterns waiting to be explored and analysed. There is an entire field, social media analytics (part of the broader field of data analytics), that seeks to tap this golden pot. It is instrumental for advertising and marketing: the data collected from a user's behavior on social networking or e-commerce sites is processed to serve them targeted advertising, that is, highly customized adverts based on the customer's interests, age, sex, geographical location and browsing history. This field has been gaining a lot of attention lately, but its focus has mostly been a single one: targeted advertising. Clearly, applying social media analytics to marketing has many benefits.

I believe there is another application of analyzing social media behavior: behavioral profiling. I am not talking about profiling for targeted advertising or for product suggestions. This behavioral profiling can be thought of as a unique 'digital footprint of a user's behavior'. I will explain exactly what this means in the next few paragraphs. Please note that this is a relatively new idea, and I have found little to no material about it.

I think every regular user of social media has their own unique way of using it and communicating through it, and each user can have different interests. These can be understood by analyzing:
1. the kind of pages they visit
2. the kind of content they like
3. the kind of groups they are part of
4. the profiles they visit
5. the kind of content they comment on
6. the people they interact with
Each user generates a different data set for the parameters above, and profiling largely means judging the user by finding these traits in that data.

I also think users can be distinguished by analyzing their chat logs or the messages they send. 'Chatting styles' vary from person to person. Some characteristics in which chats or messages can differ are:
1. Capitalization used
2. Use of trailing dots after a sentence
3. Short forms used
4. Use of punctuation and where it is used
5. Use of smileys, their type and frequency
6. How often sentences are split across several messages
7. Frequency of common words
8. Spelling mistakes
For example, some users may religiously follow the capitalization rules for nouns, while others may not use capitals at all. Sometimes the first letter of every sentence in a chat log may be capitalized, suggesting that the user was probably chatting on a smartphone, because most smartphone keyboards capitalize the first letter automatically. Some novice users may type long strings of trailing dots, while others put just two or three after their sentences. Some people use a lot of short forms while chatting; others use them only for a few words. A few may always use one particular short form for a word, which almost becomes the trademark of their chats. People can also be distinguished by the amount of punctuation and smileys they use: some put an exclamation mark after many of their messages, others heavily use their favorite smiley. Some people type long messages and then hit 'send', while others hit 'send' after every few words. Some users involuntarily repeat a word or phrase many times. Someone may consistently misspell a word, and the mistake shows up in many of their chats. Some may type very fast and some very slowly. If you are a regular user, I am sure you will recognize at least one of these traits in one of your friends.
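To make the idea concrete, here is a minimal C sketch that counts a few of the traits listed above (capitalization, '!' marks, runs of trailing dots and a handful of smileys) over a user's messages. The names ChatStyle, count_style and capital_ratio are hypothetical, and the choice of traits and how each is detected are assumptions made for illustration; a real system would need many more traits and far more careful text handling.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts for a few of the chat-style traits listed above. Which traits to
   track, and how each one is detected, are illustrative assumptions. */
typedef struct {
    size_t letters;        /* alphabetic characters */
    size_t capitals;       /* uppercase letters among them */
    size_t exclamations;   /* '!' characters */
    size_t dot_runs;       /* runs of two or more '.' such as ".." or "..." */
    size_t smileys;        /* ":)", ":D" or ":P" */
} ChatStyle;

/* Scan one message and add its counts to *style. */
void count_style(const char *msg, ChatStyle *style)
{
    for (size_t i = 0; msg[i] != '\0'; i++) {
        unsigned char c = (unsigned char)msg[i];
        if (isalpha(c)) {
            style->letters++;
            if (isupper(c))
                style->capitals++;
        } else if (c == '!') {
            style->exclamations++;
        } else if (c == '.' && msg[i + 1] == '.') {
            style->dot_runs++;
            while (msg[i + 1] == '.')   /* skip the rest of this run */
                i++;
        } else if (c == ':' && (msg[i + 1] == ')' || msg[i + 1] == 'D' || msg[i + 1] == 'P')) {
            style->smileys++;
            i++;                        /* don't count the smiley's 'D'/'P' as a letter */
        }
    }
}

/* Share of letters that are capitals: close to 0 for someone who never
   capitalizes, higher for someone who follows capitalization rules. */
double capital_ratio(const ChatStyle *style)
{
    return style->letters ? (double)style->capitals / (double)style->letters : 0.0;
}
```

Calling count_style() once per message accumulates the counts for one user; dividing each count by the number of messages (or by the number of letters, as capital_ratio() does) turns them into rates that can be compared between users.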

The point I am trying to get across is that all these chatting traits can be recorded to form a unique identity for each user. It is very easy to fake the display identity (i.e. the name) on social networking sites, but the chatting style stays with the person. These traits are unlikely to change unless the user is aware of this type of monitoring and deliberately alters them. This brings us to the applications of such analysis...

This type of analysis is of little use to marketers and advertisers, but it can be very useful to intelligence agencies. Criminals and terrorists often spoof their identities but may be unaware of the subtle details that go unnoticed in their chats. Intelligence and government agencies could exploit this to identify suspects, by matching the behavioral traits of a known criminal with those of a suspect who is being monitored.
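One simple way to do the matching described above is to turn each person's traits into a vector of rates and pick the known profile at the smallest distance from the suspect's. The sketch below uses the Manhattan (L1) distance; the Profile type, the four traits and the function names are hypothetical, and real stylometric matching would need many more features, normalisation, and a threshold below which no match is reported.

```c
#include <stddef.h>

#define N_TRAITS 4

/* A person's traits as rates, e.g. capitals per letter, '!' per message,
   dot runs per message and smileys per message (illustrative choice). */
typedef struct {
    const char *label;
    double traits[N_TRAITS];
} Profile;

/* Manhattan (L1) distance between two trait vectors: 0 means identical. */
double style_distance(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = a[i] - b[i];
        sum += d < 0 ? -d : d;
    }
    return sum;
}

/* Index of the known profile closest to the suspect's traits,
   or -1 if there are no known profiles. */
int closest_profile(const Profile *known, size_t count, const double *suspect)
{
    int best = -1;
    double best_dist = 0.0;
    for (size_t i = 0; i < count; i++) {
        double d = style_distance(known[i].traits, suspect, N_TRAITS);
        if (best < 0 || d < best_dist) {
            best = (int)i;
            best_dist = d;
        }
    }
    return best;
}
```

Because every trait contributes equally to the distance, traits should first be scaled to comparable ranges; otherwise one large-valued trait dominates the match.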

Again, the application is limited to intelligence agencies detecting fraud and criminals, but I believe it can be immensely useful to them. Also, this kind of intelligence gathering requires access to private data (i.e. chat logs, user behavior info etc.) which is not public, but which a federal agency may be able to access.

Please note that this is a relatively new idea on which I found no documentation. I have tried to explain it as best I can. Feel free to contact me if you have anything to say, and I will be glad to receive pointers to any work being done on this.

Thank you :)

Thursday, October 31, 2013

Analysis of searches for Abandonwares

Greetings to all!

        My first semester is over and we have a fortnight of preparation leave before the final exams. It was pretty busy, especially near the end of the semester, finishing all the remaining work. This semester we had to build a management system project in VB6. Yes, you read that right: VB6. As you might be aware, many engineering colleges in India still use very old software from the 1990s for their academics. Even though we were free to choose any later version of Visual Basic, most students went with VB6 for their project, because we were supposed to use it for all our college assignments. Even Microsoft's official support for VB6 ended a few years ago.

        This made me curious about the current usage of old software like VB6, Turbo C and Oracle Database 8/9. Since we have some free time on hand during PLs, I did some analysis of these programs on Google Trends (such programs are known as abandonware: software that is no longer sold or supported by its publisher). It came up with some interesting findings, which I am going to share with you all...

1) Turbo C 
This screenshot shows how search interest in Turbo C has been declining, especially during 2005-2006, and it has been going down steadily since then. Below that, we can see the top countries googling for 'turbo c': the Philippines at the top, followed by India and its neighbours. What is striking is that the Philippines and India both have rapidly developing IT sectors.

This chart is for another related search term, "turboc" (notice the missing space). As it shows, it is googled most from India.

This graph shows comparative statistics for "turboc" search term from four different countries.

Developing countries-
1. India (blue)
2. Brazil (green)

Developed countries-
1. USA (red)
2. UK (yellow)

As you can see, the search volume from the developed countries is negligible, and even Brazil's has gone down considerably.

This shows the results for the same search term "turboc" from different parts of India. As there are many engineering colleges in southern India, the southern states lead this chart.

This graph shows the time series of the same search term for India. What I find interesting is that it follows the semester pattern of most engineering colleges in India. The spring semester begins in January, which shows up as the rise in Jan 2013. The semester usually ends in April and we have summer breaks in May, which shows up as the dip after Apr 2013. The fall semester starts in June-July, which again shows up as a rise in searches for "turboc".


2) Visual Basic 6
The world seems to be steadily giving up on Visual Basic 6. What we have here is an almost straight line showing how searches for VB6 have declined.

Coming to the comparative graph for the same 4 countries: the USA, UK and Brazil seem to have given up on VB6 and moved on to more recent versions, while India is catching up very fast. And again, the south leads within India, with Tamil Nadu, Pondicherry and Kerala first, second and third respectively.

The search term "visual basic 6" looks similar; we don't see much difference here.

At this point I got a little more curious about what my fellow students search for when it comes to VB projects, and did a little more digging...

This graph shows interest over time for the search term "vb project". Again it traces the semester pattern (Jan to Apr and Jul to Nov), with dips in May and December. The time series for the search term "projects in vb" has an almost identical shape.

I checked the interest from different cities in India for the "projects in vb" search term and found that Coimbatore is the winner at 100, with Chennai a distant second at 18. My city, Pune, is in 3rd position.

According to Google Trends, these 5 searches are related to the term "projects in vb":
1. Code project
2. Vb projects
3. Library Management System
4. Hotel Management System
5. Hospital Management System

Of the 4 countries we are considering, India seems to be the only one googling for "vb project".


3) Windows XP

Windows XP is used in many academic institutions in our country, and the following graph reflects this, although there has been a fast transition to newer operating systems.

The legend is the same as in the previous graphs.

4) Oracle 9i
Again, as it is old software, searches for Oracle 9i are declining throughout the world. The part of the graph after 2005 looks very similar to the graph of the reciprocal function y = 1/x.

Again, India is the leading googler for the search term "oracle 9i".

But there is a silver lining too. We have moved on to newer versions of Oracle Database, and India's searches for Oracle 9i have dropped significantly, almost on par with other countries after 2011.

5) Internet Explorer

OK, this is not old software, and newer versions are still being released, but just for fun, here are the top countries searching for IE :P

And this last graph shows the comparison for the 4 countries we are considering. We can see surges in searches in the years when major versions of IE were released. Other factors may also explain the lower number of searches, but people's interest in IE seems to be waning! :P


Thank you for reading my article. I hope you found it informative. :)

Friday, August 23, 2013

My articles for a magazine

Hello all,
Here I am going to share a few articles I wrote for my college's magazine. I couldn't get hold of the one published last year, though... Here we go.
          Risks of Automated Algorithms 
23rd April, 2013. It was a fine day when The Associated Press, one of the leading news agencies in the USA, posted a tweet claiming that two explosions had rocked the White House and that Barack Obama was injured. The next thing you know, the stock market had briefly crashed! The Dow Jones Industrial Average slipped by 143 points. Obviously the Associated Press’s Twitter account had been hacked, but the incident sent a chill down the spine of the community and raised serious questions. The reason for the crash was computerized stock trading algorithms. Thomas Peterffy, a Hungarian immigrant and entrepreneur, had revolutionized the trading of securities by introducing computerized trading back in the 1980s. Fast forward to the wave of social media in the 21st century: trading algorithms were constantly analyzing input from social networking sites and news websites to predict market behavior and place bids accordingly. The shortcomings of this practice had already come into focus in 2010, when one fine day there was a similar heavy crash in the market, popularly known as the Flash Crash of 2010. Its causes were partly attributed to automated trading algorithms. In the 2013 incident, the Associated Press was quick to acknowledge the fake tweet and the compromise of its account. A group of criminal hackers going by the name ‘Syrian Electronic Army’ claimed responsibility for the hack. They were already notorious for their hacks on BBC Weather (where they posted about a possible tsunami in the UK), National Public Radio and CBS News. They claimed that the attack was carried out through sophisticated phishing attempts on AP employees’ accounts. Such hacks on news websites are common these days. Another group of hackers had managed to get access to The New York Times website and posted an article showing support for Wikileaks.
Thomson Reuters’ website was also compromised, and interviews with a Free Syrian Army leader were posted on it. It turned out that they were using an old version of WordPress, the popular blogging software. Direct or indirect hacking attacks on critical infrastructure like stock markets and power grids are on the rise. With our increasing dependence on technology, there is greater scope for these kinds of attacks to happen again. Indiatimes reported that half of the world’s stock markets faced hacking attacks last year. 90% of stock markets perceive this to be a critical systemic risk. As is evident from the incidents above, it is relatively easy to manipulate the current system. Even though the hacking attacks so far have been relatively benign, more intense attacks are possible, which could cause losses to a nation as a whole and put its reputation at stake. India stands at even higher risk. On the brighter side, the government is proactively beefing up the security infrastructure. We will need thousands of programmers, security professionals and ethical hackers to take up the challenge of combating these threats.


Whatsapp Security 
(This is a cut-down version of a post I made earlier.) WhatsApp is something you probably use right now, or have at least heard of. It is a cross-platform IM application for smartphones that handles a few billion messages every day, roughly 50,000 messages every second (50,000 × 86,400 seconds is about 4.3 billion messages a day). However, it comes with its own set of drawbacks and has been criticized for its security, mainly its cryptographic standards and the way it handles users’ personal data.
Like Google Talk, WhatsApp implements a modified version of XMPP (eXtensible Messaging and Presence Protocol). When you install the app, it creates your account on WhatsApp’s server. The username, technically known as the ‘Jabber ID’, is the user’s country code concatenated with their mobile number.
An interesting detail of its password generation is its use of the phone’s IMEI number or MAC address. On Android devices, the password is the MD5 hash of the phone’s IMEI number reversed:
Password: md5(strrev($imei))
On iOS, the password is the MD5 hash of the device’s wireless interface MAC address written twice, i.e. the MAC address concatenated with itself:
Password: md5(mac+mac)
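As a small illustration of the two schemes described above, this C sketch builds the strings that would be fed to MD5: the reversed IMEI on Android and the MAC address written twice on iOS. It stops short of hashing; that step would be a call to an MD5 implementation such as OpenSSL's MD5(). The function names are hypothetical, and the exact MAC address format (separators, letter case) is an assumption, since it is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Android scheme as described: the MD5 input is the IMEI reversed.
   Returns 0 on success, -1 if out is too small. */
int android_password_input(const char *imei, char *out, size_t out_size)
{
    size_t len = strlen(imei);
    if (len + 1 > out_size)
        return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = imei[len - 1 - i];
    out[len] = '\0';
    return 0;
}

/* iOS scheme as described: the MD5 input is the MAC address concatenated
   with itself. The MAC string format is an assumption.
   Returns 0 on success, -1 if out is too small. */
int ios_password_input(const char *mac, char *out, size_t out_size)
{
    size_t len = strlen(mac);
    if (2 * len + 1 > out_size)
        return -1;
    memcpy(out, mac, len);
    memcpy(out + len, mac, len);
    out[2 * len] = '\0';
    return 0;
}
```

Nothing secret goes into either string: both are fixed hardware identifiers, so anyone who learns the IMEI or MAC address can rebuild the same input and hash it.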
According to WhatsApp, no messages are stored on its servers once they have been delivered to the recipient. To send multimedia such as audio, video or images, the data is first uploaded to WhatsApp’s HTTP server, and a link to the uploaded file is sent to the recipient’s phone along with a thumbnail, if required.
Until recently, messages were sent in clear text, so anyone on the same WiFi network could sniff the packets and read them. This also gave attackers the ability to launch session hijacking attacks. Messages are now encrypted on iOS and Android, but the encryption algorithm has been reverse engineered, especially for iOS devices. The overall verdict is that the current encryption mechanism does not follow established standards.
Because the password is derived from the phone’s IMEI number or MAC address, someone with physical access to the phone can easily obtain it, and someone sharing the same WiFi network can easily find the MAC address of an iOS device by sniffing. All an attacker needs is the victim’s mobile number and the phone’s MAC address or IMEI to enter into a script; they can then send and receive messages from the compromised account.
Its API has been reverse engineered by an open source project called ‘WhatsAPI’, written in both PHP and Python, which can be integrated with web apps and used to send WhatsApp messages to any number that supports it. WhatsApp uploads your contacts to its server to see which of them are registered, and this is done insecurely: your contacts’ mobile numbers are sent as an array in an HTTP request parameter.
The verdict is that WhatsApp has to work on its security, although it has made reasonable improvements with recent updates. We hope it comes up with a better alternative to its current password scheme and cryptography implementation to improve its reputation with the security community.
Vipul Chaskar
Pune Institute of Computer Technology
Information Technology

Thursday, March 7, 2013

Buffer Overflows : overwriting eip with strcpy()

Buffer overflows are among the most common vulnerabilities out there, and many existing exploits take advantage of them. The skill of overflowing a buffer is therefore an indispensable tool in any hacker's arsenal. I struggled to understand and exploit this vulnerability a few years back, and I hope this post makes it easier for you. I will only cover overwriting the saved eip on a stack frame, which is a relatively easy but important step in launching a buffer overflow exploit.

Before proceeding, I recommend that you know about CPU registers and the stack (from a computer organization and architecture class), the memory address space of a process, and working with gcc.

Here is a diagram of a stack frame for quick reference-

Let's consider a piece of code-
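//strcopy.c
#include <string.h>

int main()
{
    char string[10];    /* room for only 10 bytes */

    /* strcpy() does no bounds checking: it copies every 'Z' plus the
       terminating '\0' past the end of string[], over the saved ebp
       and the saved return address (eip) on the stack */
    strcpy(string,"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
    return 0;
}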
Here we declare a string of 10 characters and try to store around 40 Z's in it with strcpy(), which does not perform bounds checking. What happens when we compile and run it? Let's find out-
vipulc@ubuntu:~/Desktop$ gcc -ggdb -mpreferred-stack-boundary=2 -fno-stack-protector -o strcopy strcopy.c
vipulc@ubuntu:~/Desktop$ ./strcopy
Segmentation fault (core dumped)
vipulc@ubuntu:~/Desktop$ 
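Here I am using gcc on Ubuntu 12 (a 32-bit system, which is why we look at the 32-bit registers EIP and EBP; on a 64-bit system you would also need -m32, since 64-bit gcc rejects -mpreferred-stack-boundary=2).
The -ggdb parameter produces debugging information for gdb.
-mpreferred-stack-boundary=2 aligns the stack to 2^2 = 4 bytes. We will not go into the details of this.
-fno-stack-protector disables gcc's stack protector (the canary check), so that the overflow is not caught before the function returns.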

Since we got a segmentation fault, let's use gdb to find out what went wrong.
We will run the program in gdb and then look at the register values.
First we run the program with the "run" command, which halts because of the segmentation fault. Then we look at the register values with the "info reg" command.
Concentrate on the EBP and EIP values. They are both 0x5a5a5a5a. Hey! 0x5a is the ASCII code of the "Z" character we used to overflow the buffer. That means we have overwritten the saved values of ebp and eip in that stack frame!
Now try reducing the number of "Z"s in the program to see exactly how many characters it takes to overwrite eip...
We overwrote the saved ebp and eip through a buffer of 10 characters. If you put only 14 characters instead of 40, you will notice that the 4 extra characters overwrite ebp. If you use 18 characters, the last 4 characters overwrite eip too. (These offsets come from this particular build; a different compiler or different flags can lay out the stack frame differently, so confirm them in gdb.)
The last 4 bytes, which overwrite eip, can then be replaced with the memory address where the attacker's code (the exploit payload) is stored. When the function returns, execution jumps to that address, which can lead to total compromise of the system: the attacker can gain shell access, or even root access if the program runs with root privileges.
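To see the layout from the experiment above in one place, here is a small C sketch that builds such an input: 14 filler bytes covering the buffer and the saved ebp, followed by a 4-byte address in little-endian order, which is how x86 stores multi-byte values. The offsets are taken from this post's build and are assumptions for any other compiler or flags; the function name is hypothetical, and 0xdeadbeef below is just a placeholder, not a real target address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the experiment above: 10-byte buffer, then 4 bytes of
   saved ebp, then 4 bytes of saved eip. Confirm them in gdb for any
   other build. */
#define BUF_LEN     10
#define EBP_LEN     4
#define EIP_LEN     4
#define PAYLOAD_LEN (BUF_LEN + EBP_LEN + EIP_LEN)

/* Fill out[] with 'Z' up to the saved eip, then write target in
   little-endian order (lowest byte first, as x86 stores it). */
void build_payload(unsigned char out[PAYLOAD_LEN], uint32_t target)
{
    memset(out, 'Z', BUF_LEN + EBP_LEN);
    for (int i = 0; i < EIP_LEN; i++)
        out[BUF_LEN + EBP_LEN + i] = (unsigned char)(target >> (8 * i));
}
```

build_payload(p, 0xdeadbeef) fills p with 14 'Z's followed by the bytes ef be ad de. Note that the result is not NUL-terminated, and since strcpy() stops copying at the first zero byte, an address containing a 0x00 byte cannot be delivered this way.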

Saturday, February 23, 2013

Reversing Subnet mask and bit notation

     This article is a follow-up to IP subnetting. In it I will discuss reversing a subnet mask to find the IP address ranges inside a network, and explain what bit notation is. Let's get started...

Reversing Subnet Mask

Reversing a subnet mask is very easy, and it is generally used to find the address ranges and class of a network. The default subnet masks for the three classes of networks are:
Class A - 255.0.0.0
Class B - 255.255.0.0
Class C - 255.255.255.0
And as we perform subnetting, the leftmost host bits are converted to network bits.
Let's say, for example, we have the subnet mask 255.255.255.224. This looks like a subnetted class C mask (although a mask like this can also come from subnetting a class A or B network). Let's convert it to binary-
11111111.11111111.11111111.11100000
Consider the last octet. It has 3 network bits and 5 host bits, which reveals two things. First, 3 network bits can take 2^3 = 8 different values (000 to 111), so the address range is divided into 8 segments. Similarly, 5 host bits can take 2^5 = 32 values (00000 to 11111), so there are 32 addresses per network segment. This includes the network identifier and the broadcast address, so we subtract 2 to get the number of usable hosts: 30.

Let's take another example. The subnet mask is 255.255.255.252.
Converting it to binary-
11111111.11111111.11111111.11111100
Concentrate on the last octet. Since it has 6 network bits, there can be 2^6 = 64 individual network segments in this address range (6 bits can take the values 000000 to 111111). It has only two host bits, so there are only 2^2 = 4 addresses per segment. Excluding the first and last, there are only two usable addresses per network segment. Two addresses per segment is not very useful inside a corporate network; this type of subnet mask is mainly used on point-to-point WAN links to avoid wasting addresses.
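The counting above comes down to powers of two: n network bits in the last octet give 2^n segments, and h host bits give 2^h addresses, two of which are the network identifier and the broadcast address. Here is a small C sketch of reversing a last octet this way; the MaskInfo type and reverse_last_octet name are hypothetical.

```c
#include <stdint.h>

/* What the last octet of a (class C style) subnet mask reveals. */
typedef struct {
    int network_bits;   /* leading 1s in the last octet */
    int host_bits;      /* remaining 0s */
    int segments;       /* 2^network_bits */
    int addresses;      /* 2^host_bits, incl. identifier and broadcast */
    int usable_hosts;   /* addresses minus identifier and broadcast */
} MaskInfo;

/* Assumes a valid mask octet whose 1s are contiguous from the left;
   the results are meaningful for last octets up to 252. */
MaskInfo reverse_last_octet(uint8_t octet)
{
    MaskInfo info = {0};
    for (int bit = 7; bit >= 0 && ((octet >> bit) & 1); bit--)
        info.network_bits++;
    info.host_bits = 8 - info.network_bits;
    info.segments = 1 << info.network_bits;
    info.addresses = 1 << info.host_bits;
    info.usable_hosts = info.addresses - 2;
    return info;
}
```

For 224 it reports 3 network bits, 8 segments, 32 addresses and 30 usable hosts; for 252 it reports 64 segments of 4 addresses with 2 usable hosts each.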

Bit Notation

Bit notation (also known as CIDR or slash notation) is just another way of representing subnet masks. So far, subnet masks have been written as a series of 4 octets. In bit notation, we count only the number of 1s in the binary form of the subnet mask and write it after the IP address, following a slash.

For example, consider the IP address 192.168.10.34 with subnet mask 255.255.255.240.
Its binary representation is 11111111.11111111.11111111.11110000
Here the total number of 1s is 28, so it can be written with the IP address as:
192.168.10.34/28
Another example: the IP address 192.168.2.3 has subnet mask 255.255.255.192. Its binary form is 11111111.11111111.11111111.11000000
Here there are 26 1s, so the bit notation becomes:
192.168.2.3/26
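Counting the 1s can be automated. Here is a C sketch (the function name prefix_length is hypothetical) that parses a dotted-decimal mask and returns its prefix length, rejecting masks whose 1s are not contiguous from the left.

```c
#include <stdint.h>
#include <stdio.h>

/* Return the bit-notation prefix length of a dotted-decimal subnet mask
   (e.g. "255.255.255.240" -> 28), or -1 if the string is not a valid mask. */
int prefix_length(const char *mask)
{
    unsigned int octet[4];
    if (sscanf(mask, "%u.%u.%u.%u", &octet[0], &octet[1], &octet[2], &octet[3]) != 4)
        return -1;

    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        if (octet[i] > 255)
            return -1;
        bits = (bits << 8) | (uint32_t)octet[i];
    }

    /* Count the 1s from the most significant bit until the first 0. */
    int count = 0;
    while (count < 32 && ((bits >> (31 - count)) & 1u))
        count++;

    /* Everything after that first 0 must also be 0 (rejects 255.0.255.0). */
    if (count < 32 && (uint32_t)(bits << count) != 0)
        return -1;
    return count;
}
```

prefix_length("255.255.255.240") returns 28 and prefix_length("255.255.255.192") returns 26, matching the two examples above.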

Thanks for reading! Do leave a comment below...

Thursday, February 21, 2013

IP Subnetting

     In this post I am going to explain what IP subnetting is and how it is done. Subnetting divides a network into two or more smaller networks by breaking the available IP address range into separate networks as required. There are two main types of subnetting: subnetting based on the number of networks required, and subnetting based on the number of hosts required per network. Subnetting gives you the subnet mask, which is then used to configure the routers and hosts on your network. Lots of subnetting calculators will do the calculation for you, but learning it is essential if you are planning to take Cisco certifications like CCENT and CCNA.

     Before getting started, I would like to talk about the structure of an IP address and a subnet mask. Each IP address consists of 4 octets, i.e. four sets of 8 bits. Each octet can represent a value from 0 (00000000) to 255 (11111111). A 0 in the last octet (e.g. 192.168.10.0) identifies the network segment, and a 255 in the last octet (e.g. 192.168.10.255) is usually the broadcast address for the network, so an address ending in 0 or 255 usually cannot be assigned to a host. However, this is not always the case once a network is subnetted.
     A subnet mask defines the network part and the host part of an IP address. For example, the class C subnet mask (255.255.255.0) indicates that the first 3 octets of the address identify the network and the last octet identifies the host. When the network is divided, the host part of the subnet mask is modified according to the number of networks, or hosts per network, desired.
     It is recommended to have basic knowledge of binary conversion and computer networking before going ahead. The examples I will show below are specific to class C networks, however, the same concept can be extended to class A and B as well. 

Subnetting based on number of networks

     Consider a scenario: you have been given an IP address range (say 192.168.1.0 - 192.168.1.255) and asked to divide this range into 10 networks and figure out the subnet mask. The steps are as follows:

1. Convert the number of networks required to binary and find out the number of bits.
In this example, we need 10 networks. The binary equivalent of 10 is 1010, which takes 4 bits. Therefore, 4 bits are required to represent the number of networks (2^4 = 16, which is enough for 10).

2. Reserve bits in the subnet mask and find out new subnet mask
The original subnet mask of a class C network is 255.255.255.0. Let's represent it in binary:
11111111.11111111.11111111.00000000
In the previous step, we found that 4 bits are required to represent the networks. Hence, in the subnet mask above, we convert 4 host bits to network bits (remember that the 1s represent network bits and the 0s represent host bits):
11111111.11111111.11111111.11110000
Notice that we converted the leftmost host bits to network bits. This is the required subnet mask in binary. Converting it back to decimal:
255.255.255.240 (decimal of 11110000 is 240)

3. Find out the increment and corresponding network ranges
Let's consider the binary form of subnet mask again.
11111111.11111111.11111111.11110000
To find the increment, we consider only the octet that contains the last network bit (the last 1). Here it is the last octet.
11110000
Now we find the place value of the last "1". Since it is in the 5th position from the right, its place value is 2^(5-1) = 16. (This is equivalent to setting every bit except the last network bit to 0 and converting back to decimal.)
So the increment is 16. The increment gives the starting address of each network range. In our case, the starting addresses will be 192.168.1.0, 192.168.1.16, 192.168.1.32, 192.168.1.48 and so on...
And the corresponding network ranges will be
192.168.1.0 - 192.168.1.15
192.168.1.16 - 192.168.1.31
192.168.1.32 - 192.168.1.47
192.168.1.48 - 192.168.1.63
and so on...

Here, the starting address of each range becomes the network identifier, and the last address of the range becomes the broadcast address for that segment. E.g. in the first range, 192.168.1.0 is the identifier and 192.168.1.15 is the broadcast address.
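The three steps above can be sketched in C. One assumption differs slightly from step 1: instead of counting binary digits, the sketch uses the smallest n with 2^n >= the number of networks. The two agree for 10 networks (n = 4), but the sketch needs one bit fewer when the count is an exact power of two (8 networks fit in 3 bits). The names are hypothetical, and the sketch only handles the last octet of a class C range.

```c
#include <stdio.h>

typedef struct {
    int network_bits;  /* host bits turned into network bits; -1 if impossible */
    int mask_octet;    /* last octet of the new subnet mask */
    int increment;     /* place value of the last network bit */
} NetworkPlan;

/* Steps 1-3: bits needed, new mask octet and increment. Allows at most
   64 networks so that every network keeps at least 2 host bits. */
NetworkPlan plan_by_networks(int networks)
{
    NetworkPlan plan = { -1, 0, 0 };
    if (networks < 1 || networks > 64)
        return plan;
    int n = 0;
    while ((1 << n) < networks)   /* smallest n with 2^n >= networks */
        n++;
    plan.network_bits = n;
    plan.mask_octet = (0xFF << (8 - n)) & 0xFF;   /* n leading 1s */
    plan.increment = 1 << (8 - n);
    return plan;
}

/* Print the first `count` ranges, e.g. prefix "192.168.1". */
void print_ranges(const char *prefix, int increment, int count)
{
    for (int i = 0; i < count && i * increment < 256; i++) {
        int start = i * increment;
        printf("%s.%d - %s.%d\n", prefix, start, prefix, start + increment - 1);
    }
}
```

plan_by_networks(10) gives 4 network bits, a last octet of 240 and an increment of 16, and print_ranges("192.168.1", 16, 4) prints the four ranges listed above, from 192.168.1.0 - 192.168.1.15 to 192.168.1.48 - 192.168.1.63.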

Subnetting based on number of hosts

     Let's say we have been given the same IP address range (192.168.1.0 - 192.168.1.255), and now we're asked to divide it into segments of 20 hosts per network, irrespective of the number of networks that will be created. The steps are a little different:

1. Convert the number of hosts required to binary and find out the number of bits.
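Here, we need 20 hosts per network. 20 in binary is 10100, so it takes 5 bits. (5 host bits give 2^5 = 32 addresses per network, of which 30 are usable after excluding the network identifier and broadcast address, which is enough for 20 hosts.)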

2. Reserve the bits in subnet mask and find out new subnet mask
Class C subnet mask in binary form is
11111111.11111111.11111111.00000000
Since we are concerned with the number of hosts, we have to reserve host bits, i.e. zeroes. The host bits are always reserved from the right-hand side of the last octet.
11111111.11111111.11111111.000"00000"
Here the 5 bits in quotes are reserved for the hosts, and the rest are used for the network (i.e. turned to "1").
Hence the subnet mask becomes
11111111.11111111.11111111.11100000
Decimal representation : 255.255.255.224 (decimal of 11100000 is 224).

3. Find out the increment and corresponding network ranges
The increment can be found with the same method as in the last example, by considering the last network bit:
11111111.11111111.11111111.11100000
The last network bit is in the 6th position from the right of the last octet, so its place value is 2^(6-1) = 32.
Therefore, the starting address of each network range formed this way is incremented by 32. The starting addresses will be
192.168.1.0, 192.168.1.32, 192.168.1.64, 192.168.1.96, 192.168.1.128 and so on...
The network ranges thus formed are
192.168.1.0 - 192.168.1.31
192.168.1.32 - 192.168.1.63
192.168.1.64 - 192.168.1.95
and so on...
Again, the starting and ending addresses of each range will be the network identifier and the broadcast address respectively.
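The host-based steps can be sketched the same way. Here the sketch picks the smallest number of host bits h with 2^h - 2 >= the hosts required, so that the network identifier and broadcast address are accounted for. This matches the binary-digit method for 20 hosts (h = 5) and differs only when the host count is one less than a power of two (3, 7, 15, 31, ...), where counting binary digits would leave one host without an address. The names are hypothetical, and only the last octet of a class C range is handled.

```c
typedef struct {
    int host_bits;   /* 0s kept at the right of the last octet; -1 if impossible */
    int mask_octet;  /* last octet of the new subnet mask */
    int increment;   /* place value of the last network bit = 2^host_bits */
} HostPlan;

/* Smallest h with 2^h - 2 >= hosts (identifier and broadcast excluded). */
HostPlan plan_by_hosts(int hosts)
{
    HostPlan plan = { -1, 0, 0 };
    if (hosts < 1 || hosts > 254)   /* must fit inside one octet */
        return plan;
    int h = 2;                      /* 2 host bits: smallest useful segment */
    while ((1 << h) - 2 < hosts)
        h++;
    plan.host_bits = h;
    plan.mask_octet = (0xFF << h) & 0xFF;   /* 1s everywhere left of the host bits */
    plan.increment = 1 << h;
    return plan;
}
```

plan_by_hosts(20) gives 5 host bits, a last octet of 224 and an increment of 32, matching the ranges above.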

Well, I'll conclude here for now. In the next article I will describe reversing the subnet mask and talk about bit notation.

Thanks for reading :)

Tuesday, February 19, 2013

My first web server

Hello readers,
I had been wanting to develop a small, efficient web server hosted on a LAN for quick file transfers. A couple of days ago I had some issues with my FTP client when I wanted to transfer something from my Ubuntu machine, so I decided to write a simple HTTP server from scratch.

The operation is fairly simple: it binds itself to port 80 and listens for connections. When it receives a request, it checks whether the requested file exists. If it does, it sends back the file with an HTTP 200 response; otherwise it sends HTTP 404 with the "notfound.htm" page from the server root directory. If no specific file is requested, it looks for "index.htm" in the requested directory (the default being '/') and serves that page. The server serves only static pages and files.
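The request handling described above mostly comes down to mapping a URL path onto a file under the server root. The ViServer source is not reproduced here, so this is only a hypothetical C sketch of that one step: it appends index.htm when a directory (a path ending in '/') is requested, and, as an extra safety check not described for ViServer, it rejects any path containing "..", so a request cannot climb out of /usr/local/www.

```c
#include <stdio.h>
#include <string.h>

/* Server root as given in the post (without the trailing slash). */
#define SERVER_ROOT "/usr/local/www"

/* Map a request path such as "/", "/docs/" or "/a.txt" to a file under the
   server root, appending index.htm for directory requests. Returns 0 on
   success, -1 for a rejected or too-long path. */
int resolve_path(const char *request_path, char *out, size_t out_size)
{
    if (request_path[0] != '/' || strstr(request_path, "..") != NULL)
        return -1;                       /* must be absolute; no climbing out */
    size_t len = strlen(request_path);   /* at least 1, because of the '/' */
    const char *index = (request_path[len - 1] == '/') ? "index.htm" : "";
    int n = snprintf(out, out_size, "%s%s%s", SERVER_ROOT, request_path, index);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                       /* did not fit in out */
    return 0;
}
```

If the resolved file does not exist, the server would answer with 404 and notfound.htm instead. A directory requested without the trailing slash (such as /docs) is not handled by this sketch.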

It is relatively simple: fewer than 200 lines of code, developed on Ubuntu Linux 12.04. It implements only the HTTP 200 and 404 responses. The default server root is "/usr/local/www/". It requires root privileges to bind its socket to port 80. The only interrupt it handles is Ctrl+C (SIGINT), which terminates it. I call it the 'ViServer'.

Please note that this piece of code was written for learning purposes only and you are likely to find it buggy. 

Here is the link to the source code:
https://docs.google.com/file/d/0B_oonTu4H_SIWTRJM3ljNm1KTUk/edit

 Use it for testing on your local machine only, and never deploy it on an untrusted LAN or, worse, facing the internet!

Thank you! :)


Thursday, February 7, 2013

Web development primer

Hello friends,

This article was written on request, and it covers some basic terms, concepts and ideas related to web development. It will help those who are planning to jump-start into learning web development. The concepts are explained from a beginner's point of view. At the end I will show a simple HTML file.

How does HTTP work?
HTTP is the protocol of the web. HTTP stands for HyperText Transfer Protocol. For the time being, just keep in mind that the webpages and other data your browser loads from websites are transferred using HTTP (or HTTPS, its encrypted form).
HTTP works on requests and responses: your browser requests something, and the server on the other side sends back what you requested. Consider an example:
you enter google.com into your browser. Your browser sends a request to Google's server asking for the homepage. Google's server sends back the homepage in response, and it is displayed on your screen. Very simple.
The same thing is repeated whenever you follow a link or navigate from one webpage to another.
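To make the request/response idea concrete, here is a small Python sketch of the text a browser sends to ask for a page, and of how the first line of a reply is read. The header set is a minimal assumption (real browsers send many more headers), and the 404 response used below is made up for illustration, not captured from Google.

```python
def build_get_request(host, path="/"):
    """Return the plain-text HTTP request a browser would send for host + path."""
    # Request line, then headers, then an empty line marking the end of headers
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")

def parse_status_line(response_text):
    """Split the first line of a response, e.g. 'HTTP/1.1 200 OK'."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

print(build_get_request("google.com"), end="")
print(parse_status_line("HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"))
# ('HTTP/1.1', 404, 'Not Found')
```

Everything after that first line of a response is headers and then the page itself, which is usually HTML.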

HTML-
HTML stands for HyperText Markup Language. It is the language in which webpages are written. Whenever a server sends you a webpage, it is essentially sending plain text, i.e. the contents of the webpage, which is nothing but HTML code. Your browser then interprets this HTML code and constructs the webpage on screen according to the HTML tags it contains. As you may be aware, HTML files end in the ".htm" or ".html" extension. In the simplest case, webpages are just HTML files stored on the web server.
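To see that a page really is just text with tags in it, Python's standard `html.parser` module can walk the same kind of markup and report each tag it meets. This is only a rough stand-in for the first step a browser performs; real browsers use their own, much more complete parsers and then go on to lay out and draw the page. The `TagLister` class name is my own.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Collect opening tags and visible text from an HTML string."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():                 # skip whitespace-only text
            self.text.append(data.strip())

page = "<html><head><title>Sample</title></head><body><h1>Hello</h1></body></html>"
parser = TagLister()
parser.feed(page)
print(parser.tags)  # ['html', 'head', 'title', 'body', 'h1']
print(parser.text)  # ['Sample', 'Hello']
```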

Web Server and Web Application-
Now let's take a closer look at what happens on the server's side. Web servers are much like ordinary computers, usually with more processing power. Just like your computer, a web server runs an OS, and the web server itself is ultimately application software running on top of that OS. Popular web server software includes Microsoft IIS and Apache HTTPD. When the server's hardware receives a request, it hands it to the operating system, which in turn passes it to the web server software. The web server software is always running and listening for requests. On receiving one, the server picks up the requested webpage from its directory, processes it and sends it to the user. Generally, in any directory on the web server, "index.htm" or "index.php" is the default page if no page name is specified.
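You don't have to write web server software yourself to see this in action. As a sketch, Python's standard library ships a simple static file server: `http.server.SimpleHTTPRequestHandler` maps URL paths to files under a directory and, for a directory URL, serves index.html or index.htm if one exists (otherwise it shows a directory listing; it knows nothing about index.php, since it doesn't run PHP). The `make_server` wrapper is my own name.

```python
import functools
import http.server

def make_server(root, port=8000, host=""):
    """Build a threaded HTTP server that serves static files from root."""
    # The handler looks up each requested path under `directory`
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=root)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling `make_server(".").serve_forever()` and visiting http://localhost:8000/ serves the current folder, and Ctrl+C stops it. From a terminal, `python3 -m http.server 8000 --directory .` does the same. Python's documentation notes that this server is not meant for production use.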
A web application is simply a set of related webpages on the server, often connected to a database in the background where they store their data. Examples of web applications are online shopping websites and user forums.

Client Side Scripting and Server Side Scripting-
Websites have grown more dynamic with time, and plain HTML alone is no longer enough for the functionality they provide. Scripts embedded into HTML pages fill this gap, and they come in two kinds: client-side scripts and server-side scripts.
Client-side scripts are executed inside the user's browser while it interprets the HTML code. They add functionality at the user's end. Examples are JavaScript and VBScript.
In the section above, I said that the web server processes a page before sending it to the user. This means that the server itself executes any server-side scripts present in the page. Server-side scripts also add functionality, such as database interaction, command execution and handling user information. Examples are PHP and ASP.
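Here is a rough sketch of what a server-side script does, written in Python rather than PHP or ASP to keep the new examples in one language: it queries a database and builds HTML before anything reaches the browser. The `posts` table and the `render_forum_page` function are hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import html
import sqlite3

def render_forum_page(db):
    """Server-side step: query the database and turn the rows into HTML."""
    rows = db.execute("SELECT author, message FROM posts ORDER BY id").fetchall()
    items = "".join(
        f"<li><b>{html.escape(author)}</b>: {html.escape(message)}</li>"
        for author, message in rows
    )
    # Only this finished HTML string is sent to the browser
    return f"<html><body><h1>Forum</h1><ul>{items}</ul></body></html>"

# Throwaway in-memory database with a hypothetical posts table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, message TEXT)")
db.executemany("INSERT INTO posts (author, message) VALUES (?, ?)",
               [("alice", "Hello!"), ("bob", "1 < 2")])
print(render_forum_page(db))
```

The browser receives only the finished HTML and never sees the query. `html.escape` turns a post like "1 < 2" into "1 &lt; 2", so user text can't be mistaken for a tag.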

Sample HTML Program-
<!DOCTYPE html>
<html>
<head>
 <title>Sample</title>
</head>
<body>
 <h1>This is sample text.</h1>
 <br>
 <script type="text/javascript">
  // Runs in the browser and inserts a second heading into the page
  document.write("<h1>This is another sample text</h1>");
 </script>
</body>
</html>

Copy the code above into a Notepad file and save it as sample.htm or sample.html. Then open it in your web browser (preferably Firefox or Chrome). It should show two lines of text: the first comes from the HTML itself, and the second is generated by the JavaScript in the page.

Resources-
There are many websites where you can learn HTML or other languages for free. The best one, in my opinion, is
http://www.w3schools.com
Many books focusing on specific languages and platforms are also available.

Thanks for reading :)

Friday, February 1, 2013

Shut down windows from web and mobile

Hello Readers,

It's been a while since I posted on the blog, and this post is a little off-topic. Anyway, today my friend put forth an idea: what if we could shut down our machines remotely? I said it is possible, and in fact many people already use this to shut down their servers remotely. I couldn't resist the temptation to come home and try it out myself. Through this post I will share with you how we can do it.

The basic idea is very simple. Your computer runs a web server that stores a PHP page containing the shutdown code. A remote host or mobile simply requests this page, the server runs the code, and the system command is executed.

For this technique, you will need a WAMP or XAMPP server installed on your computer, and a mobile or PC, preferably on the same network. I won't go into the details of installing the servers. For this tutorial, I am using a WAMP server and a mobile, both on the same Wi-Fi network.

First of all, create a simple PHP file-
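<?php
// Windows shutdown command: /s = shut down, /t 00 = no delay
$output = shell_exec('shutdown /s /t 00');
// Print whatever the command returned (usually nothing)
echo " $output ";
?>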
Give it any name (say, shutdown.php) and store it in the root directory of your web server.

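Then start the web server. Make sure the firewall is not blocking incoming connections to the web server on port 80 (allowing it through is safer than turning the firewall off completely). Note down the IP address of this computer by opening cmd and typing "ipconfig".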

In my case, it is 192.168.0.100.
Check that your web server is running properly by opening http://localhost/ in your web browser.

Now all you have to do is access this PHP file from any other device on the network. You can access it from a mobile or any computer by typing
http://192.168.0.100/shutdown.php
Replace the IP with your computer's internal network IP address.

Once you hit Enter, your mobile requests the shutdown.php page, and the request is received by the server installed on the PC. The server then executes the PHP code inside the page, which contains nothing but the system command "shutdown" that starts the shutdown process. Hence Windows shuts down!

The same concept can be extended to Linux by simply replacing the shutdown command with its Linux equivalent. Also make sure that the server process running this code has enough privileges to execute the command.
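If you want one script that picks the right command per platform, a Python sketch could look like the following. The command lines are the standard ones (`shutdown /s /t 0` on Windows, `shutdown -h now` on Linux and macOS); the dictionary and function names are my own, and the actual call is kept in a separate function so nothing shuts down by accident.

```python
import platform
import subprocess

# Immediate shutdown command for each value returned by platform.system()
SHUTDOWN_COMMANDS = {
    "Windows": ["shutdown", "/s", "/t", "0"],
    "Linux": ["shutdown", "-h", "now"],
    "Darwin": ["shutdown", "-h", "now"],   # macOS
}

def shutdown_command(system=None):
    """Return the shutdown argument list for the given (or current) platform."""
    system = system or platform.system()
    if system not in SHUTDOWN_COMMANDS:
        raise ValueError(f"no shutdown command known for {system!r}")
    return SHUTDOWN_COMMANDS[system]

def shut_down():
    """Really shut the machine down; needs admin (Windows) or root (Linux/macOS)."""
    subprocess.run(shutdown_command(), check=True)
```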

The PHP program I used is deliberately simple, just for the sake of demonstration. In real life, you should not allow a remote user to execute system commands unless they are authenticated.
With port forwarding enabled on your router, you can also shut down your PC from anywhere, but again, that is risky from a security point of view.
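To act on that advice, the page should at least check a secret before doing anything. Here is a minimal sketch in Python, assuming a hypothetical `token` query parameter (e.g. http://192.168.0.100/shutdown?token=...) and a made-up secret. Keep in mind that over plain HTTP the token travels unencrypted, so this only keeps out casual users on your own network; it is not real protection for a port-forwarded setup.

```python
import hmac
from urllib.parse import parse_qs, urlsplit

# Hypothetical shared secret: load it from a config file rather than hard-coding it
SECRET_TOKEN = "change-me"

def handle_shutdown_request(url, run_shutdown):
    """Call run_shutdown() only if the URL carries the right ?token= value."""
    query = parse_qs(urlsplit(url).query)
    supplied = query.get("token", [""])[0]
    # compare_digest avoids leaking how much of the token matched via timing
    if not hmac.compare_digest(supplied.encode(), SECRET_TOKEN.encode()):
        return 403, "Forbidden"
    run_shutdown()
    return 200, "Shutting down"
```

Passing the real shutdown function as `run_shutdown`, and a harmless one while testing, keeps the check easy to try out.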

Thanks!
Feel free to ask your doubts in the comments below :)