The Complex yet Powerful World of DateTime in Data Science
I still remember coming across my first DateTime variable when I was learning Python. It was an e-commerce project where I had to figure out the supply chain pipeline – the time it takes for an order to be shipped, the number of days it takes for an order to be delivered, etc. It was quite a fascinating problem from a data science perspective.
The issue – I wasn’t familiar with how to extract and play around with the date and time components in Python.
There is an added complexity to the DateTime features, an extra layer that isn’t present in numerical variables. Being able to master these DateTime features will help you go a long way towards becoming a better (and more efficient) data scientist. It’s definitely helped me a lot!
And the date and time features are ubiquitous in data science projects. Think about it – they are a rich source of valuable information, and hence, can give some deep insights about any dataset at hand. Plus the amount of flexibility they offer when we’re performing feature engineering – priceless!
In this article, we will first have a look at how to handle date and time features with Python’s DateTime module and then we will explore Pandas functions for the same!
Note: I assume you’re familiar with Python and the Pandas library. If not, I highly recommend taking the awesome free courses below:
- Python for Business Analytics and Data Science
- Pandas for Data Analysis in Python
Table of Contents
- The Importance of the Date-Time Component
- Working with Dates in Python
- Working with Time in Python
- DateTime in Python
- Updating old dates
- Extracting Weekday from DateTime
- What week is it?
- Leap year or not? Use the calendar!
- The Different Datetime formats
- Advanced DateTime formatting with Strptime & Strftime
- Timedelta
- DateTime with Pandas
- DateTime and Timedelta objects in Pandas
- Date range in Pandas
- Making DateTime features in Pandas
The Importance of the Date-Time Component
It’s worth reiterating, dates and times are a treasure trove of information and that is why data scientists love them so much.
Before we dive into the crux of the article, I want you to experience this yourself. Take a look at the date and time right now. Try to imagine all the kinds of information you can extract from it to understand your reading habits. The year, month, day, hour, and minute are the usual suspects.
But if you dig a little further, you can determine whether you prefer reading on weekdays or weekends, whether you are a morning person or a night owl (we are in the same boat here!), or whether you accumulate all the interesting articles to read at the end of the month!
Clearly, the list will go on and you will gradually learn a lot about your reading habits if you repeat this exercise after collecting the data over a period of time, say a month. Now imagine how useful this feature would be in a real-world scenario where information is collected over a long period of time.
Date and time features find importance in data science problems spanning industries from sales, marketing, and finance to HR, e-commerce, retail, and many more. Predicting how the stock markets will behave tomorrow, how many products will be sold in the upcoming week, when the best time to launch a new product is, how long it will take to fill a position at the company, etc. are some of the problems that we can find answers to using date and time data.
This incredible amount of insight that you can unravel from the data is what makes date and time components so fun to work with! So let’s get down to the business of mastering date-time manipulation in Python.
Working with Dates in Python
The date class in the DateTime module of Python deals with dates in the Gregorian calendar. It accepts three integer arguments: year, month, and day. Let’s have a look at how it’s done:
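A minimal sketch of creating a date object. The date used here (14 March 2020) is an arbitrary illustrative value:

```python
from datetime import date

# date(year, month, day) takes three integers
d = date(2020, 3, 14)
print(d)  # 2020-03-14
```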
You can see how easy it was to create a date object with the date class. And it’s even easier to extract features like the day, month, and year from a date, using its day, month, and year attributes. We will see how to do that on a date object for the current local date, which we create using the today() method:
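A short sketch of reading the attributes off today’s date. The printed values depend on when you run it, so no output is shown:

```python
from datetime import date

# date.today() is a class method that returns the current local date
today = date.today()

# day, month and year are plain integer attributes
print(today.day, today.month, today.year)
```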
Working with Time in Python
time is another class of the DateTime module. It accepts integer arguments for the time, from the hour down to the microsecond, and returns a time object:
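A minimal sketch with an arbitrary time of day; every argument is optional and defaults to 0:

```python
from datetime import time

# time(hour, minute, second, microsecond)
t = time(13, 45, 30, 500)
print(t)  # 13:45:30.000500
```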
You can extract features like hour, minute, second, and microsecond from the time object using the respective attributes. Here is an example:
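Continuing with the same illustrative time, each component is available as an integer attribute:

```python
from datetime import time

t = time(13, 45, 30, 500)

# Read each component back as an integer
print(t.hour, t.minute, t.second, t.microsecond)  # 13 45 30 500
```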
This is just the tip of the iceberg. There is so much more we can do with DateTime features in Python and that’s what we’ll look at in the next section.
DateTime in Python
So far, we have seen how to create a date and a time object using the DateTime module. But the beauty of the DateTime module is that it lets you dovetail both the properties into a single object, DateTime!
datetime is a class in Python’s DateTime module, just like date and time. Its arguments combine those of the date and time classes, starting from the year and ending with the microsecond.
So, let’s see how you can create a DateTime object:
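A minimal sketch, using arbitrary illustrative values. Only year, month, and day are required; the time arguments default to 0:

```python
from datetime import datetime

# datetime(year, month, day, hour, minute, second, microsecond)
dt = datetime(2020, 3, 14, 15, 9, 26, 535897)
print(dt)  # 2020-03-14 15:09:26.535897
```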
Or you could even create an object on the local date and time using the now() method:
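A short sketch of now(), together with the same attributes used earlier on date and time objects (output depends on when you run it):

```python
from datetime import datetime

# now() returns the current local date and time
now = datetime.now()

# date-style and time-style attributes both work on a datetime
print(now.year, now.month, now.day, now.hour, now.minute, now.second)
```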
You can go on and extract whichever value you want to from the DateTime object using the same attributes we used with the date and time objects individually.
Next, let’s look at some of the methods in the DateTime class.
Updating old Dates
First, we’ll see how to separate the date and the time from a DateTime object using the date() and time() methods. You can also replace a single value in a DateTime object, without rebuilding the entire date, using the replace() method:
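A minimal sketch with an illustrative datetime. Note that datetime objects are immutable, so replace() returns a new object rather than modifying the original:

```python
from datetime import datetime

dt = datetime(2020, 3, 14, 15, 9, 26)

# Split the datetime into its date and time parts
print(dt.date())  # 2020-03-14
print(dt.time())  # 15:09:26

# replace() returns a NEW datetime; dt itself is unchanged
updated = dt.replace(year=2021, hour=8)
print(updated)  # 2021-03-14 08:09:26
```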
Weekday from DateTime
One really cool thing that you can do with a DateTime object is extract the day of the week! This is especially helpful in feature engineering because the value of the target variable can depend on the day of the week: sales of a product are generally higher on weekends, while traffic on Stack Overflow could be higher on weekdays, when people are working.
The weekday() method returns an integer for the day of the week, where Monday is 0 and Sunday is 6. If you want the value to run from 1 to 7 instead (Monday is 1 and Sunday is 7, the ISO convention), use isoweekday():
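A quick comparison of the two methods on an illustrative date, 14 March 2020, which fell on a Saturday:

```python
from datetime import date

d = date(2020, 3, 14)  # a Saturday

print(d.weekday())     # 5  (Monday = 0 ... Sunday = 6)
print(d.isoweekday())  # 6  (Monday = 1 ... Sunday = 7)
```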
What Week is it?
Alright, you know the day of the week, but do you know what week of the year it is? This is another very important feature that you can generate from a date in your dataset.
Sometimes the value of the target variable might be higher during certain times of the year. For example, the sales of products on e-commerce websites are generally higher during vacations.
You can get the week of the year by indexing into the tuple returned by the isocalendar() method:
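A minimal sketch on the same illustrative date. isocalendar() returns (ISO year, ISO week number, ISO weekday), so the week is at index 1; on Python 3.9+ the result is also a named tuple, so iso.week works too:

```python
from datetime import date

d = date(2020, 3, 14)

# (ISO year, ISO week number, ISO weekday)
iso = d.isocalendar()
week = iso[1]
print(week)  # 11
```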
Leap Year or Not? Use Calendar!
Want to check whether a year is a leap year? Use the isleap() function from the calendar module and pass the year as an argument:
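A short sketch; 2020 and 2100 are illustrative years. A year is a leap year if it is divisible by 4, except century years, which must also be divisible by 400:

```python
import calendar

# isleap() is a plain function that takes the year as an integer
print(calendar.isleap(2020))  # True
print(calendar.isleap(2100))  # False: divisible by 100 but not by 400
```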
If that printed True for the current year, congratulations – you are living in a leap year! What did you do with the extra day? Oh, you missed it? Don’t worry! Just take a day this month and do the stuff that you love! But where are you going? You got your calendar right here!
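A sketch that prints a single month with calendar.month(); February 2020 is chosen here only because it is a leap-year February:

```python
import calendar

# month(year, month) returns the month laid out as a multi-line string
feb = calendar.month(2020, 2)
print(feb)
```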
Not free this month? You can have a look at the entire calendar for the year:
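A similar sketch for a whole year, again using 2020 as an illustrative value:

```python
import calendar

# calendar(year) returns all 12 months as one string, three months per row by default
year_view = calendar.calendar(2020)
print(year_view)
```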
Pretty cool, right? Plan your year wisely and take out some time to do the things you love!
DateTime Formats
The DateTime module lets you convert a DateTime object to and from a few different string formats.
First up is the ISO format. If you wanted to create a DateTime object from the string form of the date in ISO format, use the fromisoformat() method. And if you intended to do the reverse, use the isoformat() method:
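A minimal round trip with an illustrative timestamp. datetime.fromisoformat() requires Python 3.7 or later:

```python
from datetime import datetime

# Parse an ISO 8601 string into a datetime
dt = datetime.fromisoformat("2020-03-14 15:09:26")
print(dt)  # 2020-03-14 15:09:26

# Convert back; isoformat() separates date and time with "T" by default
print(dt.isoformat())  # 2020-03-14T15:09:26
```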
If you wanted to convert DateTime into a string format, you could use the ctime() method. This returns the date in a string format. And if you wanted to extract just the date from that, well, you would have to use slicing:
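A sketch with an illustrative datetime. Because ctime() always produces a fixed layout ("Www Mmm dd hh:mm:ss yyyy"), the first 10 characters hold the weekday, month, and day:

```python
from datetime import datetime

dt = datetime(2020, 3, 14, 15, 9, 26)

s = dt.ctime()
print(s)       # Sat Mar 14 15:09:26 2020

# Slice off everything after the day of the month
print(s[:10])  # Sat Mar 14
```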
And if none of these methods strike your fancy, you could use the built-in format() function, which lets you define your own format:
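A sketch of the built-in format() applied to an illustrative datetime. It delegates to the object’s own formatting, which understands the same codes as strftime(), so f-strings accept them as well:

```python
from datetime import datetime

dt = datetime(2020, 3, 14, 15, 9, 26)

# format() with strftime-style codes
print(format(dt, "%d/%m/%Y"))  # 14/03/2020

# The same codes work after the colon in an f-string
print(f"{dt:%H:%M}")           # 15:09
```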
Wait – what are these arguments I passed to the function? These are called format codes, and we will look at them in detail in the next section.
Advanced DateTime Formatting with Strptime & Strftime
These functions are very important as they let you define the format of the DateTime object explicitly. This can give you a lot of flexibility with handling DateTime features.
strptime() creates a DateTime object from a string representing a date and time. It takes two arguments: the date string and the format in which that date is written. Have a look below:
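A minimal sketch with an illustrative day-first string. The format must describe the input exactly, otherwise strptime() raises a ValueError:

```python
from datetime import datetime

# %d day, %m month, %Y four-digit year, %H hour (24h), %M minute
dt = datetime.strptime("14/03/2020 15:09", "%d/%m/%Y %H:%M")
print(dt)  # 2020-03-14 15:09:00
```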
You define the format using the formatting codes as I did above. There are a number of formatting codes and you can have a look at them in the documentation.
The strftime() method, on the other hand, converts a DateTime object into a string representing the date and time:
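The reverse direction, sketched on an illustrative datetime:

```python
from datetime import datetime

dt = datetime(2020, 3, 14, 15, 9, 26)

# Format as dd-mm-yyyy hh:mm:ss
print(dt.strftime("%d-%m-%Y %H:%M:%S"))  # 14-03-2020 15:09:26
```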
You can also use it to extract important information from a DateTime object, like the weekday name, month name, or week number, which can turn out to be very useful as features, as we saw in previous sections.
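A sketch of a few of these codes on an illustrative date. Names follow the current locale (English under Python’s default "C" locale). Also note that %W counts weeks starting from the first Monday of the year, so it can disagree with the ISO week from isocalendar(), which is 11 for this date:

```python
from datetime import datetime

dt = datetime(2020, 3, 14)

print(dt.strftime("%A"))  # Saturday  (full weekday name)
print(dt.strftime("%B"))  # March     (full month name)
print(dt.strftime("%W"))  # 10        (week of the year, Monday as first day)
```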
Timedelta
So far, we have seen how to create a DateTime object and how to format it. But sometimes, you might have to find the duration between two dates, which can be another very useful feature that you can derive from a dataset. This duration is, however, returned as a timedelta object.
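A minimal sketch, assuming two hypothetical order timestamps (the names ordered and delivered are illustrative only). Subtracting one datetime from another yields a timedelta:

```python
from datetime import datetime

ordered = datetime(2020, 3, 14, 10, 30)
delivered = datetime(2020, 3, 19, 16, 45)

# The difference of two datetimes is a timedelta
duration = delivered - ordered
print(duration)        # 5 days, 6:15:00
print(type(duration))  # <class 'datetime.timedelta'>
```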
As you can see, the duration is stored as a number of whole days plus the seconds left over. So you can actually retrieve these values for your features:
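Using the same hypothetical timestamps, the two stored components can be read directly. The seconds attribute holds only the remainder after whole days, not the total duration:

```python
from datetime import datetime

duration = datetime(2020, 3, 19, 16, 45) - datetime(2020, 3, 14, 10, 30)

# A timedelta stores days, seconds and microseconds internally
print(duration.days)     # 5
print(duration.seconds)  # 22500  (the 6 h 15 min left over after whole days)
```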
But what if you actually wanted the duration in hours or minutes? Well, there is a simple solution for that.
timedelta is also a class in the DateTime module. So, you could use it to convert your duration into hours and minutes as I’ve done below:
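One way to do this, sketched with the same hypothetical timestamps: dividing a timedelta by another timedelta returns a float, so dividing by one hour or one minute gives the duration in those units (duration.total_seconds() / 3600 would work as well):

```python
from datetime import datetime, timedelta

duration = datetime(2020, 3, 19, 16, 45) - datetime(2020, 3, 14, 10, 30)

# timedelta / timedelta gives a float ratio
hours = duration / timedelta(hours=1)
minutes = duration / timedelta(minutes=1)
print(hours)    # 126.25
print(minutes)  # 7575.0
```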
Now, what if you wanted to get the date 5 days from today? Do you simply add 5 to the present date?
Not quite. So how do you go about it then? You use timedelta of course!
timedelta makes it possible to add a duration to, or subtract it from, a DateTime object; adding a plain integer does not work.
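A short sketch on an illustrative date, including what happens if you try to add a bare integer:

```python
from datetime import date, timedelta

d = date(2020, 3, 14)

print(d + timedelta(days=5))  # 2020-03-19
print(d - timedelta(days=5))  # 2020-03-09

# Adding a bare integer raises TypeError
try:
    d + 5
except TypeError as exc:
    print("TypeError:", exc)
```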
DateTime in Pandas
We already know that Pandas is a great library for doing data analysis tasks. And so it goes without saying that Pandas also supports Python DateTime objects. It has some great methods for handling dates and times, such as to_datetime() and to_timedelta().
DateTime and Timedelta objects in Pandas
The to_datetime() method converts the date and time in string format to a DateTime object:
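A minimal sketch, using an illustrative date string:

```python
import pandas as pd

ts = pd.to_datetime("2020-03-14 15:09:26")
print(ts)                            # 2020-03-14 15:09:26
print(isinstance(ts, pd.Timestamp))  # True
```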
You might have noticed something strange here. The type of the object returned by to_datetime() is not DateTime but Timestamp. Well, don’t worry, it is just the Pandas equivalent of Python’s DateTime.
We already know that timedelta gives differences in times. The Pandas to_timedelta() method does just this:
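A sketch with two illustrative values. The unit strings "D" (days) and "min" (minutes) are standard pandas unit aliases:

```python
import pandas as pd

# unit tells pandas how to interpret the number
print(pd.to_timedelta(5, unit="D"))     # 5 days 00:00:00
print(pd.to_timedelta(90, unit="min"))  # 0 days 01:30:00
```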
Here, unit tells Pandas how to interpret the number: days, hours, minutes, seconds, and so on. Recent Pandas versions no longer accept months or years as units, since those have no fixed length.
Date Range in Pandas
To make the creation of date sequences a convenient task, Pandas provides the date_range() method. It accepts a start date, an end date, and an optional frequency code:
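A minimal sketch with illustrative dates. Both ends are included, and "D" means daily; other frequency codes exist, for example "W" for weekly dates anchored on Sundays:

```python
import pandas as pd

# Daily dates from start to end, inclusive
days = pd.date_range(start="2020-03-14", end="2020-03-18", freq="D")
print(days)
```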
Instead of defining the end date, you could pass periods, the number of dates you want to generate:
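A sketch with illustrative values. With an anchored frequency like "W" (weekly on Sundays), pandas rolls the start forward to the first matching date, so a Saturday start produces the following Sunday first:

```python
import pandas as pd

# 14 March 2020 is a Saturday, so the first weekly date is Sunday the 15th
weeks = pd.date_range(start="2020-03-14", periods=3, freq="W")
print(weeks)
```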
Making DateTime Features in Pandas
Let’s also create a series of end dates and make a dummy dataset from which we can derive some new features and bring our learning about DateTime to fruition.
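A sketch of such a dummy dataset. Every value here (start dates, durations in hours, and the target column) is made up for illustration:

```python
import pandas as pd

# Hypothetical dummy data: 5 daily start dates, each ending a made-up number of hours later
start = pd.date_range(start="2020-03-14 09:00", periods=5, freq="D")
end = start + pd.to_timedelta([30, 52, 7, 75, 18], unit="h")

df = pd.DataFrame({
    "start_date": start,
    "end_date": end,
    "target": [12, 30, 4, 41, 9],  # made-up target values
})
print(df.head())
```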
Perfect! So we have a dataset containing start date, end date, and a target variable:
We can create multiple new features from the start date column, like the day, month, year, hour, minute, etc. using the dt accessor as shown below:
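A sketch that rebuilds the hypothetical dummy data so it runs on its own, then derives features with the .dt accessor. The duration column comes from subtracting the two datetime columns; the choice of features and the column names are illustrative:

```python
import pandas as pd

# Rebuild the hypothetical dummy data
start = pd.date_range(start="2020-03-14 09:00", periods=5, freq="D")
df = pd.DataFrame({
    "start_date": start,
    "end_date": start + pd.to_timedelta([30, 52, 7, 75, 18], unit="h"),
    "target": [12, 30, 4, 41, 9],
})

# The .dt accessor exposes datetime components of a Series
df["year"] = df["start_date"].dt.year
df["month"] = df["start_date"].dt.month
df["day"] = df["start_date"].dt.day
df["hour"] = df["start_date"].dt.hour
df["dayofweek"] = df["start_date"].dt.dayofweek  # Monday = 0
df["is_weekend"] = df["dayofweek"] >= 5

# Subtracting two datetime columns gives a timedelta column
df["duration"] = df["end_date"] - df["start_date"]
print(df.head())
```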
Our duration feature is great, but what if we would like to have the duration in minutes or seconds? Remember how in the timedelta section we converted the duration into hours and minutes? We could do the same here!
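A sketch on the same hypothetical data. A timedelta column exposes total_seconds() through the .dt accessor, and minutes follow by dividing by 60:

```python
import pandas as pd

start = pd.date_range(start="2020-03-14 09:00", periods=5, freq="D")
end = start + pd.to_timedelta([30, 52, 7, 75, 18], unit="h")
df = pd.DataFrame({"start_date": start, "end_date": end})
df["duration"] = df["end_date"] - df["start_date"]

# total_seconds() returns a float Series
df["duration_seconds"] = df["duration"].dt.total_seconds()
df["duration_minutes"] = df["duration_seconds"] / 60
print(df[["duration", "duration_minutes"]])
```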
Great! Can you see how many new features we created from just the dates?
Now, let’s make the start date the index of the DataFrame. This will help us easily analyze our dataset because we can use slicing to find data representing our desired dates:
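A sketch on the hypothetical data. Once the index is a DatetimeIndex, you can slice it with date strings; both ends of the slice are inclusive, and a day-level end string covers the whole of that day:

```python
import pandas as pd

start = pd.date_range(start="2020-03-14 09:00", periods=5, freq="D")
df = pd.DataFrame({"start_date": start, "target": [12, 30, 4, 41, 9]})

# Use the start date as the index
df = df.set_index("start_date")

# Select every row dated 15 to 17 March 2020
subset = df.loc["2020-03-15":"2020-03-17"]
print(subset)
```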
Awesome! This is super useful when you want to do visualizations or any data analysis.
End Notes
I hope you found this article on how to manipulate date and time features with Python and Pandas useful. But nothing is complete without practice. Working with time series datasets is a wonderful way to practice what we have learned in this article.
I recommend taking part in a time series hackathon on the DataHack platform. You might want to brush up on time series basics first in order to gear up for that hackathon.
FAQs
What is datetime in Python?
datetime in Python combines dates and times. Its attributes are those of the separate date and time classes put together: year, month, day, hour, minute, second, microsecond, and tzinfo.
How to deal with datetime in pandas?
Pandas has a built-in function called to_datetime() that converts dates and times stored as strings into datetime objects. For example, if the 'date' column of a DataFrame holds strings (object dtype), to_datetime() converts it into a Series of the appropriate datetime64 dtype.
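What is the maximum possible datetime in Python?
datetime.max, which is datetime(9999, 12, 31, 23, 59, 59, 999999); the datetime module’s MAXYEAR constant is 9999. For durations, the most negative timedelta object is timedelta(-999999999) and the most positive is timedelta(days=999999999, hours=23, minutes=59, seconds=59, microseconds=999999).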
What data types does pandas use for datetime?
Timestamp is the pandas equivalent of Python’s datetime and is interchangeable with it in most cases. It’s the type used for the entries that make up a DatetimeIndex and other time series oriented data structures in pandas.
What is the difference between date and datetime in Python?
- datetime: lets us manipulate dates and times together (year, month, day, hour, minute, second, microsecond).
- date: lets us manipulate dates independently of time (year, month, day).
What is the difference between time and datetime in Python?
- time: refers to a time independent of the day (hour, minute, second, microsecond).
- datetime: combines date and time information.
- timedelta: represents the difference between two dates or times.
What is the limit of datetime in pandas?
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years (roughly the years 1677 to 2262).
How to select datetime between two dates in pandas?
To select rows between two dates in a pandas DataFrame, first create a boolean mask such as mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date) to represent the start and end of the date range. Then select the rows that lie within the range with df.loc[mask].
How to extract date and time from datetime in pandas?
- Define a DataFrame.
- Apply pd.to_datetime() to df['datetime'], select the date using dt.date, and save it as df['date'].
- Apply pd.to_datetime() to df['datetime'], select the time using dt.time, and save it as df['time'].

How to add time to a datetime in Python?
- Create a datetime.timedelta object holding the amount to add, e.g. datetime.timedelta(hours=n).
- Add the timedelta object to the datetime object to create a new datetime object with the added time.
How to convert datetime to string in pandas?
Use astype() to Change datetime to String Format
You can use astype(str) if pandas’ default year-month-day format is what you want in string form. For a custom format such as %Y/%m/%d, use dt.strftime('%Y/%m/%d') instead. Either way, the new column holds strings rather than datetimes.
Going the other way, datetime.strptime() parses a string into a datetime object:

```python
from datetime import datetime

date_time_str = '18/09/19 01:55:19'

date_time_obj = datetime.strptime(date_time_str, '%d/%m/%y %H:%M:%S')

print("The type of the date is now", type(date_time_obj))
```
What are the properties of datetime?
A Python datetime object has the attributes year, month, day, hour, minute, second, microsecond, and tzinfo. The day of the week (Monday, Tuesday, etc.) is not an attribute; you get it from the weekday() or isoweekday() methods.
How to compare 2 datetime objects in Python?
Use comparison operators (<, >, <=, >=, ==, !=) to compare dates in Python. For example, datetime_1 > datetime_2 checks whether datetime_1 is later than datetime_2.
A date in Python is not a data type of its own, but we can import a module named datetime to work with dates as date objects.
How to convert datetime to string in Python?
- Import the datetime class from the datetime module.
- Store a datetime object containing the current date and time in a variable, e.g. now = datetime.now().
- Call the strftime() method on it to create a formatted string.
What is the difference between datetime.now() and datetime.today()?
datetime.now() accepts an optional tz keyword argument, but datetime.today() takes no arguments. Called without tz, both return the current local date and time.
What is the difference between datetime.now and datetime.now() in Python?
datetime.now refers to the method itself as an object, while datetime.now() calls it and returns the current time.
How to use datetime time in Python?
Useful tools include combine(), timedelta, timestamps, weekday(), and date strings. For example, combine() merges a date and a time into a single datetime:

```python
import datetime

start_time = datetime.time(7, 0)        # (hours, minutes)
start_date = datetime.date(2015, 5, 1)  # (year, month, day)

# Create a datetime object
start_datetime = datetime.datetime.combine(start_date, start_time)
```
In pandas, the .dt accessor formats a datetime column as strings, and str.split() can then separate the date and time parts:

```python
import pandas as pd

df = pd.DataFrame({"A": pd.to_datetime(["2021/12/25 15:30", "2021/12/26 08:00"])})

df["A"].dt.strftime("%d-%m-%y %H:%M")
# 0    25-12-21 15:30
# 1    26-12-21 08:00

df_date_and_time = df["A"].dt.strftime("%d-%m-%y %H:%M").str.split(" ", expand=True)
```
How to convert datetime column to time in pandas?
Use the pandas to_datetime() function to convert the column to datetime, with the format parameter specifying the pattern of the strings you want to convert. Then take the time of day with the dt.time accessor.
How to find difference between two datetime columns in pandas?
Subtract one column from the other with the minus operator, e.g. df['end'] - df['start']. The result is a timedelta column: use dt.days for whole days, or dt.total_seconds() for the total number of seconds, which you can divide by 60 or 3600 to get minutes or hours.
How to separate year from datetime in pandas?
The simplest way is the dt.year attribute of a datetime column. Alternatively, the dt.strftime() method takes a datetime format and returns a string in that format, so %Y as the format code extracts the year as a string.
Use pandas DatetimeIndex() to Extract Month and Year
To extract the month and year from a pandas datetime column, you can also wrap it in pd.DatetimeIndex() and use the month and year attributes of the result.
For example, you can add pd.Timedelta(hours=5) to a datetime value to add five hours.
How do you split time into hours and minutes in Python?
- First, create a timedelta object by passing the number of seconds to it.
- Next, convert the timedelta object to a string.
- Finally, split the string on ':' to get the hours, minutes, and seconds.
How to convert time to timestamp in datetime Python?
- First, use datetime.now() to obtain the current date and time.
- Then call the timestamp() method on that datetime object to obtain the UNIX timestamp.
How to get the current time with the datetime module?
To get the current time in particular, call the strftime() method on datetime.now() and pass it the string "%H:%M:%S", representing hours, minutes, and seconds.
How to get day from datetime variable in Python?
Use the strftime() method of a datetime object to get the day’s name in Python (in English under the default locale). It uses standard directives to represent a datetime in a string format; the %A directive returns the full name of the weekday.
How to convert datetime to specific format in Python?
To convert a datetime object into a string using a specified format, use datetime.strftime(format). The format codes are standard directives for specifying the format in which you want to represent the datetime. The %d-%m-%Y %H:%M:%S codes, for example, convert dates to dd-mm-yyyy hh:mm:ss format.
How do I change the date format in pandas?
To change the datetime format from YYYY-MM-DD to DD-MM-YYYY, use dt.strftime('%d-%m-%Y'). The result is a column of strings, not datetimes.
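How to check the format of date in pandas?
First check the column’s dtype: a parsed datetime column shows a datetime64 dtype. pandas does not store a display format with datetime values, so you choose one when converting to strings with the dt.strftime() method. For example, dt.strftime('%m/%d/%Y') displays the date as MM/DD/YYYY.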
How to convert datetime to mm dd yyyy format in Python?
Use datetime.strftime(format) to convert a datetime object into a string in the corresponding format. For example, dt.strftime('%m-%d-%Y') gives mm-dd-yyyy, and dt.strftime('%m/%d/%Y') gives mm/dd/yyyy.
How to split date and time from datetime in Python?
- Step 1: Import the library: import pandas as pd.
- Step 2: Set up the data. Create a DataFrame with a 'date' column.
- Step 3: Create features from the date-time stamps by splitting each stamp into Year, Month, Day, Hour, Minute, and Second.
What is DateTime used for?
- datetime.datetime: represents a single point in time, including a date and a time.
- datetime.date: represents a date (year, month, and day) without a time.
- datetime.time: represents a time (hour, minute, second, and microsecond) without a date.
What is the function of DateTime?
A DateTime function performs an action or calculation on a date and time value. Use a DateTime function to add or subtract intervals, find the current date, find the first or last day of the month, extract a component of a DateTime value, or convert a value to a different format.
date.today() returns a date object for the current local date, which you can assign to a variable such as today.
How to convert a string to datetime?
The datetime.strptime() method returns a datetime object that matches the date_string parsed by the format. Both arguments are required and must be strings.