How To Keep Columns In Python

When working with large datasets, you often need only a subset of the columns. In this blog post, we’ll go over how to keep specific columns in Python using pandas, the popular data manipulation library.

Getting Started

First, let’s make sure you have pandas installed. If you don’t have it yet, you can install it by running the following command in your terminal or command prompt:

pip install pandas

Loading a Dataset

Let’s start by loading a sample dataset with the pandas read_csv function. The file name below is a placeholder, so replace it with the path to your own CSV file.

import pandas as pd

data = pd.read_csv("your_file.csv")
print(data.head())
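
If you already know which columns you need before loading, read_csv can skip the rest through its usecols parameter. The sketch below is only an illustration: the CSV text and the column names A to D are made up, and an in-memory string stands in for a real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV contents, used here in place of a file on disk
csv_text = "A,B,C,D\n1,2,3,4\n5,6,7,8\n"

# usecols limits parsing to the named columns; the others are never loaded
data = pd.read_csv(io.StringIO(csv_text), usecols=["A", "B"])
print(data.columns.tolist())  # ['A', 'B']
```

With a real file, you would pass the file path instead of the io.StringIO object.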

Selecting Columns

Now that we have our dataset loaded, let’s say we only want to keep specific columns. There are a few ways to do this in pandas.

Method 1: Using Bracket Notation

One way to select specific columns is bracket notation: pass a list of the column names you want to keep. Note the double brackets. The outer pair indexes the DataFrame, and the inner pair is the list of names. For example, let’s say we want to keep only columns ‘A’ and ‘B’:

selected_columns = data[['A', 'B']]
print(selected_columns.head())
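
To try this without a CSV file, here is a minimal, self-contained sketch. The small DataFrame and its columns A to D are invented for illustration. It also shows what happens when one of the requested names is missing: bracket notation raises a KeyError rather than returning a partial result.

```python
import pandas as pd

# Hypothetical sample data standing in for your_file.csv
data = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})

# A list inside the brackets returns a DataFrame with just those columns
selected_columns = data[["A", "B"]]
print(selected_columns.columns.tolist())  # ['A', 'B']

# Requesting a column that does not exist raises KeyError
try:
    data[["A", "Z"]]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```

The original data DataFrame still has all four columns afterwards, since the selection produces a new object.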

Method 2: Using the filter() Function

Another way to select columns is the filter() method, which keeps the columns named in its items parameter. Unlike bracket notation, filter() silently skips any names that are not in the DataFrame instead of raising an error, and it can also match column names by substring or regular expression.

selected_columns = data.filter(items=['A', 'B'])
print(selected_columns.head())
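
The sketch below shows the three ways filter() can pick columns: by exact names (items), by substring (like), and by regular expression (regex). The DataFrame and its column names, including price_usd and price_eur, are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration
data = pd.DataFrame({
    "A": [1, 2],
    "B": [3, 4],
    "price_usd": [5.0, 6.0],
    "price_eur": [4.5, 5.5],
})

# items: exact names; 'Z' is not a column, so it is skipped without an error
by_items = data.filter(items=["A", "B", "Z"])
print(by_items.columns.tolist())  # ['A', 'B']

# like: keep columns whose name contains the given substring
by_like = data.filter(like="price")
print(by_like.columns.tolist())  # ['price_usd', 'price_eur']

# regex: keep columns whose name matches the pattern
by_regex = data.filter(regex=r"^[AB]$")
print(by_regex.columns.tolist())  # ['A', 'B']
```

Because missing names are skipped quietly, a typo in items will not raise an error, so it is worth checking the resulting columns when the exact set matters.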

Method 3: Using the drop() Function

Instead of selecting the columns to keep, you can drop the ones you don’t want with the drop() method. Pass the list of column names to drop and set the axis parameter to 1 so that pandas looks for those labels among the columns rather than the rows. drop() returns a new DataFrame and leaves the original unchanged.

# Drop 'C' and 'D'; the result holds the columns that remain
remaining_columns = data.drop(['C', 'D'], axis=1)
print(remaining_columns.head())
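
As a variation, drop() also accepts a columns keyword, which does the same thing as passing the labels with axis=1 and can be easier to read. The sketch below uses a made-up DataFrame with columns A to D. It also shows the errors="ignore" option for cases where some of the listed columns might not exist.

```python
import pandas as pd

# Hypothetical sample data standing in for your_file.csv
data = pd.DataFrame({
    "A": [1, 2],
    "B": [3, 4],
    "C": [5, 6],
    "D": [7, 8],
})

# columns=[...] is equivalent to passing the labels with axis=1
remaining = data.drop(columns=["C", "D"])
print(remaining.columns.tolist())  # ['A', 'B']

# By default a missing label raises KeyError; errors="ignore" skips it
tolerant = data.drop(columns=["C", "Z"], errors="ignore")
print(tolerant.columns.tolist())  # ['A', 'B', 'D']

# The original DataFrame still has all four columns
print(data.columns.tolist())  # ['A', 'B', 'C', 'D']
```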

Conclusion

In this blog post, we covered three ways to keep specific columns in a dataset using pandas in Python: bracket notation, the filter() method, and the drop() method. Bracket notation and filter() work well when it is easy to name the columns you want to keep, while drop() is handy when it is easier to name the ones you don’t need.