ValueError: Duplicate column labels (ambiguous columns)
Duplicate column labels cause ambiguity and errors
$ python -c "import pandas as pd; df = pd.DataFrame([[1,2]], columns=['a','a']); print(df['a'])"
a a
0 1 2
# Some operations expect unique column labels and may raise an error later, for example when pivoting
$ python - << 'PY'
import pandas as pd
df = pd.DataFrame({'x': [1, 1], 'key': [10, 10], 'val': [3, 4]})
# pivot requires unique index/columns combinations - duplicates cause a ValueError when reshaping
try:
df.pivot(index='x', columns='key', values='val')
except Exception as e:
print(type(e).__name__ + ':', e)
PY
Why this happens
Pandas allows duplicate column labels but many operations rely on unique column names. When columns are duplicated, selection and reshape operations can be ambiguous or fail. For example, pivot and unstack assume unique index-level combinations.
Fix
Give columns unique names, use a MultiIndex to disambiguate, or use groupby/aggregation to reduce duplicates before reshaping. Use df.columns.duplicated() to detect duplicates and df.columns = df.columns.str.replace or df.columns = pd.io.parsers.base_parser.ParserBase({'names': df.columns})._maybe_dedup_names(df.columns) to make them unique. A simple way is to append suffixes or use df.rename.
Wrong code
import pandas as pd
# Duplicate column labels make some operations ambiguous
df = pd.DataFrame([[1, 2]], columns=['a', 'a'])
print(df)
# Other operations like pivot/unstack may fail on ambiguous labels
Fixed code
import pandas as pd
# Ensure columns are unique before doing operations that require them
df = pd.DataFrame([[1, 2]], columns=['a', 'a'])
# Method 1: Rename duplicates (append suffixes)
new_cols = []
counts = {}
for col in df.columns:
counts[col] = counts.get(col, 0) + 1
suffix = counts[col] if counts[col] > 1 else ''
new_cols.append(f"{col}{('' if suffix=='' else '_' + str(suffix))}")
df.columns = new_cols
print(df)
# Method 2: Use a unique index for pivot/reshape or aggregate duplicates
# Example: pivot after aggregating
df2 = pd.DataFrame({'x': [1, 1], 'key': [10, 10], 'val': [3, 4]})
agg = df2.groupby(['x', 'key']).sum().reset_index()
print(agg.pivot(index='x', columns='key', values='val'))