Compute the significance of a feature. — features.importance • phovafer

Compute the importance of each feature that identifies PV and non-PV consumers based on the p-value of kruskal test.

features.importance(feature.PV.NPV, label.PV)

Arguments

feature.PV.NPV	A numeric vector of feature values of a certain feature computed from all the PV and non-PV load profiles.
label.PV	A 0-1 vector with 0 and 1 demonstrate PV and non-PV consumers, respectively.

Value

p.value --- The p-value used to demonstrate the significance of the feature that differs in PV and non-PV load profile. The smaller the p-value, the more significant the feature is. Usually, those features with p-values < 0.05 should be regarded as significant features.

Examples

# Load the required packages.

rm(list=ls())

library(matrixStats)

### Load the smart meter readings and the simulated PV generations.

data("loadsample")
data("pvSample")

### Get the load profiles of both PV and non-PV consumers.

N.house<-ncol(loadsample[,-c(1:6)]) # N.house is the total number of households.

NPV.day<-lapply(loadsample[,-c(1:6)], load.daily, load.date=loadsample[,1], num.obs=48)

### Assuming 30% of the households use PV. Generate the PV samples.

perc=0.3
pv.house<-floor(N.house*perc) # pv.house is the number of households which are PV users.
M<-ncol(pvSample[,-c(1:6)])
pv.rand<-sample(1:M, pv.house, replace = TRUE, prob = rep(1/M, times=M))
pvSample<-cbind(pvSample[ ,c(1:6)], pvSample[ ,-c(1:6)][ ,pv.rand])

date.PV.NPV<-unique(pvSample$date)
PV.day<-lapply(pvSample[,-c(1:6)], load.daily, load.date=pvSample$date, num.obs=48)

house_pv<-sample(N.house)[1:pv.house] # Randomly selected PV households.

# Compute the real load demand of the PV consumers using the smart meter readings (loadsample),
# and the simulated PV samples (pvSample).
demand_pv<-vector(mode = "list", length = length(PV.day))

for (k in 1:pv.house){
  demand_pv[[k]]<-NPV.day[[house_pv[k]]]-PV.day[[k]]
}
demand_npv<-NPV.day[-house_pv]

# Label the PV consumers with 0, and non-PV consumers with 1.
label.PV<-c(rep(0, times=length(house_pv)),
           rep(1, times=length(demand_npv)))
label.PV<-factor(label.PV)

### Prepare the features used to identify PV and non-PV users.

date.week<-weekdays(as.Date(date.PV.NPV))
x.week<-date.week
mor.be<-13 # It presents 06:00:00.
aft.be<-39 # It presents 19:00:00.

feature.PV<-lapply(demand_pv, features.load, x.week, mor.be, aft.be)
feature.NPV<-lapply(demand_npv, features.load, x.week, mor.be, aft.be)

### Get the name of the features.

features.name<-feature.PV[[1]]$features.name # Get the name of the computed features.

### Prepare the feature matrix.

feature.PV.new<-feature.PV[[1]]$output.features
for (i in 2:length(feature.PV)){
  feature.PV.new<-rbind(feature.PV.new, feature.PV[[i]]$output.features)
}

feature.NPV.new<-feature.NPV[[1]]$output.features
for (i in 2:length(feature.NPV)){
  feature.NPV.new<-rbind(feature.NPV.new, feature.NPV[[i]]$output.features)
}

### Compute the significance of each feature and select the 12 most significant features.

feature.PV.NPV<-rbind(feature.PV.new, feature.NPV.new)

p.value<-apply(feature.PV.NPV, 2, features.importance, label.PV)